Python: TypeError: 'str' does not support the buffer interface

Date: June 8, 2015

import socket
import sys

port=51423
host="localhost"

data=b"x"*10485760  # the b prefix makes this literal a bytes object instead of a str
sock=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
sock.connect((host,port))

byteswritten=0
while byteswritten<len(data):
    startpos = byteswritten
    endpos = min(byteswritten+1024,len(data))
    byteswritten+=sock.send(data[startpos:endpos])
    sys.stdout.write("wrote %d bytes\r"% byteswritten)
    sys.stdout.flush()

sock.shutdown(1)

print("All data sent.")
while True:
    buf = sock.recv(1024).decode()  # .decode() converts the received bytes to a str
    if not len(buf):
        break
    sys.stdout.write(buf)

Solution:

In Python 3, byte strings and Unicode strings are two different types. Sockets know nothing about string encodings, so they work with raw bytes objects, whose interface differs slightly from that of Unicode strings (str).

So whenever you have a Unicode string (str) and need a byte string, you must call encode() on it.

And whenever you have a byte string and need a regular Unicode string, you must call decode() on it.

Unicode strings are written as ordinary quoted literals. Byte strings are written the same way with a b in front of the opening quote, for example b"test string".

When you call client_socket.send(data) with a str, replace it with client_socket.send(data.encode()).

When you receive data with data = client_socket.recv(512) and need a str, replace it with data = client_socket.recv(512).decode().
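The two replacements above can be sketched as one round trip. This is only a minimal sketch: it assumes UTF-8 on both ends and uses socket.socketpair() as a stand-in for a real client and server, and the names left, right, and message are illustrative. It also joins all received chunks before decoding, because a multi-byte character can be split across two recv() calls.

```python
import socket

# socketpair() returns two already-connected sockets, standing in for a
# client and a server so the example runs without a listening process.
left, right = socket.socketpair()

message = "price: €20"                   # str (Unicode text)
left.sendall(message.encode("utf-8"))    # sockets accept bytes only
left.shutdown(socket.SHUT_WR)            # signal that no more data follows

chunks = []
while True:
    chunk = right.recv(512)              # recv() always returns bytes
    if not chunk:                        # empty bytes means the peer is done
        break
    chunks.append(chunk)

# Decode once, after joining, so a character split between two chunks
# cannot raise UnicodeDecodeError.
received = b"".join(chunks).decode("utf-8")

left.close()
right.close()
print(received)
```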

============================================ Background ========================================

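1. The bytes/str distinction in Python 3

Original article: The bytes/str dichotomy in Python 3

Once you understand the difference between bytes and str, the codecs module becomes easy to understand.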

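Probably the most important new feature of Python 3 is a much clearer distinction between text and binary data. Text is always Unicode and is represented by the str type; binary data is represented by the bytes type. Python 3 never mixes str and bytes implicitly, and that is what makes the distinction so clear. You cannot concatenate a string with bytes, you cannot search for a string inside bytes (or vice versa), and you cannot pass a string to a function that expects bytes (or vice versa). This is a good thing.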
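A small sketch of what "never mixes implicitly" means in practice. The variable names are illustrative; the point is that both mixed operations raise TypeError, while the explicitly converted versions work.

```python
text = "abc"     # str
raw = b"abc"     # bytes

errors = []
try:
    text + raw                   # concatenating str and bytes
except TypeError as exc:
    errors.append(type(exc).__name__)

try:
    text in raw                  # searching for a str inside bytes
except TypeError as exc:
    errors.append(type(exc).__name__)

# Converting explicitly states the encoding and works in both directions.
joined = text + raw.decode("ascii")
found = text.encode("ascii") in raw
print(errors, joined, found)
```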

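Still, the boundary between strings and bytes is unavoidable, so the following relationship is important to keep in mind:

A string can be encoded to bytes, and bytes can be decoded to a string.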

>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'

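Think of it this way: a string is an abstract representation of text. A string consists of characters, and characters are abstract entities that are not tied to any particular binary representation. While manipulating strings we live in blissful ignorance. We can split and slice them, concatenate them, and search inside them, without caring how they are represented internally or how many bytes each character takes. We only have to think about that when encoding a string to bytes (for example, to send it over a communication channel) or decoding a string from bytes (the reverse operation).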

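The argument passed to encode and decode is the encoding (or codec). An encoding is a way of representing abstract characters as binary data. There are many encodings. UTF-8, shown above, is one of them; here is another: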

>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'

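The encoding is a crucial part of this conversion. Without an encoding, the bytes object b'\xa420' is just a bunch of bits; the encoding gives it meaning. With a different encoding, the same bits can mean something quite different: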

>>> b'\xa420'.decode('windows-1255')
'₪20'

2. A brief introduction to the codecs module

The name codecs is short for encoders and decoders.

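To handle character encodings, the codecs module provides the lookup function. It takes the name of an encoding and returns the corresponding codecs.CodecInfo object, which holds references to the encoder and decoder functions and to the StreamReader and StreamWriter classes. To simplify calls to lookup, codecs also provides getencoder(encoding), getdecoder(encoding), getreader(encoding), and getwriter(encoding). Going one step further, to simplify access to the StreamReader, StreamWriter, and StreamReaderWriter for a particular encoding, codecs provides the open function: pass the encoding name through its encoding parameter, and the returned object encodes on write and decodes on read.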
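The lookup functions described above can be tried out as follows. This is a minimal sketch using UTF-8 as the example encoding; the variable names are illustrative. Note that the stateless encode and decode functions return a tuple of the result and the amount of input consumed, not just the result.

```python
import codecs

info = codecs.lookup("utf-8")            # returns a codecs.CodecInfo object
print(info.name)                         # canonical name of the codec

# CodecInfo.encode and CodecInfo.decode are stateless functions that
# return a (result, length_consumed) tuple.
encoded, consumed = info.encode("€20")
decoded, _ = info.decode(encoded)

# getencoder() and getdecoder() are shortcuts for the same functions.
encode = codecs.getencoder("utf-8")
decode = codecs.getdecoder("utf-8")
assert encode("€20")[0] == encoded
assert decode(encoded)[0] == decoded
```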

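The strength of this module is that it can process encoded text as a stream, which is very useful when there is a lot of data.
You can use IncrementalEncoder and IncrementalDecoder, but StreamReader and StreamWriter are strongly recommended because they simplify your code considerably.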
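To compare the two approaches, here is a minimal sketch that decodes the same UTF-8 data both ways. The two-byte chunk size is an assumption chosen to split multi-byte characters, and io.BytesIO stands in for a real file or socket stream.

```python
import codecs
import io

data = "€20 and ₪20".encode("utf-8")
# Two-byte chunks deliberately cut the three-byte characters in half.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]

# Incremental decoder: feed chunks by hand; it buffers incomplete
# byte sequences until the rest of the character arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b"", final=True))   # flush any buffered bytes
incremental_text = "".join(pieces)

# StreamReader: wrap a binary stream and read decoded text from it.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
stream_text = reader.read()

print(incremental_text == stream_text)
```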

For example, suppose there is a file test.txt encoded in GBK that needs to be converted to UTF-8. The code could be written as follows:
