Python String Formatting: Study Notes

Date: June 8, 2015

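In Python, formatted string output uses the % operator. The general form is

•format string % values to output
The format string on the left can be written just as it is in C. If the values on the right consist of two or more items, they must be wrapped in parentheses and separated by commas. The left-hand side is the part to focus on. Its simplest form is: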

•%code
There are many possible codes, but since every object in Python can be converted to a string, you can use '%s' everywhere unless you have special requirements. For example:

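•'%s %s %s' % (1, 2.3, ['one', 'two', 'three'])
This evaluates to the string 1 2.3 ['one', 'two', 'three'], with each value placed where its %s marker appears on the left. The first and second values are not strings, but that causes no problem: for every %s, Python calls str() on the corresponding value and inserts the result. As mentioned earlier, there is also the repr() function; to use it instead, mark the position with %r. Besides %s there are many similar codes, such as %d for integers, %f for floats, and %x for hexadecimal.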
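As a rough illustration of the conversion codes mentioned above, the sketch below formats a few values with %s, %r, %d, %f and %x. It is written for Python 3, and the variable names are only for illustration.

```python
# %s calls str() on each value; %r calls repr() instead.
values = (1, 2.3, ['one', 'two', 'three'])
with_s = '%s %s %s' % values

text = 'two'
as_str = '%s' % text    # uses str(text)
as_repr = '%r' % text   # uses repr(text), so the quotes are included

# A few other common codes.
as_int = '%d' % 42           # decimal integer
as_float = '%.2f' % 3.14159  # float rounded to two decimal places
as_hex = '%x' % 255          # lowercase hexadecimal

print(with_s)
print(as_str, as_repr, as_int, as_float, as_hex)
```

Using %s everywhere is usually enough for display; the typed codes matter mostly when you need a specific width, precision, or number base.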

String formatting:

The code is as follows:
format = "hello %s, %s enough for ya?"
values = ('world', 'hot')
print format % values
Result: hello world, hot enough for ya?

Note: this works in Python 2.7 but not in 3.0.

In 3.0, print is a function, so the argument must be wrapped in parentheses: print(format % values).
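A small sketch of the parenthesized form. It runs on Python 3, and Python 2.7 also accepts it when a single argument is printed. The name fmt is used here instead of format only to avoid shadowing the built-in format function; that renaming is my choice, not part of the original example.

```python
fmt = "hello %s, %s enough for ya?"
values = ('world', 'hot')

# Build the string first, then print it; the % operator itself
# behaves the same way in Python 2 and Python 3.
message = fmt % values
print(message)
```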

Functions or methods that work like their PHP counterparts but have different names:

php explode => python split
php trim => python strip
php implode => python join
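The three correspondences above can be sketched as follows (Python 3; the sample string and variable names are made up for illustration). Note that in Python these are methods on the string itself, and join is called on the separator.

```python
raw = '  one,two,three  '

trimmed = raw.strip()        # like PHP trim($raw)
parts = trimmed.split(',')   # like PHP explode(',', $trimmed)
joined = '-'.join(parts)     # like PHP implode('-', $parts)

print(trimmed)
print(parts)
print(joined)
```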

While formatting strings at work I ran into a UnicodeDecodeError, so I looked into how string formatting works and am sharing what I found.

The code is as follows:
C:\Users\zhuangyan>python
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = '你好世界'
>>> print 'Say this: %s' % a
Say this: 你好世界
>>> print 'Say this: %s and say that: %s' % (a, 'hello world')
Say this: 你好世界 and say that: hello world
>>> print 'Say this: %s and say that: %s' % (a, u'hello world')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10: ordinal not in range(128)

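Notice the UnicodeDecodeError raised by print 'Say this: %s and say that: %s' % (a, u'hello world'). The only difference from the previous statement is that 'hello world' became u'hello world', so a str object became a unicode object. But 'hello world' is plain English text with no characters outside ASCII, so how could it fail to decode? Looking more closely at the exception message, it mentions 0xc4, which is clearly not part of 'hello world', so the Chinese string must be the culprit.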

>>> a
'\xc4\xe3\xba\xc3\xca\xc0\xbd\xe7'

Printing its byte sequence confirms it: the very first byte is 0xc4.

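So during string formatting Python tries to decode a into a unicode object, and it does so with the default ASCII codec rather than the encoding the bytes are actually in (GBK in this Windows console session, UTF-8 in many other setups). Why does this happen? Let's continue the experiment: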

The code is as follows:
>>> 'Say this: %s' % 'hello'
'Say this: hello'
>>> 'Say this: %s' % u'hello'
u'Say this: hello'
>>>

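Look closely: 'hello' is an ordinary string and the result is also a string (a str object), whereas u'hello' is a unicode object and the formatted result becomes unicode as well (note the leading u in the result).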

Here is what the Python documentation says:

If format is a Unicode object, or if any of the objects being converted using the %s conversion are Unicode objects, the result will also be a Unicode object.
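The failure can be approximated in Python 3, where bytes and text are never mixed implicitly, by doing by hand what Python 2 attempted when it promoted the result to unicode. This is only a sketch of the mechanism, not the interpreter's actual code path; the byte string is the one printed for a in the session above, and the other names are illustrative.

```python
# The bytes the Python 2 session showed for a ('你好世界' encoded as GBK).
a_bytes = b'\xc4\xe3\xba\xc3\xca\xc0\xbd\xe7'

# The already formatted str prefix, which Python 2 then tried to decode.
prefix = b'Say this: ' + a_bytes

try:
    prefix.decode('ascii')      # default codec used by the implicit promotion
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes are really in succeeds.
decoded = a_bytes.decode('gbk')

print(error)
print(decoded)
```

The offending byte sits at position 10 because 'Say this: ' is exactly ten bytes long, which matches the position reported in the traceback.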

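This kind of problem appears easily when code mixes str and unicode. In my colleague's code, the Chinese string came from user input and had been correctly encoded, so it was a UTF-8 encoded str object. The troublesome unicode object, although its content was pure ASCII, came from a sqlite3 database query, and the sqlite API returns all strings as unicode objects, which led to this odd result.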
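One common way to avoid this kind of mixing is to decode byte strings to text at the boundary and format only text. The sketch below is written for Python 3 and assumes the user input arrives as UTF-8 bytes; to_text is a hypothetical helper introduced for illustration, not something from the original code.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return text, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Assumed inputs: UTF-8 bytes from the user, text from the database.
user_input = '你好世界'.encode('utf-8')
db_value = 'hello world'

message = 'Say this: %s and say that: %s' % (to_text(user_input), to_text(db_value))
print(message)
```

Because both operands are text by the time % runs, no implicit conversion with a default codec is involved.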

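Finally, I tested formatting with the format method, and it does not raise the exception above! Here the format string is a str, so the result stays a str and the unicode argument is encoded instead of a being decoded. This works because u'hello world' is pure ASCII; a unicode argument containing non-ASCII characters would raise a UnicodeEncodeError instead.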

The code is as follows:
>>> print 'Say this:{0} and say that:{1}'.format(a, u'hello world')
Say this:你好世界 and say that:hello world

Next, let's look at the basic usage of format.

The code is as follows:
>>> '{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
>>> '{2}, {1}, {0}'.format(*'abc')  # unpacking argument sequence
'c, b, a'
>>> '{0}{1}{0}'.format('abra', 'cad')  # arguments' indices can be repeated
'abracadabra'
>>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
'Coordinates: 37.24N, -115.81W'
>>> coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
>>> 'Coordinates: {latitude}, {longitude}'.format(**coord)
'Coordinates: 37.24N, -115.81W'
>>> coord = (3, 5)
>>> 'X: {0[0]}; Y: {0[1]}'.format(coord)
'X: 3; Y: 5'
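For comparison, here is a short sketch (Python 3) of how the keyword and tuple examples above could be written with the % operator from the first half of these notes. The variable names are illustrative only.

```python
coord = {'latitude': '37.24N', 'longitude': '-115.81W'}

# Named fields: {name} with format, %(name)s with the % operator.
by_format = 'Coordinates: {latitude}, {longitude}'.format(**coord)
by_percent = 'Coordinates: %(latitude)s, %(longitude)s' % coord

point = (3, 5)

# format can index into one argument; % consumes the tuple item by item.
xy_format = 'X: {0[0]}; Y: {0[1]}'.format(point)
xy_percent = 'X: %s; Y: %s' % point

print(by_format)
print(xy_format)
```

Both styles give the same strings here; format is more flexible about reordering and reusing arguments, as the {0}{1}{0} example shows.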