Reading 初めてのPython (3)

Between drinking parties and various other things, it has been a while since the last entry. Today covers chapter 5 only.

Chapter 5: Strings
- - A string is a kind of sequence.

- 5.1
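  - Raw strings are handy for text that contains many \ characters, such as Windows directory paths
  - Prefixing a literal with r, as in r'foobar', makes it a raw string
  - "raw" here means unprocessed: backslashes are not interpreted as escape sequences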

>>> 'C:\Documents and Settings\takanori\My Documents'
'C:\\Documents and Settings\takanori\\My Documents'
>>> print 'C:\Documents and Settings\takanori\My Documents'
C:\Documents and Settings       akanori\My Documents
>>> r'C:\Documents and Settings\takanori\My Documents'
'C:\\Documents and Settings\\takanori\\My Documents'
>>> print r'C:\Documents and Settings\takanori\My Documents'
C:\Documents and Settings\takanori\My Documents
>>> path = r'C:\Documents and Settings\takanori\My Documents'
>>> path
'C:\\Documents and Settings\\takanori\\My Documents'
>>> print path
C:\Documents and Settings\takanori\My Documents
>>> path[2]
'\\'
>>> print path[2]
\
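
A minimal sketch of the same point in Python 3 syntax (the sessions above are Python 2). The path is a made-up one, chosen so that every backslash starts a valid escape sequence:

```python
# Hypothetical path: \t and \n are both valid escape sequences.
plain = 'C:\temp\new'    # \t becomes a tab, \n becomes a newline
raw = r'C:\temp\new'     # backslashes are kept literally

print(len(plain))                # 9: two escapes collapsed to one character each
print(len(raw))                  # 11: every backslash is its own character
print(raw == 'C:\\temp\\new')    # True: same string as doubling each backslash
print(raw[2])                    # a single backslash
```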

- - Triple quotes let you write multi-line strings
  - Something like a here document?
  - Seems to work with raw strings too

>>> stmt = """hoge
... aaa \t\n
... bbb \'
... ccc
... """
>>> stmt
"hoge\naaa \t\n\nbbb '\nccc\n"
>>> print stmt
hoge
aaa     

bbb '
ccc

>>> stmt = r"""hoge
... aaa \t bbb \n
... ccc
... """
>>> stmt
'hoge\naaa \\t bbb \\n\nccc\n'
>>> print stmt
hoge
aaa \t bbb \n
ccc
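
As a rough check of what the triple-quoted form produces, here is a small sketch in Python 3 syntax. The variable names are just for illustration:

```python
# A triple-quoted literal keeps the line breaks typed inside it.
block = """hoge
aaa
ccc
"""
same = "hoge\naaa\nccc\n"
print(block == same)        # True

# With the r prefix, \t stays as a backslash followed by the letter t.
raw_block = r"""aaa \t bbb
ccc
"""
print('\t' in raw_block)    # False: no real tab character
print('\\t' in raw_block)   # True: backslash + t are still there
```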

- - Prefixing a literal with u, as in u'foobar', makes it a Unicode string
  - Concatenating a Unicode string and a normal string gives a Unicode string
  - So what about raw strings?

>>> 'normal' + u'unicode'
u'normalunicode'
>>> r'raw' + u'unicode'
u'rawunicode'
>>> 'normal' + r'raw'
'normalraw'

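- - A raw string ends up as an ordinary string, so it does not seem to get any special treatment
  - Convert between normal and Unicode strings with str and unicode
  - The u and r prefixes can also be combined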

>>> u'foobar'
u'foobar'
>>> str(u'foobar')
'foobar'
>>> unicode(str(u'foobar'))
u'foobar'
>>>
>>> path = ur'C:\Documents and Settings\takanori\My Documents\Access Connections'
>>> path
u'C:\\Documents and Settings\\takanori\\My Documents\\Access Connections'
>>> print path
C:\Documents and Settings\takanori\My Documents\Access Connections

- 5.3
  - The right-hand side of % can also be a dictionary, with %(key)s format codes picking values by key
  - Often used in combination with vars()

>>> pronouns = { "I" : "my", "you" : "your", "he" : "his" }
>>> print "%(I)s %(you)s %(he)s" % pronouns
my your his
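
The vars() combination mentioned above is not shown in the session, so here is a minimal sketch of how it might look, in Python 3 syntax. The function describe and its parameters are hypothetical names for illustration:

```python
def describe(name, chapter):
    # Called with no argument inside a function, vars() returns the
    # local variables as a dictionary: {'name': ..., 'chapter': ...}
    return "%(name)s is reading chapter %(chapter)d" % vars()

print(describe("takanori", 5))   # takanori is reading chapter 5
```

The format string refers to local variables by name, so no dictionary has to be built by hand.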

- 5.4
  - list(string) turns a sequence such as a string into a list
  - ''.join glues the pieces back together
  - This join idiom does look odd at first glance... it does not feel intuitive to me

>>> text = 'I read very nice post by Matt today and it has many good insights though I can\'t say I agree on all points.'
>>> text.find('today')
30
>>> x = list(text)
>>> x
['I', ' ', 'r', 'e', 'a', 'd', ' ', 'v', 'e', 'r', 'y', ' ', 'n', 'i', 'c', 'e', ' ', 'p', 'o', 's', 't', ' ', 'b', 'y', ' ', 'M', 'a', 't', 't', ' ', 't', 'o', 'd', 'a', 'y', ' ', 'a', 'n', 'd', ' ', 'i', 't', ' ', 'h', 'a', 's', ' ', 'm', 'a', 'n', 'y', ' ', 'g', 'o', 'o', 'd', ' ', 'i', 'n', 's', 'i', 'g', 'h', 't', 's', ' ', 't', 'h', 'o', 'u', 'g', 'h', ' ', 'I', ' ', 'c', 'a', 'n', "'", 't', ' ', 's', 'a', 'y', ' ', 'I', ' ', 'a', 'g', 'r', 'e', 'e', ' ', 'o', 'n', ' ', 'a', 'l', 'l', ' ', 'p', 'o', 'i', 'n', 't', 's', '.']
>>> ''.join(x)
"I read very nice post by Matt today and it has many good insights though I can't say I agree on all points."

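One way to make the join idiom feel less odd is to read the string before .join as the separator placed between the items. A small sketch in Python 3 syntax, with made-up sample values:

```python
text = 'spam'

# Strings are immutable, so edit a list of characters instead.
chars = list(text)          # ['s', 'p', 'a', 'm']
chars[0] = 'S'

# The string before .join is the separator put between the items.
joined = ''.join(chars)                 # empty separator
listed = ', '.join(['a', 'b', 'c'])     # comma-space separator

print(joined)   # Spam
print(listed)   # a, b, c
```
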
- - The functions in the string module are kept for backward compatibility. In general it is better to use string methods instead.

- 5.5
  - Types in the same category support the same operations
  - Numbers: arithmetic / Sequences: indexing, slicing, concatenation, etc. / Mappings: indexing by key

>>> 'aaa' + 'bbb'
'aaabbb'
>>> (2,4) + (1,3)
(2, 4, 1, 3)
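
A sketch of what "same category, same operations" can mean in practice, in Python 3 syntax. The helper head_and_tail and the prices dictionary are hypothetical names for illustration:

```python
def head_and_tail(seq):
    # Indexing and slicing work on any sequence type,
    # so this function does not care which one it receives.
    return seq[0], seq[1:]

print(head_and_tail('abc'))        # ('a', 'bc')
print(head_and_tail((2, 4, 1)))    # (2, (4, 1))
print(head_and_tail([1, 2, 3]))    # (1, [2, 3])

# A mapping is indexed by key rather than by position.
prices = {'spam': 2, 'egg': 1}
print(prices['spam'])              # 2
```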

I did not know about vars(). Now that I do, I still cannot see where I would use it.
Anyway... hmm, at this pace I will not finish in a month.