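Python Miscellany

The second miscellany post, focusing on best practices.

Recently (2021/10/28) I found that the official documentation has a Programming FAQ (Python 3.10.0 documentation), which is very useful.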

Misc

Some syntactic sugar

2023/1/10

(1,) + (2, 3)
# (1, 2, 3)

a = {1: 1}
b = {2: 2}
{**a, **b}
# {1: 1, 2: 2}

nonlocal

2022/7/5

Came across this while reading some source code; I forget which.

Python の nonlocal と global の違い - Qiita

functools

lru_cache and singledispatch

__qualname__

2022/5/29

Pickle is insecure

2022/4/11, 12/17

I have long known that pickle is insecure, but had never seen a concrete example; the referenced article gives one.
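As a harmless illustration of why loading untrusted pickles is dangerous, here is a minimal sketch. The `Payload` class is hypothetical (it is not taken from the referenced article); its `__reduce__` tells pickle which callable to run at load time. A real attacker would put something far worse than arithmetic there.

```python
import pickle

class Payload:
    """Hypothetical class: __reduce__ decides what pickle.loads will call."""
    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") instead of rebuilding a Payload.
        return (eval, ("6 * 7",))

data = pickle.dumps(Payload())
result = pickle.loads(data)   # runs the embedded call
print(result)  # 42
```

The receiver never needs the `Payload` class; the bytes alone are enough to trigger the call.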

Context managers

2021/7/16

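For a brief reference, see the article 完全理解 Python 关键字 “with” 与上下文管理器 (roughly, "Fully understanding the Python keyword 'with' and context managers").

Here is an example, from TensorFlow's GradientTape class.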

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x**2

dy_dx = tape.gradient(y, x)

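As the name suggests, the tape behaves like a magnetic tape: it records the forward operations, and afterwards the tape is taken out for backpropagation. The __enter__ and __exit__ methods implemented in the source code do exactly this.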
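To make the protocol concrete, here is a toy recorder written as a context manager. The `Tape` class and its `record` method are hypothetical and only mimic the idea; this is not how TensorFlow implements GradientTape.

```python
class Tape:
    """Hypothetical toy recorder, not TensorFlow's implementation."""
    def __init__(self):
        self.recording = False
        self.ops = []

    def __enter__(self):
        self.recording = True      # start recording when the block is entered
        return self                # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.recording = False     # stop recording when the block is left
        return False               # do not swallow exceptions

    def record(self, op):
        if self.recording:
            self.ops.append(op)

with Tape() as tape:
    tape.record("square")

tape.record("ignored")             # outside the block: not recorded
print(tape.ops)  # ['square']
```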

Codetags

2020/10/23

Programmers widely use ad-hoc code comment markup conventions to serve as reminders of sections of code that need closer inspection or review. Examples of markup include FIXME, TODO, XXX, BUG, but there are many more in wide use in existing software.

Reference: PEP 350 – Codetags | Python.org

‘import module’ vs. ‘from module import function’

2020/9/22

Importing the module doesn’t waste anything; the module is always fully imported (into the sys.modules mapping), so whether you use import sys or from sys import argv makes no odds.

In a large module, I’d certainly use import sys; code documentation matters, and using sys.argv somewhere in a large module makes it much clearer what you are referring to than just argv ever would.
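The claim about sys.modules is easy to check. The sketch below uses json purely as an arbitrary standard-library module for illustration.

```python
import sys

from json import dumps   # only the name "dumps" is bound here

# The whole module was still imported and cached in sys.modules.
module_loaded = "json" in sys.modules
same_function = sys.modules["json"].dumps is dumps
print(module_loaded, same_function)  # True True
```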

Reference: python - ‘import module’ vs. ‘from module import function’ - Software Engineering Stack Exchange

@property

2020/8/31

If you want private attributes and methods you can implement the class using setters, getters methods otherwise you will implement using the normal way.
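A minimal sketch of the getter/setter style with @property; the `Celsius` class and its validation rule are made up for illustration.

```python
class Celsius:
    """Hypothetical example class."""
    def __init__(self, degrees):
        self.degrees = degrees          # goes through the setter below

    @property
    def degrees(self):
        return self._degrees

    @degrees.setter
    def degrees(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._degrees = value

t = Celsius(20)
t.degrees = 25          # looks like plain attribute access, runs the setter
print(t.degrees)  # 25
```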

@classmethod and @staticmethod

2020/8/28

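Their purpose is to provide methods that can be called before the class is instantiated. The use case given in the reference below uses a staticmethod to validate input and a classmethod to construct instances from different kinds of input.

Reference: Python’s @classmethod and @staticmethod Explained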
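A small sketch of that use case. The `Date` class and the names `is_valid` and `from_string` are hypothetical, written here for illustration rather than copied from the reference.

```python
class Date:
    """Hypothetical class mirroring the use case described above."""
    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    @staticmethod
    def is_valid(text):
        # No access to cls or self: just checks the input format.
        parts = text.split("-")
        return len(parts) == 3 and all(p.isdigit() for p in parts)

    @classmethod
    def from_string(cls, text):
        # Alternative constructor: receives the class, returns an instance.
        if not cls.is_valid(text):
            raise ValueError(f"bad date string: {text!r}")
        return cls(*map(int, text.split("-")))

d = Date.from_string("2020-08-28")
print(d.year, d.month, d.day)  # 2020 8 28
```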

Dictionaries and lists

2020/7/17

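Dictionary: a hash table with open addressing. Starting with 3.6 the implementation changed to a more compact layout that preserves insertion order (ordering became a language guarantee in 3.7).

List: a dynamic array.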
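Both points can be observed from Python itself. This is a rough sketch: the sizes reported by sys.getsizeof are a CPython implementation detail, so only the overall pattern matters.

```python
import sys

d = {}
for key in ("b", "a", "c"):
    d[key] = None
print(list(d))  # ['b', 'a', 'c']  (insertion order)

# A dynamic array over-allocates, so the reported size grows in occasional
# jumps rather than on every append.
items, sizes = [], set()
for i in range(100):
    items.append(i)
    sizes.add(sys.getsizeof(items))
print(len(sizes), "distinct sizes over 100 appends")
```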

Multithreading and multiprocessing

2020/7/15

You can use threading if your program is network or IO bound, and multiprocessing if it’s CPU bound.

Without multiprocessing, Python programs have trouble maxing out your system’s specs because of the GIL (Global Interpreter Lock). Python wasn’t designed considering that personal computers might have more than one core, so the GIL is necessary because Python is not thread-safe and there is a globally enforced lock when accessing a Python object. Though not perfect, it’s a pretty effective mechanism for memory management.

Multiprocessing allows you to create programs that can run concurrently (bypassing the GIL) and use the entirety of your CPU core. The multiprocessing library gives each process its own Python interpreter and each their own GIL. Because of this, the usual problems associated with threading (such as data corruption and deadlocks) are no longer an issue. Since the processes don’t share memory, they can’t modify the same memory concurrently.
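A minimal sketch of the I/O-bound half of this advice, using concurrent.futures from the standard library. `fake_io` is a hypothetical stand-in for a network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    """Stand-in for a network call; time.sleep releases the GIL."""
    time.sleep(seconds)
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.2] * 4))
elapsed = time.perf_counter() - start

# The four waits overlap, so this takes roughly 0.2 s rather than 0.8 s.
print(results, round(elapsed, 1))
```

For CPU-bound work, concurrent.futures.ProcessPoolExecutor offers the same map interface but runs the function in separate processes; in a script it should be used under an `if __name__ == "__main__":` guard.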

+= and extend

2020/6/9

>>> x = y = [1, 2, 3, 4]
>>> x += [4]
>>> x
[1, 2, 3, 4, 4]
>>> y
[1, 2, 3, 4, 4]
>>> x = y = [1, 2, 3, 4]
>>> x = x + [4]
>>> x
[1, 2, 3, 4, 4]
>>> y
[1, 2, 3, 4]

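Here += calls the mutable object's __iadd__ method, which operates in place; for immutable objects, which do not define __iadd__, it falls back to __add__. The + operator always calls __add__ and creates a new object.

For a list, += is almost equivalent to extend, except that the latter is a function call.

A corner case (2020/9/9)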
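For contrast with the list session above, the same experiment with a tuple shows the fallback to __add__:

```python
x = y = (1, 2, 3)
x += (4,)            # tuple has no __iadd__, so this is x = x + (4,)
print(x)  # (1, 2, 3, 4)
print(y)  # (1, 2, 3)
print(hasattr(list, "__iadd__"), hasattr(tuple, "__iadd__"))  # True False
```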

>>> t = (0, [1, 2])
>>> t[1] += [3]
'''
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    t[1] += [3]
TypeError: 'tuple' object does not support item assignment
'''
>>> t
(0, [1, 2, 3])
>>> import dis
>>> dis.dis('t[1] += [3]')
'''
  1           0 LOAD_NAME                0 (t)
              2 LOAD_CONST               0 (1)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_CONST               1 (3)
             10 BUILD_LIST               1
             12 INPLACE_ADD
             14 ROT_THREE
             16 STORE_SUBSCR
             18 LOAD_CONST               2 (None)
             20 RETURN_VALUE
'''

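The key point is that this is not an atomic operation: after the list has been extended in place, there is still an assignment step, STORE_SUBSCR, and that is where the error is raised. With extend there is no such assignment, so no error occurs.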
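The same behavior as a runnable script, with the error caught so that the final state can be inspected:

```python
t = (0, [1, 2])
try:
    t[1] += [3]          # the list is extended, then the tuple store fails
except TypeError as exc:
    print("caught:", exc)

t[1].extend([4])         # no store back into the tuple, so no error
print(t)  # (0, [1, 2, 3, 4])
```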

UnboundLocalError

a, b = 0, 1

def f(n):
    for _ in range(n):
        a, b = b, a + b
    return a

print(f(7))
# UnboundLocalError: local variable 'b' referenced before assignment

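When a function assigns to a name anywhere in its body, that name is treated as a local variable throughout the function. The fix is to declare it with global (or nonlocal, if the name lives in an enclosing function).

Reference: python - Don’t understand why UnboundLocalError occurs (closure) - Stack Overflow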
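A sketch of the version with the global declaration, with the return placed after the loop. Note that it mutates module-level state, so a second call continues from where the first one stopped.

```python
a, b = 0, 1

def f(n):
    global a, b              # assignments below now target the module-level names
    for _ in range(n):
        a, b = b, a + b
    return a

result = f(7)
print(result)  # 13
```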

When a parameter's default value is an empty list

def f(*args, a=[]):
    a += args
    return a

x = f(1)
y = f(2)
print(x, y)
# [1, 2] [1, 2]
def g(*args, a=None):
    if a is None:
        a = []
    a += args
    return a

x = g(1)
y = g(2)
print(x, y)
# [1] [2]

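The reason is that functions are first-class objects: default values are evaluated once, when the def statement runs, and are stored on the function object much like member data, so mutations persist across calls.

Reference: python - “Least Astonishment” and the Mutable Default Argument - Stack Overflow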
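The stored default can be inspected directly. Because a comes after *args it is keyword-only, so it lives in `__kwdefaults__` rather than `__defaults__`.

```python
def f(*args, a=[]):
    a += args
    return a

print(f.__kwdefaults__)  # {'a': []}
f(1)
f(2)
print(f.__kwdefaults__)  # {'a': [1, 2]}
```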

Late Binding

a = []
for i in range(3):
    def func(x): return x * i
    a.append(func)
for f in a:
    print(f(2))
'''
4
4
4
'''

for f in [lambda x: x*i for i in range(3)]:
    print(f(2))
'''
4
4
4
'''

Python is actually behaving as defined. Three separate functions are created, but they each have the closure of the environment they’re defined in - in this case, the global environment (or the outer function’s environment if the loop is placed inside another function). This is exactly the problem, though - in this environment, i is mutated, and the closures all refer to the same i.

a = []
for i in range(3):
    def funcC(j):
        def func(x): return x * j
        return func
    a.append(funcC(i))
for f in a:
    print(f(2))

for f in [lambda x, i=i: x*i for i in range(3)]:
    print(f(2))

for f in [lambda x, j=i: x*j for i in range(3)]:
    print(f(2))

# lazy evaluation
for f in (lambda x: x*i for i in range(3)):
    print(f(2))
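The generator version only appears to work because each lambda is called before the generator advances i. A short sketch of the difference:

```python
gen = ((lambda x: x * i) for i in range(3))
lazy = [f(2) for f in gen]          # each lambda is called before i advances
print(lazy)  # [0, 2, 4]

funcs = list((lambda x: x * i) for i in range(3))   # exhaust the generator first
eager = [f(2) for f in funcs]       # now every lambda sees the final i
print(eager)  # [4, 4, 4]
```

So the default-argument or factory-function fixes are the robust ones; the generator trick breaks as soon as the functions are stored for later use.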

Small integers

>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False    

CPython caches the integers from -5 to 256; whenever an integer in this range is created, you get a reference to the pre-existing object.

Reference: python - “is” operator behaves unexpectedly with integers - Stack Overflow
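The practical takeaway is to compare numbers with == and keep is for identity checks such as `x is None`. A small sketch; the result of the identity test is deliberately left unstated because it depends on the implementation.

```python
a = int("257")
b = int("257")

print(a == b)          # True: compares values
same_object = a is b   # compares identity; implementation-dependent
print(same_object)
```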

super

2020/5/30

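When a subclass method has the same name as a method of its parent class, you can call the parent's method explicitly, but it is better to call it through super; the most common case is __init__.

Strictly speaking, the previous sentence is not correct: super dispatches according to the MRO (Method Resolution Order) rather than simply calling the parent class, and the difference shows up with multiple inheritance.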

class First():
    def __init__(self):
        print("First(): entering")
        super().__init__()
        print("First(): exiting")

class Second():
    def __init__(self):
        print("Second(): entering")
        super().__init__()
        print("Second(): exiting")

class Third(First, Second):
    def __init__(self):
        print("Third(): entering")
        super().__init__()
        print("Third(): exiting")

Third()

'''
Third(): entering
First(): entering
Second(): entering
Second(): exiting
First(): exiting
Third(): exiting
'''

First and Second are not related by inheritance, but when class Third(First, Second) is defined, the MRO is [Third, First, Second, object], so the super call in First invokes Second's method.
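The MRO does not have to be worked out by hand; every class exposes it as `__mro__`. A stripped-down version of the classes above:

```python
class First: pass
class Second: pass
class Third(First, Second): pass

names = [cls.__name__ for cls in Third.__mro__]
print(names)  # ['Third', 'First', 'Second', 'object']
```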

class First():
    def __init__(self):
        print("First(): entering")
        super().__init__()
        print("First(): exiting")

class Second(First):
    def __init__(self):
        print("Second(): entering")
        super().__init__()
        print("Second(): exiting")

class Third(First):
    def __init__(self):
        print("Third(): entering")
        super().__init__()
        print("Third(): exiting")

class Fourth(Second, Third):
    def __init__(self):
        print("Fourth(): entering")
        super().__init__()
        print("Fourth(): exiting")

Fourth()

'''
Fourth(): entering
Second(): entering
Third(): entering
First(): entering
First(): exiting
Third(): exiting
Second(): exiting
Fourth(): exiting
'''

The MRO is [Fourth, Second, Third, First, object]; the rule is that a subclass must appear before its parent classes.

class First():
    def __init__(self):
        print("First(): entering")

class Second(First):
    def __init__(self):
        print("Second(): entering")
        # difference
        First.__init__(self)

class Third(First):
    def __init__(self):
        print("Third(): entering")
        super().__init__()

class Fourth(First):
    def __init__(self):
        print("Fourth(): entering")
        super().__init__()

class A(Second, Fourth):
    def __init__(self):
        print("A(): entering")
        super().__init__()

class B(Third, Fourth):
    def __init__(self):
        print("B(): entering")
        super().__init__()

A()
B()

'''
A(): entering
Second(): entering
First(): entering
B(): entering
Third(): entering
Fourth(): entering
First(): entering
'''

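Second calls the parent's method explicitly, so for A the chain jumps straight to First and skips Fourth, whereas Third uses super and calls the method of the next class in the MRO.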

class First():
    def __init__(self):
        print("First(): entering")
        super().__init__()
        print("First(): exiting")

class Second(First):
    def __init__(self):
        print("Second(): entering")
        super().__init__()
        print("Second(): exiting")

class Third(First, Second):
    def __init__(self):
        print("Third(): entering")
        super().__init__()
        print("Third(): exiting")

Third()
'''
TypeError: Cannot create a consistent method resolution
order (MRO) for bases First, Second
'''

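Here Second is a subclass of First, yet Third(First, Second) asks for the MRO [Third, First, Second], which would put First ahead of its own subclass. This is a contradiction, so an error is raised (already at the class statement, before Third() is ever called).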
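A sketch that catches the error and shows one consistent ordering. `Fixed` is a hypothetical name for the corrected class.

```python
class First: pass
class Second(First): pass

try:
    class Third(First, Second):   # the class statement itself fails
        pass
except TypeError as exc:
    error = str(exc)
    print(error)

class Fixed(Second, First):       # subclass listed first: consistent
    pass

print([cls.__name__ for cls in Fixed.__mro__])
# ['Fixed', 'Second', 'First', 'object']
```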

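Decorators

The key point is that functions are first-class objects in Python: they can be passed as arguments, returned, and assigned to variables.

When functions are nested, the inner function can use the outer function's local variables.

Closures

Closure: the inner function of a nested function uses variables of the outer function, and the outer function returns the inner function. See the example below.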

def print_msg(msg):
    '''outer enclosing function'''

    def printer():
        '''nested function'''
        print(msg)

    return printer

another = print_msg("Hello")
another()
# Hello

'''
This technique by which some data ("Hello") gets attached 
to the code is called closure in Python.

This value in the enclosing scope is remembered 
even when the variable goes out of scope 
or the function itself is removed from the current namespace.
'''

As a mathematical analogy, write print_msg as $f$, the parameter msg as $\theta$, and the inner printer as $g$; then print_msg can be understood as

\[f\colon \theta \mapsto g_\theta(\cdot).\]
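In this picture, $\theta$ is the value stored in the closure. In CPython it can be inspected through the function's `__closure__` cells, as this sketch shows:

```python
def print_msg(msg):
    def printer():
        print(msg)
    return printer

another = print_msg("Hello")
del print_msg                                 # the outer function is gone...

print(another.__code__.co_freevars)           # ('msg',)
print(another.__closure__[0].cell_contents)   # Hello
another()                                     # Hello
```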

Decorators

# basic example
def uppercase_decorator(function):
    def wrapper():
        func = function()
        make_uppercase = func.upper()
        return make_uppercase
    return wrapper

def say_hi():
    return 'hello there'

decorate = uppercase_decorator(say_hi)
decorate()
# 'HELLO THERE'

# general example
def some_decorator(function):
    def wrapper(*args, **kwargs):
        print('The positional arguments are', args)
        print('The keyword arguments are', kwargs)
        return function(*args, **kwargs)
    return wrapper

@some_decorator
def printer(a, b, c):
    print(a, b, c)

printer(1, 2, c=3)

'''
The positional arguments are (1, 2)
The keyword arguments are {'c': 3}
1 2 3
'''

Similarly, write some_decorator as $f$, function as $h$, and wrapper as $g$; then

\[f \colon h\mapsto g_h(\cdot).\]

In the example, $h$ is printer and $x$ is (a, b, c). The decorator $f$ turns the original $h(x)$ into $(f(h))(x) = g_h(x)$.
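One practical detail: since $g_h$ replaces $h$, the wrapper should return the wrapped function's result and should keep its name and docstring. A minimal sketch using functools.wraps; `logged` and `add` are hypothetical names.

```python
import functools

def logged(function):
    """Hypothetical decorator that keeps the wrapped function's metadata."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        print("calling", function.__name__)
        return function(*args, **kwargs)   # pass the result through
    return wrapper

@logged
def add(a, b):
    """Add two numbers."""
    return a + b

print(add(1, 2))        # prints "calling add", then 3
print(add.__name__)     # add (would be "wrapper" without functools.wraps)
```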

# further example
def n_times(n):
    def some_decorator(function):
        def wrapper(*args, **kwargs):
            for _ in range(n):
                function(*args, **kwargs)
        return wrapper
    return some_decorator

@n_times(2)
def printer(a, b, c):
    print(a, b, c)

printer(1, 2, c=3)

'''
1 2 3
1 2 3
'''

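'''
Without the decorator syntax this is equivalent to n_times(2)(printer)(1, 2, c=3).
Note that writing n_times(2)(printer(1, 2, c=3)) is wrong:
n_times(2) is a some_decorator that remembers n (= 2),
while printer(1, 2, c=3) returns None,
so passing it to some_decorator does nothing,
apart from the single "1 2 3" printed by the call printer(1, 2, c=3) itself.
'''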

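Garbage collection

The main garbage collection mechanism is reference counting: an object is reclaimed once its reference count drops to zero. This mechanism cannot be turned off.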

import sys
a = 'my-string'
b = [a]
print(sys.getrefcount(a))
# 4

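Here the 4 comes from:

  • the variable a
  • the list b
  • the temporary reference created by passing a to sys.getrefcount
  • one more reference that is not due to print (print only receives the returned integer); when run as a script it most likely comes from the constants of the compiled code, which also hold the string literal

The exact number is a CPython implementation detail and can differ between versions.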
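Since the absolute number is hard to interpret, it is often clearer to look at how the count changes. A small sketch with a hypothetical `Node` class:

```python
import sys

class Node:
    pass

obj = Node()
base = sys.getrefcount(obj)       # includes the temporary argument reference

holder = [obj]                    # one more reference, held by the list
after_add = sys.getrefcount(obj)

del holder                        # the list goes away, and so does its reference
after_del = sys.getrefcount(obj)

print(after_add - base, after_del - base)  # 1 0
```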

Reference cycles

class MyClass():
    pass
a = MyClass()
a.obj = a
del a

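After the name is deleted, Python code can no longer reach the instance, but the instance is still in memory: it holds a reference to itself, so its reference count is not zero.

This kind of problem is called a reference cycle. It is handled by the generational garbage collector, exposed through the gc module in the standard library, which can detect such cycles.

Generational collection

The garbage collector tracks the objects in memory and divides them into 3 generations; new objects start in the first generation. If an object survives a collection (is not reclaimed), it moves to the next generation. Three thresholds decide when a collection is triggered: a generation is collected when its counter exceeds the corresponding threshold.

In general, though, you rarely need to worry about garbage collection in day-to-day work.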
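A sketch that makes the cycle visible. The weak reference `probe` is an addition for illustration: it lets us test whether the object is still alive without keeping it alive ourselves. Automatic collection is paused for the duration so that only the explicit gc.collect() call can free the cycle.

```python
import gc
import weakref

class MyClass:
    pass

was_enabled = gc.isenabled()
gc.disable()                 # keep the automatic collector out of the way

a = MyClass()
a.obj = a                    # reference cycle
probe = weakref.ref(a)       # does not increase the reference count
del a

alive_before = probe() is not None   # still in memory: the count never hit 0
gc.collect()                         # run the cycle detector by hand
alive_after = probe() is not None

if was_enabled:
    gc.enable()
print(alive_before, alive_after)  # True False
```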
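The thresholds and the current counters can be read from the gc module. The concrete values are not shown here because the defaults differ between CPython versions.

```python
import gc

thresholds = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()           # current counters, one per generation
print(thresholds)
print(counts)
```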

Docstring Formats

See coding style - What is the standard Python docstring format? - Stack Overflow