🖊️Effective Python Tips

1. Pythonic Thinking

1.1 Follow PEP8 Style

Regarding Naming
1. Functions, variables, and attributes use lowercase letters, with words separated by underscores.
2. Protected instance attributes start with a single underscore.
3. Private instance attributes should start with two underscores.
4. Class (and exception) names use CapWords: each word starts with a capital letter, with no underscores.
5. Module-level constants are all uppercase letters, with words separated by underscores.
6. Instance methods in classes should name the first parameter self to represent the object itself.
7. The first parameter of class methods should be named cls to represent the class itself.
Regarding Expressions
1. Use inline negation (put the negation directly before the thing being negated) instead of negating a positive expression. For example, write if a is not b rather than if not a is b.
2. To check whether a container or sequence is empty, test it directly, because empty values evaluate to False. For example: if not somelist
3. If an expression does not fit on one line, it should be enclosed in parentheses and broken into multiple lines.
4. Use parentheses to continue multi-line expressions instead of the backslash \.
Regarding Imports
1. import statements should be placed at the top.
2. Use absolute names for imports.
3. Group imports into three sections, in this order: standard library, third-party modules, and your own modules.
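As a rough illustration of the naming rules above, the sketch below puts them side by side. All names (MAX_RETRIES, RetryError, JobRunner, from_path) are made up for this example and are not from any library.

```python
import os  # standard library imports come first

MAX_RETRIES = 3  # module-level constant: upper case with underscores


class RetryError(Exception):  # exception names use CapWords
    pass


class JobRunner:  # class names use CapWords
    def __init__(self, job_name):
        self.job_name = job_name  # public attribute
        self._attempts = 0        # protected attribute: single underscore
        self.__token = "x"        # private attribute: double underscore

    def run_once(self):  # instance method: first parameter is self
        self._attempts += 1
        return self._attempts

    @classmethod
    def from_path(cls, path):  # class method: first parameter is cls
        return cls(os.path.basename(path))


runner = JobRunner.from_path("/tmp/nightly_build")
if not runner.job_name:  # test emptiness directly instead of comparing to ""
    raise RetryError("empty job name")
```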

1.2 Use f-strings for String Interpolation

When formatting strings, you can use Python’s special f-string method.

key = "my_var"
value = 1.23
print(f"{key!r:<10}={value:.2f}")

Prefixing a string literal with f (for format) marks it as an interpolated string. The content inside {} can be any expression, and the part after : is the format specifier. A conversion such as !r applies repr() to the value before formatting (!s applies str() and !a applies ascii()). Format specifiers can even be nested.

# Customize the number of decimal places displayed
places = 3 # Number of decimal places to display
number = 1.2345
print(f"my number is {number:.{places}f}")
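To make the pieces of an f-string easier to compare, here is a small sketch that builds the same output step by step. The variable names (plain, with_repr, padded, nested, width) are only illustrative.

```python
key = "my_var"
value = 1.23

plain = f"{key}={value}"                # no conversion, no format spec
with_repr = f"{key!r}={value}"          # !r applies repr(), adding quotes
padded = f"{key!r:<10}={value:.2f}"     # left-align in 10 columns, 2 decimals

width = 8
nested = f"[{value:>{width}.1f}]"       # the width comes from another variable

print(plain)      # my_var=1.23
print(with_repr)  # 'my_var'=1.23
print(padded)     # 'my_var'  =1.23
print(nested)     # [     1.2]
```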

1.3 Use Assignment Expressions

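An assignment expression, written with the walrus operator := (available since Python 3.8), both assigns a value and evaluates to it. It can therefore be embedded inside other statements (like if), which also lets us shift emphasis away from a variable that only matters inside that statement.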

# Normal writing
a = 1
if a == 1:
    print("yes")
# Using assignment expression
if (a:=1)>0:
    print("yes")

Using this de-emphasizes a, since it is only needed inside the if, and makes the statement read more fluently. We can also chain assignment expressions across if/elif branches to imitate a switch statement.

# switch-like chain (a and b are example values)
a, b = 1, 3
if (count:=a)>=2:
    print("a is enough")
elif (count:=b)>=3:
    print("b is enough")
else:
    print("Nothing enough")
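Another common use of the assignment expression is a loop that fetches a value and tests it in the same step. The following is a minimal sketch, assuming Python 3.8 or later; the in-memory stream and its contents are invented for the example.

```python
import io

# A small in-memory "file"; the blank line acts as the stop marker
stream = io.StringIO("alpha\nbeta\n\ngamma\n")

lines = []
# Read a line, strip it, and stop as soon as the result is empty
while (line := stream.readline().strip()):
    lines.append(line)

print(lines)  # ['alpha', 'beta']
```

Without := the readline call would have to appear twice, once before the loop and once at the end of its body.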

1.4 Differences between bytes and str

bytes contains raw data (8-bit unsigned values), while str contains Unicode code points that represent textual characters in human languages. Converting bytes to str requires the decode method, and converting str to bytes uses encode. The two types are not compatible with each other (for example, they cannot be concatenated with +). The core of a program should work with str, so decoding and encoding should happen at the outermost boundaries of the program, on input and on output. This approach is called the Unicode sandwich.

This difference should be especially noted when opening files using open:

with open("data.bin", "wb") as f:  # Binary write mode: f.write() expects bytes
    pass
with open("text.txt", "w") as f:  # Text write mode: f.write() expects str
    pass
with open("data.bin", "r", encoding="cp1252") as f:  # Text read mode: the file's bytes are decoded as cp1252
    pass
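The Unicode sandwich described above can be sketched with two small helpers that sit at the program's boundaries. The helper names to_str and to_bytes, and the UTF-8 default, are assumptions made for this example.

```python
def to_str(data, encoding="utf-8"):
    """Return str, decoding bytes input if necessary."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def to_bytes(data, encoding="utf-8"):
    """Return bytes, encoding str input if necessary."""
    if isinstance(data, str):
        return data.encode(encoding)
    return data


raw = b"caf\xc3\xa9"   # bytes arriving from outside (UTF-8 encoded)
text = to_str(raw)     # decode once, at the input boundary
shout = text.upper()   # the core logic works on str only
out = to_bytes(shout)  # encode once, at the output boundary
```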

1.5 Use enumerate instead of range

When we need to loop over an iterable and also know the current position, we can wrap it with enumerate instead of indexing with range(len(...)). enumerate returns a lazy iterator that can also be advanced manually with next; each step yields a tuple of the current index (starting from 0 by default) and the corresponding item.

l=["a","b","c"]
it = enumerate(l)
num,con = next(it) # Advance one step
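In practice enumerate is mostly used directly in a for loop. The sketch below compares it with the range(len(...)) style it replaces; the list contents are made up, and start=1 is the optional argument that sets the first index.

```python
flavors = ["vanilla", "chocolate", "pecan"]

# Index-based loop: needs range, len, and an explicit lookup
by_range = []
for i in range(len(flavors)):
    by_range.append(f"{i + 1}: {flavors[i]}")

# enumerate yields (index, item) pairs; start=1 begins counting at 1
by_enumerate = []
for rank, flavor in enumerate(flavors, start=1):
    by_enumerate.append(f"{rank}: {flavor}")

print(by_enumerate)  # ['1: vanilla', '2: chocolate', '3: pecan']
```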

1.6 Use zip to iterate two iterators simultaneously

Sometimes we need to iterate over two iterables at the same time. zip wraps them in a lazy iterator that yields tuples, which we can unpack in the loop. Note that zip stops as soon as the shortest input is exhausted; use itertools.zip_longest when the inputs may differ in length.

a=[1,2,3]
b=["a","b","c"]
s = zip(a,b)
import itertools
l = itertools.zip_longest(a,b,fillvalue=None) # Runs to the length of the longer input; missing values are filled with fillvalue
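The difference only shows up when the inputs have different lengths, so here is a small sketch with deliberately mismatched lists. The data is invented for the example.

```python
import itertools

names = ["Cecilia", "Lise", "Marie"]
counts = [7, 4]  # one item shorter than names

# zip silently drops "Marie" because counts runs out first
short = list(zip(names, counts))

# zip_longest keeps going and substitutes fillvalue for the missing count
full = list(itertools.zip_longest(names, counts, fillvalue=0))

print(short)  # [('Cecilia', 7), ('Lise', 4)]
print(full)   # [('Cecilia', 7), ('Lise', 4), ('Marie', 0)]
```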

2. Lists and Dictionaries

2.1 Learn slicing

In Python, any class that implements __getitem__ and __setitem__ can support slicing. The syntax is somelist[start:end], which takes the items from start up to (but not including) end. An omitted start means the beginning, and an omitted end means the end of the sequence. Negative indices count from the end. Slice bounds that exceed the list size are silently clamped to the list's boundaries, whereas indexing a single out-of-range position raises IndexError.

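Slicing a list produces a new list, so adding, removing, or replacing items in the result does not affect the original. Assigning to a slice does not require the new values to have the same length: the slice is replaced in place, and the list grows or shrinks accordingly.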

l = [1,2,3]
l1 = l[:] # Use slicing to assign a new object
l[:1]=[1,2,3] # Use slice assignment

# At this point, l1 is [1,2,3], and l is [1,2,3,2,3]

Slicing can also take a stride: somelist[start:end:step]. A stride of -1 reverses a sequence such as a string. However, combining start, end, and stride in one slice is hard to read, so as a rule prefer a positive stride with start and end omitted; if you need both, do the striding and the slicing in two separate steps.
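The slicing rules above can be checked with a short sketch. The list and variable names are illustrative only.

```python
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]

first_three = letters[:3]   # omitted start: from the beginning
last_two = letters[-2:]     # negative index: counted from the end
clamped = letters[5:100]    # out-of-range end is clamped, no error

evens = letters[::2]            # every second item
reversed_word = "python"[::-1]  # a stride of -1 reverses

# Stride first, then slice, instead of one dense start:end:step expression
middle_of_evens = evens[1:-1]

# Unlike slicing, indexing one out-of-range position raises IndexError
try:
    letters[100]
except IndexError:
    index_error_raised = True
else:
    index_error_raised = False
```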

2.2 Use starred unpacking

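Sometimes we need to unpack a few elements from a list and collect the rest in a starred variable, which always becomes a list (possibly an empty one). A starred expression can also unpack an iterator or generator, but because the remaining items are materialized as a list, be careful about memory exhaustion with large inputs.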

Note: Generally, a tuple should not be unpacked into more than three normal variables or two normal variables and one starred variable.

l = [1,2,3,4,5]
a,b,*c=l
# Now a=1, b=2, c=[3,4,5]
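The starred variable does not have to come last. The sketch below shows it in the middle, on a single-item list, and on a generator; the data and the rows function are made up for illustration.

```python
car_ages = [20, 19, 15, 9, 8, 7, 6, 4, 1, 0]

# The starred part can sit in the middle and absorbs whatever is left over
oldest, *middle, youngest = car_ages

# With nothing left over, the starred variable is an empty list
first, *rest = [42]


def rows():
    """Hypothetical generator: a header row followed by data rows."""
    yield ("name", "age")
    yield ("Ada", 36)
    yield ("Alan", 41)


# Unpacking consumes the generator; body is a fully built list
header, *body = rows()
```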

2.3 Use sort for sorting

When sorting with the list's sort method, the key parameter accepts a function that computes the value each element is sorted by.

a = ["abc","ad","b"]
a.sort(key = lambda x:len(x)) # Sort a in ascending order by string length
# When multiple criteria are needed, return a tuple for sorting
a.sort(key = lambda x:(-len(x),x)) # First sort by length descending, then by alphabetical order when lengths are equal
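Negating a key, as in the tuple example above, only works for numbers. When a string criterion must be sorted in descending order, one option is to rely on the fact that Python's sort is stable and sort in several passes, lowest-priority criterion first. This is a sketch with invented data:

```python
tools = [("drill", 4), ("sander", 4), ("circular saw", 5), ("jackhammer", 40)]

# Weight descending, then name ascending: the number can simply be negated
by_tuple = sorted(tools, key=lambda t: (-t[1], t[0]))

# Weight ascending, then name DESCENDING: a str cannot be negated,
# so sort by the secondary criterion first, then by the primary one.
by_name_desc = sorted(tools, key=lambda t: t[0], reverse=True)
two_pass = sorted(by_name_desc, key=lambda t: t[1])

print(two_pass)
# [('sander', 4), ('drill', 4), ('circular saw', 5), ('jackhammer', 40)]
```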

2.4 Use get to access dictionary contents

When accessing dictionary contents, you can index by key directly, but a missing key raises KeyError. The get method returns the value for a key, or a default value that you specify if the key is missing, which simplifies the program.

d = {"a":1,"b":2}
d.get("c",0) # Returns the default value 0 because the key "c" is missing

Also, for missing keys, we can use defaultdict to provide default values.

from collections import defaultdict
d = defaultdict(list) # When a key does not exist, its value is created by calling list(), giving an empty list
d["a"].append(1) # 'a' is a missing key, but it automatically initializes as a list and appends 1
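One difference worth seeing in code: get never modifies the dictionary, while a defaultdict inserts the default as soon as a missing key is accessed. A small sketch with made-up data:

```python
from collections import defaultdict

votes = ["rye", "wheat", "rye"]

# Counting with get: the default 0 is used only for the first occurrence
counts = {}
for vote in votes:
    counts[vote] = counts.get(vote, 0) + 1

# Grouping with defaultdict: missing keys start as an empty list
by_length = defaultdict(list)
for vote in votes:
    by_length[len(vote)].append(vote)

# get does not insert the key; plain access on a defaultdict does
counts.get("sourdough", 0)
by_length[99]
```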

2.5 Use __missing__ to construct key-dependent default values

Sometimes the default value must depend on the key itself (for example, opening the file named by the key and storing the file handle as the value). defaultdict cannot do this, because its factory function is called without arguments. In this case, we can subclass dict and implement __missing__ to build the default value from the key.

class FileDict(dict):
    def __missing__(self, path):  # Called only when the key is absent, so each file is opened just once
        handle = open(path)
        self[path] = handle
        return handle
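The same mechanism can be tried without touching the file system. The following sketch uses a hypothetical LengthDict whose default value is computed from the key; the misses counter is added only to show how often __missing__ runs.

```python
class LengthDict(dict):
    """Dictionary whose default value for a key is the key's length."""

    def __init__(self):
        super().__init__()
        self.misses = 0  # how many times __missing__ was invoked

    def __missing__(self, key):
        self.misses += 1
        value = len(key)   # the default depends on the key itself
        self[key] = value  # store it so the next lookup is a normal hit
        return value


cache = LengthDict()
first = cache["hello"]     # key absent: __missing__ computes and stores 5
second = cache["hello"]    # key present: __missing__ is not called again
absent = cache.get("zzz")  # get() bypasses __missing__ and returns None
```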

3. Functions

3.1 Design parameter passing methods

In Python, arguments can be passed in two ways: by position and by keyword. By default both are allowed. Sometimes positional-only parameters are the better choice, because callers cannot depend on the parameter names, which reduces coupling. At other times keyword-only parameters are better, because they force callers to state which parameter each value is for.

We can restrict this in function definitions:

def function(a,b,/,d,*,e):
    pass

Parameters to the left of / must be positional only; parameters to the right of * must be keyword only; parameters in between can be either.
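To see the / and * markers at work, here is a sketch of a division helper. The function safe_division and the helper raises_type_error are hypothetical names for this example, and the / marker assumes Python 3.8 or later.

```python
def safe_division(numerator, denominator, /, ndigits=10, *,
                  ignore_zero_division=False):
    """numerator/denominator: positional only; ndigits: either; flag: keyword only."""
    try:
        return round(numerator / denominator, ndigits)
    except ZeroDivisionError:
        if ignore_zero_division:
            return float("inf")
        raise


def raises_type_error(call):
    """Helper for the demo: report whether calling `call` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


by_position = safe_division(22, 7, 2)         # ndigits passed by position
by_keyword = safe_division(22, 7, ndigits=2)  # ndigits passed by keyword
infinite = safe_division(1, 0, ignore_zero_division=True)

# Positional-only parameters reject keywords
keyword_rejected = raises_type_error(
    lambda: safe_division(numerator=1, denominator=2))
# Keyword-only parameters reject positions
position_rejected = raises_type_error(
    lambda: safe_division(1, 2, 3, True))
```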

3.2 Better use of positional parameters

Sometimes, we want to pass multiple positional parameters at once, which can be done by unpacking a list.

def function1(a,b,c):
    print(f"a={a},b={b},c={c}")
l = [1,2,3]
function1(*l)

Sometimes we don’t know how many positional parameters will be passed, so we can use the variadic parameter to capture all into a tuple.

def func(a, *args): # By convention the starred positional parameter is named args
    pass
# Call function
func(1,2,3,4,5) # a=1, args = (2,3,4,5)

3.3 Better use of keyword parameters

Generally, keyword parameters have these three advantages:

Calling functions with keywords makes the call clearer.
Keyword parameters can have default values.
Functions can be flexibly extended.

Sometimes, we want to pass multiple keyword parameters at once, which can be done by unpacking a dictionary.

def function1(a,b,c):
    print(f"a={a},b={b},c={c}")
d = {"b":1,"c":2}
function1(1,**d)

Sometimes we don’t know how many keyword parameters will be passed, so we can use the variadic keyword parameter to capture all into a dictionary.

# Print dictionary contents in a certain format
def printd(**kwargs):
    for key,value in kwargs.items():
        print(f"{key}={value}")
# Call function
d = {"b":1,"c":2}
printd(**d)

Keyword parameters often serve as optional arguments with default values, but the default should not be a mutable object (such as {} or []) or a dynamic value (such as the current time). A default value is evaluated only once, when the function is defined, so every call shares the same object. For such defaults, use None as the default and create the real value inside the function.

def fun(d={}):
    return d # Returns the same d every time

# So initially set to None, then create a new object
def fun(d=None):
    if d is None:
        d = {}
    return d # Now returns different d each time

from datetime import datetime

def log(message,when=None): # If when is not given, use the current time of each call
    if when is None:
        when = datetime.now()
    print(f"{when}: {message}")
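The shared-default problem is easiest to believe when it is observed directly. This sketch uses two hypothetical functions, append_bad and append_good, that differ only in how the default list is created.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, at definition time, and then shared
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # None is a placeholder; a fresh list is created on every call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


bad_first = append_bad(1)
bad_second = append_bad(2)    # the same list object as bad_first

good_first = append_good(1)
good_second = append_good(2)  # an independent list

print(bad_first)    # [1, 2]
print(good_first)   # [1]
```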

3.4 Function closures

Python supports function closures, meaning you can define a nested function inside a larger function, and the inner function can reference values from the outer function. Functions are first-class objects and can be manipulated and referenced directly. We can use closures to implement special sorting.

# Use closure to implement special sorting
def sort_by_group(values,group):
    def helper(x):
        if x in group:
            return (0,x)
        return (1,x)
    values.sort(key=helper)

The inner function prioritizes sorting elements inside the group, then sorts by value.

In this example, the inner function can read the outer function's variables. However, assigning to a variable inside the inner function does not affect the outer function's variable of the same name. This follows from how Python resolves names: when reading a name, it searches the scopes listed below in order, but an assignment always defines the variable in the current scope (shadowing any outer variable of that name), which prevents a function's local variables from polluting the enclosing scopes.

Scope lookup order

Current function scope

Enclosing scopes (functions containing the current function)

Module-level scope (global scope)

Built-in scope (names such as len and str)

If the inner function needs to assign to a variable that lives in an outer scope, declare the variable with nonlocal (for the enclosing function's scope) or global (for the module-level scope).
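The effect of the assignment rule, and of nonlocal, can be seen by adding a found flag to the sorting helper from above. This is only a sketch; the function names sort_priority_broken and sort_priority_fixed are made up for the comparison.

```python
def sort_priority_broken(values, group):
    found = False

    def helper(x):
        if x in group:
            found = True  # creates a NEW local variable inside helper
            return (0, x)
        return (1, x)

    values.sort(key=helper)
    return found  # still the outer variable, never changed


def sort_priority_fixed(values, group):
    found = False

    def helper(x):
        nonlocal found  # assignments now target the enclosing variable
        if x in group:
            found = True
            return (0, x)
        return (1, x)

    values.sort(key=helper)
    return found


numbers_a = [8, 3, 1, 2, 5, 4, 7, 6]
numbers_b = list(numbers_a)
group = {2, 3, 5, 7}

broken = sort_priority_broken(numbers_a, group)
fixed = sort_priority_fixed(numbers_b, group)
```

Both calls sort the list correctly; only the reported flag differs.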

3.5 Function decorators

Python can use decorators to wrap a function, running other code before and after the function executes. This can ensure users use the function correctly, help debug, or implement function registration.

import time
def trace(func):
    def wrapper(*args,**kwargs):
        stime = time.time()
        result = func(*args,**kwargs) # Run the decorated function with the caller's arguments
        etime = time.time()
        print(etime-stime) # Elapsed time in seconds
        return result
    return wrapper # Return the wrapping function

@trace # Decorate fun1 to add timing before and after
def fun1():
    print("hello")

The above code is a timing decorator, wrapping a function into another function (wrapper) and returning it, adding start and end time recording code before and after the function.

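However, the name of the decorated function is no longer the original one: it becomes the name of the function returned by the decorator (wrapper), and other standard attributes such as __doc__ change too. To preserve them, apply the functools.wraps decorator from the standard library to the wrapper; it copies the important metadata from the original function.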

import time
from functools import wraps

def trace(func):
    @wraps(func) # Preserve the original function's name, docstring, and other metadata
    def wrapper(*args,**kwargs):
        stime = time.time()
        result = func(*args,**kwargs) # Run the decorated function
        etime = time.time()
        print(etime-stime)
        return result
    return wrapper # Return the wrapping function
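What functools.wraps preserves can be checked by comparing two otherwise identical decorators. The decorator and function names below are invented for the comparison.

```python
from functools import wraps


def plain_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def wrapping_decorator(func):
    @wraps(func)  # copy __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain_decorator
def add(a, b):
    """Return a + b."""
    return a + b


@wrapping_decorator
def multiply(a, b):
    """Return a * b."""
    return a * b


print(add.__name__)       # wrapper
print(multiply.__name__)  # multiply
```

Tools that rely on this metadata, such as help() and debuggers, therefore keep working on the decorated function.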

4. Comprehensions and Generators

4.1 Use list comprehensions

In Python, you can derive a new list from an iterable object. This is called a list comprehension (there are also dictionary and set comprehensions).

l1 = range(10)
l2 = [x**2 for x in l1 if x%2 == 0] # Take all even numbers in l1 and square them to create a new list

Besides the basic form, comprehensions support multiple for clauses (nested loops) and multiple if conditions (implicitly combined with and). For readability, a single comprehension should generally use no more than two such subexpressions.

matrix = [[1,2,3],[2,3,4],[3,4,5]]
squared_matrix = [[x**2 for x in row] for row in matrix] # Nested comprehension: square each x in every row of the matrix

Sometimes we want to first test a value and then use it, where we can add assignment expressions in the if to simplify code.

d = {"A":1,"b":2,"C":2}
order = ["A"]
found = [(name,batches) for name in order if (batches := d.get(name,0))]

Sometimes, when data volume is large, to avoid high memory usage, replace square brackets with parentheses to create a generator expression, computing one result at a time.
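A generator expression looks almost identical to a list comprehension but produces its values lazily and only once. A minimal sketch with made-up data:

```python
words = ["alpha", "beta", "gamma", "delta"]

lengths_list = [len(w) for w in words]  # built in full, immediately
lengths_gen = (len(w) for w in words)   # nothing is computed yet

first = next(lengths_gen)       # computes only the first value
rest_total = sum(lengths_gen)   # consumes the remaining three values
leftover = list(lengths_gen)    # the generator is now exhausted
```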

4.2 Use generators

When functions need to return many list-like elements, consider using generators.

# Return the position of each word's start in a sentence
def index_words(text):
    if text: # The text is non-empty, so the first word starts at index 0
        yield 0
    for index,letter in enumerate(text):
        if letter == " ":
            yield index+1

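When a function body contains yield, calling the function returns a generator instead of running the body. Each call to next runs the body until the next yield and hands back the yielded value. The advantages are low memory usage and clear intent. However, a generator is stateful and cannot be reused: once it is exhausted, iterating over it again produces nothing.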

To get a fresh generator every time iteration is needed, we can create a class that implements __iter__ as a generator method. Such a class is an iterable container rather than just an iterator. (Built-in lists are also iterable containers.)

# Implement an iterable container; each iteration produces a new generator object
class ReadVisits:
    def __init__(self,data_path):
        self.data_path = data_path
    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

When Python runs for x in l, it calls iter(l), triggering l.__iter__, which returns an iterator object (implementing __next__). Python repeatedly calls next until data is exhausted.
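The difference between a one-shot generator and an iterable container can be demonstrated without a data file. In this sketch, countdown and Countdown are hypothetical stand-ins for the file-reading versions.

```python
def countdown(start):
    """Generator function: the returned generator can be consumed once."""
    while start > 0:
        yield start
        start -= 1


class Countdown:
    """Iterable container: every call to __iter__ makes a new generator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


gen = countdown(3)
gen_first = list(gen)
gen_second = list(gen)  # empty: the generator is already exhausted

container = Countdown(3)
container_first = list(container)
container_second = list(container)  # a fresh iteration each time

# An iterator returns itself from iter(); a container returns a new object
generator_is_own_iterator = iter(gen) is gen
container_is_own_iterator = iter(container) is container
```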

Sometimes we want to use multiple generators in sequence; yield from can simplify the code.

def move(period,speed):
    for _ in range(period):
        yield speed
def pause(delay):
    for _ in range(delay):
        yield 0
        
def action():
    yield from move(4,6)
    yield from pause(3)

Also, iterators can be combined using tools from itertools, including chaining, filtering, and composing.
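A few itertools functions, picked here only as examples of chaining, filtering, and composing, give a feel for the module:

```python
import itertools

# Chaining: treat several iterables as one
chained = list(itertools.chain([1, 2, 3], [4, 5, 6]))

# Slicing a lazy (even infinite) iterator without building a list first
first_five = list(itertools.islice(itertools.count(10), 5))

# Filtering: take items until the predicate first fails
below_seven = list(itertools.takewhile(lambda x: x < 7, range(1, 11)))

# Composing: running totals and unordered pairs
running = list(itertools.accumulate([1, 2, 3, 4]))
pairs = list(itertools.combinations("abc", 2))
```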

5. Classes and Interfaces

5.1 Let simple interfaces accept functions

Python has many built-in APIs allowing us to pass stateless functions with clear parameters and return values to customize behavior. Such functions are called hooks, which are called back by the API at appropriate times (like sort).

Sometimes we want to record information when the function is called (like counting calls). We can use a callable class object with __call__ as the hook function.

# Returns 0 when called and records the count
class CountMissing():
    def __init__(self):
        self.added = 0
    
    def __call__(self):
        self.added += 1
        return 0
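One place where such a callable object fits is the default factory of a defaultdict, which accepts any callable that takes no arguments. The sketch below repeats the class so that it runs on its own; the dictionaries and amounts are made-up data.

```python
from collections import defaultdict


class CountMissing:
    def __init__(self):
        self.added = 0

    def __call__(self):
        self.added += 1  # remember that a missing key was requested
        return 0         # the default value handed to the defaultdict


current = {"green": 12, "blue": 3}
increments = [("red", 5), ("blue", 17), ("orange", 9)]

counter = CountMissing()
result = defaultdict(counter, current)  # the instance itself is the hook
for key, amount in increments:
    result[key] += amount

print(counter.added)  # 2, because "red" and "orange" were missing
```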

5.2 Class calls itself

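Python supports only one constructor per class, __init__. When we want polymorphic construction, that is, generic code that can build instances of whichever subclass it is given, we can move the helper function into the class as a @classmethod. The method receives the class itself as cls and uses it to create instances.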

import os

class PathInputData():
    def __init__(self, path):
        super().__init__()
        self.path = path
    
    def read(self):
        with open(self.path) as f:
            return f.read()
    
    @classmethod
    def generate_input(cls, config:dict):  # Receives the class as cls and uses it to create instances of this type.
        data_dir = config['data_dir'] # Look up the data directory in the passed config
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))
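Because cls is whichever class the method was called on, subclasses inherit the constructor logic for free. Here is an in-memory sketch of that idea; InputData, UpperInputData, and the "texts" config key are hypothetical and stand in for the file-based version.

```python
class InputData:
    def __init__(self, text):
        self.text = text

    def read(self):
        return self.text

    @classmethod
    def generate_inputs(cls, config):
        # cls is the class the method was called on, not necessarily InputData
        for text in config["texts"]:
            yield cls(text)


class UpperInputData(InputData):
    def read(self):
        return self.text.upper()


config = {"texts": ["a b", "c"]}
plain = [item.read() for item in InputData.generate_inputs(config)]
upper = [item.read() for item in UpperInputData.generate_inputs(config)]
upper_types = {type(item) for item in UpperInputData.generate_inputs(config)}
```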

5.3 Class attribute settings

Python class attributes have two access levels: public and private. Private attributes start with two underscores and are meant to be used only inside the class itself. Python does not strictly enforce this: it implements private attributes by name mangling, rewriting an attribute such as __value to _ClassName__value, so the attribute can still be reached under the mangled name.

class Name:
    def __init__(self,value):
        self.__value = value
        
n = Name(1)
n._Name__value

Attributes starting with a single underscore are conventionally called protected fields. Prefer protected attributes, in the spirit of PEP 8, and explain in the documentation how subclasses should use them, rather than trying to enforce access control with private attributes, since subclasses may legitimately need access to them. However, if the superclass is part of an API whose subclasses are outside your control, private attributes can be used to avoid name collisions.
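The name-collision case can be made concrete with a short sketch: a parent and a child both use __value, and mangling keeps the two apart. ApiClass and Child are invented names.

```python
class ApiClass:
    def __init__(self):
        self.__value = 5  # stored as _ApiClass__value

    def get(self):
        return self.__value  # always reads _ApiClass__value


class Child(ApiClass):
    def __init__(self):
        super().__init__()
        self.__value = "hello"  # stored as _Child__value, no collision


child = Child()
attribute_names = sorted(vars(child))
print(attribute_names)  # ['_ApiClass__value', '_Child__value']
```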

5.4 Initialize superclass

In a subclass, the superclass can be initialized by calling its __init__ method directly through the class name.

class MyBaseClass:
    def __init__(self, value):
        self.value = value


class TimesSeven(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value *= 7

But this approach can cause issues in multiple inheritance (like diamond inheritance where the base class is inherited repeatedly). So it’s better to use super which follows the Method Resolution Order (MRO).

class TimesSeven(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value *= 7

Inside a class body, super() usually needs no arguments, but it can also take two: super(Class, self).__init__(value). The first argument is the class after which the MRO search starts, and the second is the instance (or class) whose MRO is used for the lookup.
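A diamond hierarchy shows what following the MRO means in practice. In this sketch, Base stands in for MyBaseClass, and PlusNine and Combined are hypothetical classes added to form the diamond.

```python
class Base:
    def __init__(self, value):
        self.value = value


class TimesSeven(Base):
    def __init__(self, value):
        super().__init__(value)  # next class in the MRO, not always Base
        self.value *= 7


class PlusNine(Base):
    def __init__(self, value):
        super().__init__(value)
        self.value += 9


class Combined(TimesSeven, PlusNine):
    def __init__(self, value):
        super().__init__(value)


mro_names = [cls.__name__ for cls in Combined.__mro__]
result = Combined(5).value

print(mro_names)  # ['Combined', 'TimesSeven', 'PlusNine', 'Base', 'object']
print(result)     # 98, i.e. 7 * (5 + 9)
```

Base.__init__ runs exactly once, and the initializers finish in reverse MRO order: Base first, then PlusNine, then TimesSeven.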

Usually, if we want multiple inheritance for the convenience and encapsulation it offers, it is better to write mix-in classes: small classes that only provide a set of extra methods for subclasses, define no instance attributes of their own, and do not require their __init__ to be called.

# Represent Python objects as dictionaries
class ToDictMixin:
    def to_dict(self):
        return self._traverse_dict(self.__dict__)
    
    def _traverse_dict(self, instance_dict):
        output = {}
        for key, value in instance_dict.items():
            output[key] = self._traverse(key, value)
        return output
    
    def _traverse(self, key, value):
        if isinstance(value, ToDictMixin):
            return value.to_dict()
        elif isinstance(value, dict):
            return self._traverse_dict(value)
        elif isinstance(value, list):
            return [self._traverse(key, i) for i in value]
        elif hasattr(value, "__dict__"):
            return self._traverse_dict(value.__dict__)
        else:
            return value

A mix-in is meant to be inherited by other classes. Those classes can override individual methods (instance methods or class methods) to adjust the behavior where needed.
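As a usage sketch, two mix-ins can be combined in one class. The ToDictMixin below is a deliberately simplified version of the one above so that the example stays short, and JsonMixin and BinaryTree are hypothetical classes.

```python
import json


class ToDictMixin:
    """Simplified variant: handles nested mix-in objects and lists only."""

    def to_dict(self):
        return {key: self._convert(value) for key, value in vars(self).items()}

    def _convert(self, value):
        if isinstance(value, ToDictMixin):
            return value.to_dict()
        if isinstance(value, list):
            return [self._convert(item) for item in value]
        return value


class JsonMixin:
    """Relies on the host class providing to_dict()."""

    def to_json(self):
        return json.dumps(self.to_dict(), sort_keys=True)


class BinaryTree(ToDictMixin, JsonMixin):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


tree = BinaryTree(10, left=BinaryTree(7), right=BinaryTree(13))
as_dict = tree.to_dict()
as_json = tree.to_json()
```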

5.5 Custom containers

Sometimes we want our custom classes to behave like lists or tuples, supporting sequence operations. Besides implementing __getitem__ for indexing and __len__ for length, we can inherit abstract base classes from Python’s built-in collections.abc module and implement the required methods.
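A minimal sketch of that approach, assuming a hypothetical Playlist class: inheriting from collections.abc.Sequence requires only __getitem__ and __len__, and the base class supplies methods such as index, count, and membership tests on top of them.

```python
from collections.abc import Sequence


class Playlist(Sequence):
    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __getitem__(self, index):
        return self._tracks[index]

    def __len__(self):
        return len(self._tracks)


class Incomplete(Sequence):
    # __len__ is missing, so the class stays abstract
    def __getitem__(self, index):
        return 0


songs = Playlist(["intro", "verse", "chorus", "verse"])

position = songs.index("chorus")   # provided by Sequence
repeats = songs.count("verse")     # provided by Sequence
backwards = list(reversed(songs))  # provided by Sequence
has_intro = "intro" in songs       # provided by Sequence

try:
    Incomplete()
except TypeError:
    incomplete_rejected = True
else:
    incomplete_rejected = False
```

The last part shows the other benefit of the abstract base class: forgetting a required method fails loudly at instantiation instead of later at use.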