Python
Work Notes
Sample Patterns
Command-line Argument Parsing / Runnable Executable
- Uses argparse
- Runner function for use in pyproject.toml [project.scripts] section
pyproject.toml
[project.scripts]
main = "main:run"
main.py
import argparse
from datetime import datetime
from typing import TypedDict, Unpack

class MainKwargs(TypedDict):
    config_file: str
    output_path: str
    loglevel: str
    filter: list[str]

def main(**kwargs: Unpack[MainKwargs]):
    ...

def run() -> None:
    """
    Runner Function
    """
    parser = argparse.ArgumentParser(prog='ProgramName',
                                     description='What the program does',
                                     epilog='Text at the bottom of help')
    parser.add_argument('-c', '--config_file', required=True, help="Config file path")
    parser.add_argument('-o', '--output_path', default=f'output-{datetime.now():%Y%m%d%H%M%S}.txt',
                        help="Output directory for the generated files")
    parser.add_argument('-l', '--loglevel',
                        help='Specifies the level of verbosity for logging.',
                        choices=['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'],
                        default='INFO')
    parser.add_argument('-f', '--filter', required=False,
                        help='Filter the sources to run against.', nargs='*', default=[])
    args = parser.parse_args()
    main(config_file=args.config_file,
         output_path=args.output_path,
         loglevel=args.loglevel,
         filter=args.filter)

if __name__ == '__main__':
    run()
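With the package installed into the project environment, the [project.scripts] entry exposes a main command; an illustrative invocation from a uv project (the argument values are made up):
uv run main -c config.yaml -l DEBUG -f source1 source2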
Test for List of Elements in a List
- Uses a generator expression inside the any() or all() functions.
any(x in test_list for x in list_of_values)
all(x in test_list for x in list_of_values)
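For example, with made-up lists:
test_list = ['a', 'b', 'c', 'd']
list_of_values = ['a', 'c']
any(x in test_list for x in list_of_values)  # True: at least one value is in test_list
all(x in test_list for x in list_of_values)  # True: every value is in test_list
all(x in test_list for x in ['a', 'z'])      # False: 'z' is not in test_list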
Multithreaded Processor
- Creates the worker threads as Thread objects.
- Uses Queue objects to get data into and out of the worker threads – a work queue for the inputs and a result queue for the outputs.
- Note: Boto3 Sessions are not threadsafe!
import logging
import threading
from queue import Queue
from typing import Any

def worker(work_queue: Queue[Any], result_queue: Queue[Any]) -> None:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread().name}')
    while not work_queue.empty():
        work_payload = work_queue.get()
        # all incoming data needs to be in the work queue
        # do something with work_payload to populate work_result
        logger.info("Info about this thread's work: %s", something)
        result_queue.put(work_result)

def main():
    logger = logging.getLogger(__name__)
    # ...
    payloads: list[Any] = ...  # some data
    work_queue: Queue[Any] = Queue()
    for payload in payloads:
        work_queue.put(payload)
    result_queue: Queue[dict] = Queue()
    threads = []
    for _ in range(max_threads):
        logger.info('Creating thread')
        thread = threading.Thread(target=worker, args=(work_queue, result_queue))
        threads.append(thread)
        thread.start()
    logger.info('Waiting for workers to complete.')
    for thread in threads:
        thread.join()
    results = []
    logger.info('Collecting results.')
    while not result_queue.empty():
        results.append(result_queue.get())
Multithreaded Processor using ThreadPoolExecutor
- The worker function is written like a normal function with normal inputs and returning its result.
- Uses concurrent.futures.ThreadPoolExecutor to handle creating the threads, getting parameters to the function, and getting results back to the caller.
- Submitting a payload returns a Future object which is similar to a JavaScript Promise.
- Iterate through the collection of Future objects with concurrent.futures.as_completed to process each result as it completes.
- Note: Boto3 Sessions are not threadsafe! (See the per-thread Session sketch after this example.)
import concurrent.futures
import logging
import threading
from typing import Any

def worker(payload: Any, *args, **kwargs) -> Any:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread().name}')
    # use args/kwargs like a normal function
    # do something with the payload to populate work_result
    logger.info("Info about this thread's work: %s", something)
    return work_result

def main():
    logger = logging.getLogger(__name__)
    # ...
    payloads: list[Any] = ...  # some data
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        result_futures = []
        for payload in payloads:
            # Submit the payload to the executor and append the resulting "future" to a list.
            result_futures.append(executor.submit(worker, payload, *args, **kwargs))
        # Iterate through the "futures" as they complete and append the results to a list.
        for future in concurrent.futures.as_completed(result_futures):
            results.append(future.result())
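Because Sessions are not threadsafe, one option (a sketch, not from these notes) is to build a new Session inside each worker so nothing boto3-related is shared between threads; the bucket/key parameters here are hypothetical:
import boto3

def worker(bucket: str, key: str) -> dict:
    # Each thread creates its own Session and client; only the payload
    # (bucket/key) crosses the thread boundary.
    session = boto3.Session()
    s3 = session.client('s3')
    return s3.head_object(Bucket=bucket, Key=key)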
Requests Session with Larger Thread Pool
from requests import Session
from requests.adapters import HTTPAdapter
MAX_THREADS = 20
MAX_POOL_SIZE = 2 * MAX_THREADS
# The pool size should be double the thread count so that each thread can get a new connection
with Session() as session:
    adapter = HTTPAdapter(pool_connections=MAX_POOL_SIZE, pool_maxsize=MAX_POOL_SIZE)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    # ...
S3 Client with Larger Thread Pool
from boto3 import Session
from botocore.config import Config
MAX_THREADS = 20
MAX_POOL_SIZE = 2 * MAX_THREADS
# The pool size should be double the thread count so that each thread can get a new connection
client = Session().client('s3', config=Config(max_pool_connections=MAX_POOL_SIZE))
Compound Statements
if
if [condition]:
    [block to run if condition is true; skip the rest]
elif [condition]:
    [block to run if condition is true; skip the rest]
else:
    [block to run if no condition was true]
while
while [condition]:
    [block repeated while condition remains true]
else:
    [block to run once the condition becomes false]
- continue will terminate the block and skip to the next repetition
- break will terminate the entire loop without running “else”
for
for [target list] in [iterable expression]:
    [block repeated for each item yielded from the iterable expression, using the target list]
else:
    [block to run after the iterable is consumed]
- continue terminates the block and skips to the next iteration
- break terminates the entire loop without running “else”
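A small example of the break/else interaction (the values are arbitrary):
for candidate in [1, 3, 5, 8]:
    if candidate % 2 == 0:
        print(f'found an even number: {candidate}')
        break
else:
    print('no even number found')  # only runs if the loop never hit break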
try
try:
    [block]
except[*] [expression] [as [identifier]]:
    [block to handle exception]
else:
    [block to run if no exception occurs]
finally:
    [block to run after try/except/else]
- An expression-less except clause, if present, must be last; it matches any exception.
- except* clause(s) specify one or more handlers for groups of exceptions (BaseExceptionGroup instances).
- except and except* can’t be mixed.
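An illustrative except* handler (requires Python 3.11+; the group contents are made up):
try:
    raise ExceptionGroup('batch failures', [ValueError('bad input'), KeyError('missing key')])
except* ValueError as eg:
    print('value errors:', eg.exceptions)  # eg is an ExceptionGroup holding only the ValueErrors
except* KeyError as eg:
    print('key errors:', eg.exceptions)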
with
with [context_manager] as [target]:
    [block]
- “Compound” with statements consist of multiple [expression] as [target] items separated by commas (see below).
- The context manager’s __exit__() method is invoked regardless of whether an exception occurs in the block.
A with statement of the form with EXPRESSION as TARGET: is semantically equivalent to:
context_manager = (EXPRESSION)
enter = type(context_manager).__enter__
exit = type(context_manager).__exit__
value = enter(context_manager)
hit_except = False
try:
    TARGET = value
    [block]
except:
    hit_except = True
    if not exit(context_manager, *sys.exc_info()):
        raise
finally:
    if not hit_except:
        exit(context_manager, None, None, None)
Compound with:
with A() as a, B() as b:
    [block]
is semantically equivalent to:
with A() as a:
    with B() as b:
        [block]
match
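Structural pattern matching (Python 3.10+); a minimal template:
match [subject expression]:
    case [pattern] [if guard]:
        [block to run for the first case whose pattern matches and whose guard is true]
    case _:
        [block to run if no other case matched]
- The wildcard pattern _ matches anything, so a final case _ acts as a default.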
Python
Check available package versions:
uvx pip index versions <package>
Guides
Core Python Documentation
- Status of Python Versions
- Python 3 Documentation (python.org)
- Python Module Index
- The Python Language Reference
The Python Standard Library
Built-in Functions
Built-in Constants
Built-in Exceptions
Built-in Types
Special Method Names
Dunder Method Cheat Sheet
- boolean operations
- comparisons
- Boolean Type
- bool: boolean
- Numeric Types
- Iterator Types
- Sequence Types
- list: mutable sequences typically of homogeneous data
- tuple: an immutable sequence
- range: an immutable sequence of numbers
- Text Sequence Type
- Binary Sequence Types
- bytes: immutable sequences of single bytes
- bytearray: mutable sequences of single bytes
- memoryview: objects that allow Python code to access the internal data of an object that supports the buffer protocol without copying.
- Set Types
- Mapping Types: currently only dict
- dict: maps hashable values to arbitrary objects
- Context Manager Types
Text Processing Services
- string: Common string operations
- re: Regular expression operations
- difflib: Helpers for computing deltas
- textwrap: Text wrapping and filling
- unicodedata: Unicode Character Database
- stringprep: Internet String Preparation
Binary Data Services
Data Types
- datetime: Basic date and time types
- zoneinfo: IANA time zone support
- calendar: General calendar-related functions
- collections: Container datatypes
- collections.abc: Abstract Base Classes for Containers
- heapq: Heap queue algorithm
- bisect: Array bisection algorithm
- array: Efficient arrays of numeric values
- weakref: Weak references
- types: Dynamic type creation and names for built-in types
- copy: Shallow and deep copy operations
- pprint: Data pretty printer
- reprlib: Alternate repr() implementation
- enum: Support for enumerations
- graphlib: Functionality to operate with graph-like structures
Numeric and Mathematical Modules
- numbers: Numeric abstract base classes
- math: Mathematical functions
- cmath: Mathematical functions for complex numbers
- decimal: Decimal fixed-point and floating-point arithmetic
- fractions: Rational numbers
- random: Generate pseudo-random numbers
- statistics: Mathematical statistics functions
Functional Programming Modules
- itertools: Functions creating iterators for efficient looping
- functools: Higher-order functions and operations on callable objects
- operator: Standard operators as functions
File and Directory Access
- pathlib: Object-oriented filesystem paths
- os.path: Common pathname manipulations
- stat: Interpreting stat() results
- filecmp: File and Directory Comparisons
- tempfile: Generate temporary files and directories
- glob: Unix style pathname pattern expansion
- fnmatch: Unix filename pattern matching
- linecache: Random access to text lines
- shutil: High-level file operations
Data Persistence
- pickle: Python object serialization
- copyreg: Register pickle support functions
- shelve: Python object persistence
- marshal: Internal Python object serialization
- sqlite3: DB-API 2.0 interface for SQLite databases
Data Compression and Archiving
- zlib: Compression compatible with gzip
- gzip: Support for gzip files
- bz2: Support for bzip2 compression
- lzma: Compression using the LZMA algorithm
- zipfile: Work with ZIP archives
- tarfile: Read and write tar archive files
File Formats
- csv: CSV File Reading and Writing
- configparser: Configuration file parser
- tomllib: Parse TOML files
- netrc: netrc file processing
Cryptographic Services
- hashlib: Secure hashes and message digests
- hmac: Keyed-Hashing for Message Authentication
- secrets: Generate secure random numbers for managing secrets
Generic Operating System Services
- os: Miscellaneous operating system interfaces
- io: Core tools for working with streams
- time: Time access and conversions
- logging: Logging facility for Python
- logging.config: Logging configuration
- logging.handlers: Logging handlers
- platform: Access to underlying platform’s identifying data
- errno: Standard errno system symbols
- ctypes: A foreign function library for Python
Command Line Interface Libraries
- argparse: Parser for command-line options, arguments and subcommands
- optparse: Parser for command line options
- getpass: Portable password input
- fileinput: Iterate over lines from multiple input streams
Concurrent Execution
- threading: Thread-based parallelism
- multiprocessing: Process-based parallelism
- multiprocessing.shared_memory: Shared memory for direct access across processes
- concurrent.futures: Launching parallel tasks
- subprocess: Subprocess management
- sched: Event scheduler
- queue: A synchronized queue class
- contextvars: Context Variables
- _thread: Low-level threading API
Networking and Interprocess Communication
- asyncio: Asynchronous I/O
- socket: Low-level networking interface
- ssl: TLS/SSL wrapper for socket objects
Internet Data Handling
- email: An email and MIME handling package
- json: JSON encoder and decoder
- mailbox: Manipulate mailboxes in various formats
- mimetypes: Map filenames to MIME types
- base64: Base16, Base32, Base64, Base85 Data Encodings
- binascii: Convert between binary and ASCII
- quopri: Encode and decode MIME quoted-printable data
Structured Markup Processing Tools
- html: HyperText Markup Language support
- html.parser: Simple HTML and XHTML parser
- html.entities: Definitions of HTML general entities
- XML Processing Modules
- xml.etree.ElementTree: The ElementTree XML API
Internet Protocols and Support
- webbrowser: Convenient web-browser controller
- wsgiref: WSGI Utilities and Reference Implementation
- urllib: URL handling modules
- urllib.request: Extensible library for opening URLs
- urllib.response: Response classes used by urllib
- urllib.parse: Parse URLs into components
- urllib.error: Exception classes raised by urllib.request
- urllib.robotparser: Parser for robots.txt
- http: HTTP modules
- http.client: HTTP protocol client
- ftplib: FTP protocol client
- uuid: UUID objects according to RFC 4122
- socketserver: A framework for network servers
- http.server: HTTP servers
- http.cookies: HTTP state management
- http.cookiejar: Cookie handling for HTTP clients
- xmlrpc: XMLRPC server and client modules
- xmlrpc.client: XML-RPC client access
- xmlrpc.server: Basic XML-RPC servers
- ipaddress: IPv4/IPv6 manipulation library
Multimedia Services
Internationalization
Program Frameworks
Graphical User Interfaces with Tk
Development Tools
Debugging and Profiling
Software Packaging and Distribution
- ensurepip: Bootstrapping the pip installer
- venv: Creation of virtual environments
- zipapp: Manage executable Python zip archives
Python Runtime Services
- sys: System-specific parameters and functions
- sys.monitoring: Execution event monitoring
- sysconfig: Provide access to Python’s configuration information
- builtins: Built-in objects
- __main__: Top-level code environment
- warnings: Warning control
- dataclasses: Data Classes
- contextlib: Utilities for with-statement contexts
- abc: Abstract Base Classes
- atexit: Exit handlers
- traceback: Print or retrieve a stack traceback
- __future__: Future statement definitions
- gc: Garbage Collector interface
- inspect: Inspect live objects
- site: Site-specific configuration hook
Custom Python Interpreters
Importing Modules
- zipimport: Import modules from Zip archives
- pkgutil: Package extension utility
- modulefinder: Find modules used by a script
- runpy: Locating and executing Python modules
- importlib: The implementation of import
- importlib.resources: Package resource reading, opening and access
- importlib.resources.abc: Abstract base classes for resources
- importlib.metadata: Accessing package metadata
- The initialization of the sys.path module search path
Python Language Services
MS Windows Specific Services
Unix Specific Services
Modules command-line interface (CLI)
Environment/Dependency Management
uv
Installing
# macOS/Linux (curl)
curl -LsSf https://astral.sh/uv/install.sh | sh
# macOS/Linux (wget)
wget -qO- https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Any platform, via PyPI
pip install uv
# Windows (winget)
winget install --id=astral-sh.uv -e
Testing and Code Analysis Tools
Templating
Database, Data Frames, and Data Modeling
3rd Party Libraries
Misc
UV Cheat Sheet
Check available package versions
pip index is not supported by the uv pip command. The workaround is running pip index via uvx:
uvx pip index versions <package>
Update a dependent package without adding it explicitly
The frozen environment is built from the lockfile, so you can sync a dependency to its latest version:
uv sync --upgrade-package <package>
You may have to run uv sync --all-groups to re-add any git-based packages.
Version management
Building and publishing a package / updating versions
The --bump option supports the following common version components: major, minor, patch, stable, alpha, beta, rc, post, and dev.
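For example, to bump the patch component (assuming the installed uv release includes the version command):
uv version --bump patch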
Jupyter Lab/Notebook
uv run --with jupyter jupyter lab