Sample Patterns

Command-line Argument Parsing / Runnable Executable

  • Uses argparse
  • Runner function for use in pyproject.toml [project.scripts] section

pyproject.toml

[project.scripts]
main = "main:run"

main.py

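import argparse
from datetime import datetime
from typing import TypedDict, Unpack  # Unpack requires Python 3.11+


class MainKwargs(TypedDict):
    config_file: str
    output_path: str
    loglevel: str
    filter: list[str]


def main(**kwargs: Unpack[MainKwargs]):
    ...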


def run() -> None:
    """
    Runner Function
    """
    parser = argparse.ArgumentParser(prog='ProgramName',
                                     description='What the program does',
                                     epilog='Text at the bottom of help')
    parser.add_argument('-c', '--config_file', required=True, help="Config file path")
    parser.add_argument('-o', '--output_path', default=f'output-{datetime.now():%Y%m%d%H%M%S}.txt',
                        help="Output path for the generated files")
    parser.add_argument('-l', '--loglevel',
                        help='Specifies the level of verbosity for logging.',
                        choices=['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'],
                        default='INFO')
    parser.add_argument('-f', '--filter', required=False,
                        help='Filter the sources to run against.', nargs='*', default=[])
    args = parser.parse_args()

    main(config_file=args.config_file,
         output_path=args.output_path,
         loglevel=args.loglevel,
         filter=args.filter)


if __name__ == '__main__':
    run()
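
A minimal sketch of how the parser above could be exercised without touching sys.argv, for example in a unit test. The build_parser helper and the sample file name are hypothetical and not part of the original pattern; only a subset of the options is repeated here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper that builds a parser with a subset of the options in run()."""
    parser = argparse.ArgumentParser(prog='ProgramName')
    parser.add_argument('-c', '--config_file', required=True, help='Config file path')
    parser.add_argument('-l', '--loglevel', default='INFO',
                        choices=['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'])
    parser.add_argument('-f', '--filter', nargs='*', default=[])
    return parser


# Passing a list to parse_args() bypasses sys.argv, which is handy in tests.
args = build_parser().parse_args(['-c', 'settings.toml', '-f', 'alpha', 'beta'])

# vars() turns the Namespace into a dict that can be splatted into main(**kwargs).
kwargs = vars(args)
print(kwargs)
```

Splitting parser construction from parsing is one possible design choice; it keeps run() short and lets tests check defaults and choices directly.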

Test for List of Elements in a List

  • Uses a generator expression inside the any() or all() functions.
any(x in test_list for x in list_of_values)  # True if at least one value is in test_list
all(x in test_list for x in list_of_values)  # True only if every value is in test_list
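
A small worked example of the two expressions above, with made-up sample data. The set-based variant is an alternative, not part of the original pattern, and assumes the items are hashable.

```python
test_list = ['alpha', 'beta', 'gamma']
list_of_values = ['beta', 'delta']

# 'beta' is present, 'delta' is not.
has_any = any(x in test_list for x in list_of_values)
has_all = all(x in test_list for x in list_of_values)

# For large lists of hashable items, a set makes each membership test O(1) on average.
test_set = set(test_list)
has_any_fast = not test_set.isdisjoint(list_of_values)
has_all_fast = test_set.issuperset(list_of_values)

print(has_any, has_all, has_any_fast, has_all_fast)
```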

Multithreaded Processor

  • Creates the worker threads as Thread objects.
  • Uses Queue objects to get data into and out of the worker threads – a work queue for the inputs and a result queue for the outputs.
  • Note: Boto3 Sessions are not thread-safe! Create a separate Session in each thread rather than sharing one.
import logging
import threading
from queue import Empty, Queue
from typing import Any


def worker(work_queue: Queue[Any], result_queue: Queue[Any]) -> None:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread().name}')
    while True:
        try:
            # get_nowait() avoids the race where another thread takes the last
            # item between an empty() check and a blocking get()
            work_payload = work_queue.get_nowait()
        except Empty:
            break

        # all incoming data needs to be in the queue object
        work_result = work_payload  # placeholder: do something to populate work_result
        logger.info("Info about this thread's work: %s", work_result)

        result_queue.put(work_result)

def main():
    logger = logging.getLogger(__name__)

    # ...
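    payloads: list[Any] = []  # some data
    max_threads = 4  # placeholder; tune for the workload

    # Fill the work queue completely before starting any worker thread.
    work_queue: Queue[Any] = Queue()
    for payload in payloads:
        work_queue.put(payload)

    result_queue: Queue[dict] = Queue()
    threads = []
    for _ in range(max_threads):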
        logger.info('Creating thread')
        thread = threading.Thread(target=worker, args=(work_queue, result_queue))
        threads.append(thread)
        thread.start()

    logger.info('Waiting for workers to complete.')
    for thread in threads:
        thread.join()

    results = []
    logger.info('Collecting results.')
    while not result_queue.empty():
        results.append(result_queue.get())
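
Because a Session must not be shared between threads, one possible approach is to keep a per-thread instance in threading.local. The sketch below uses a hypothetical FakeSession class as a stand-in so it runs without boto3; in real code the factory call would create the non-thread-safe object instead (for example boto3.session.Session()).

```python
import threading

_thread_local = threading.local()


class FakeSession:
    """Hypothetical stand-in for a non-thread-safe object such as a boto3 Session."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()


def get_session() -> FakeSession:
    # Create the session lazily, once per thread, and reuse it on later calls.
    if not hasattr(_thread_local, 'session'):
        _thread_local.session = FakeSession()
    return _thread_local.session


checks: list[bool] = []


def worker() -> None:
    first = get_session()
    second = get_session()
    # Same object on repeated calls, and it was created by the calling thread.
    checks.append(first is second and first.owner == threading.get_ident())


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(checks)
```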

Multithreaded Processor using ThreadPoolExecutor

  • The worker function is written like a normal function with normal inputs and returning its result.
  • Uses concurrent.futures.ThreadPoolExecutor to handle creating the threads, getting parameters to the function, and getting results back to the caller.
  • Submitting a payload returns a Future object which is similar to a JavaScript Promise.
  • Iterate over the collection of Future objects with concurrent.futures.as_completed, which yields each future as it completes (not in submission order).
  • Note: Boto3 Sessions are not thread-safe! Create a separate Session in each thread rather than sharing one.
import concurrent.futures
import logging
import threading
from typing import Any

MAX_THREADS = 20  # placeholder; tune for the workload


def worker(payload: Any, *args, **kwargs) -> Any:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread().name}')

    # use args/kwargs like a normal function
    work_result = payload  # placeholder: do something to populate work_result
    logger.info("Info about this thread's work: %s", work_result)

    return work_result


def main():
    logger = logging.getLogger(__name__)

    # ...
    payloads: list[Any] = []  # some data
    results = []

    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        result_futures = []
        for payload in payloads:
            # Submit the payload to the executor and append the resulting "future" to a list.
            # Any extra positional or keyword arguments for worker() go after the payload.
            result_futures.append(executor.submit(worker, payload))

        # Iterate through the "futures" until they're complete and append the results to a list.
        for future in concurrent.futures.as_completed(result_futures):
            results.append(future.result())
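
A minimal sketch of one common variation on the pattern above: keeping a dict from each future back to its payload, so results and failures can be attributed to their inputs. The worker body and sample payloads are hypothetical; future.result() re-raises any exception raised inside the worker.

```python
import concurrent.futures


def worker(payload: int) -> int:
    # Hypothetical workload: fail on one input to show error handling.
    if payload == 3:
        raise ValueError(f'bad payload: {payload}')
    return payload * payload


payloads = [1, 2, 3, 4]
results: dict[int, int] = {}
errors: dict[int, Exception] = {}

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future back to its payload so results can be attributed.
    future_to_payload = {executor.submit(worker, p): p for p in payloads}

    for future in concurrent.futures.as_completed(future_to_payload):
        payload = future_to_payload[future]
        try:
            results[payload] = future.result()  # re-raises the worker's exception
        except Exception as exc:
            errors[payload] = exc

print(results, errors)
```

Catching the exception per future keeps one failed payload from discarding the results of the others.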

Requests Session with Larger Thread Pool

from requests import Session
from requests.adapters import HTTPAdapter

MAX_THREADS = 20
# Rule of thumb: make the pool double the thread count so that each thread can get a new connection
MAX_POOL_SIZE = 2 * MAX_THREADS

with Session() as session:
    adapter = HTTPAdapter(pool_connections=MAX_POOL_SIZE, pool_maxsize=MAX_POOL_SIZE)
    session.mount("https://", adapter)
    session.mount("http://", adapter)

    # ...

S3 Client with Larger Thread Pool

from boto3 import Session
from botocore.config import Config

MAX_THREADS = 20
# Rule of thumb: make the pool double the thread count so that each thread can get a new connection
MAX_POOL_SIZE = 2 * MAX_THREADS

client = Session().client('s3', config=Config(max_pool_connections=MAX_POOL_SIZE))

Python

Guides

Core Python Documentation

The Python Standard Library

Built-in Functions

Built-in Constants

Built-in Exceptions

Built-in Types

Special Method Names

Dunder Method Cheat Sheet

Text Processing Services

  • string: Common string operations
  • re: Regular expression operations
  • difflib: Helpers for computing deltas
  • textwrap: Text wrapping and filling
  • unicodedata: Unicode Character Database
  • stringprep: Internet String Preparation

Binary Data Services

  • struct: Interpret bytes as packed binary data
  • codecs: Codec registry and base classes

Data Types

Numeric and Mathematical Modules

  • numbers: Numeric abstract base classes
  • math: Mathematical functions
  • cmath: Mathematical functions for complex numbers
  • decimal: Decimal fixed-point and floating-point arithmetic
  • fractions: Rational numbers
  • random: Generate pseudo-random numbers
  • statistics: Mathematical statistics functions

Functional Programming Modules

  • itertools: Functions creating iterators for efficient looping
  • functools: Higher-order functions and operations on callable objects
  • operator: Standard operators as functions

File and Directory Access

Data Persistence

  • pickle: Python object serialization
  • copyreg: Register pickle support functions
  • shelve: Python object persistence
  • marshal: Internal Python object serialization
  • sqlite3: DB-API 2.0 interface for SQLite databases

Data Compression and Archiving

  • zlib: Compression compatible with gzip
  • gzip: Support for gzip files
  • bz2: Support for bzip2 compression
  • lzma: Compression using the LZMA algorithm
  • zipfile: Work with ZIP archives
  • tarfile: Read and write tar archive files

File Formats

Cryptographic Services

  • hashlib: Secure hashes and message digests
  • hmac: Keyed-Hashing for Message Authentication
  • secrets: Generate secure random numbers for managing secrets

Generic Operating System Services

  • os: Miscellaneous operating system interfaces
  • io: Core tools for working with streams
  • time: Time access and conversions
  • logging: Logging facility for Python
  • logging.config: Logging configuration
  • logging.handlers: Logging handlers
  • platform: Access to underlying platform’s identifying data
  • errno: Standard errno system symbols
  • ctypes: A foreign function library for Python

Command Line Interface Libraries

  • argparse: Parser for command-line options, arguments and subcommands
  • optparse: Parser for command line options
  • getpass: Portable password input
  • fileinput: Iterate over lines from multiple input streams

Concurrent Execution

Networking and Interprocess Communication

  • asyncio: Asynchronous I/O
  • socket: Low-level networking interface
  • ssl: TLS/SSL wrapper for socket objects

Internet Data Handling

  • email: An email and MIME handling package
  • json: JSON encoder and decoder
  • mailbox: Manipulate mailboxes in various formats
  • mimetypes: Map filenames to MIME types
  • base64: Base16, Base32, Base64, Base85 Data Encodings
  • binascii: Convert between binary and ASCII
  • quopri: Encode and decode MIME quoted-printable data

Structured Markup Processing Tools

Internet Protocols and Support

Multimedia Services

Internationalization

Program Frameworks

  • cmd: Support for line-oriented command interpreters
  • shlex: Simple lexical analysis

Graphical User Interfaces with Tk

Development Tools

  • typing: Support for type hints
  • pydoc: Documentation generator and online help system

Debugging and Profiling

Software Packaging and Distribution

  • ensurepip: Bootstrapping the pip installer
  • venv: Creation of virtual environments
  • zipapp: Manage executable Python zip archives

Python Runtime Services

  • sys: System-specific parameters and functions
  • sys.monitoring: Execution event monitoring
  • sysconfig: Provide access to Python’s configuration information
  • builtins: Built-in objects
  • __main__: Top-level code environment
  • warnings: Warning control
  • dataclasses: Data Classes
  • contextlib: Utilities for with-statement contexts
  • abc: Abstract Base Classes
  • atexit: Exit handlers
  • traceback: Print or retrieve a stack traceback
  • __future__: Future statement definitions
  • gc: Garbage Collector interface
  • inspect: Inspect live objects
  • site: Site-specific configuration hook

Custom Python Interpreters

Importing Modules

Python Language Services

MS Windows Specific Services

Unix Specific Services

Modules command-line interface (CLI)

Environment/Dependency Management

uv

Installing

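# macOS and Linux (curl)
curl -LsSf https://astral.sh/uv/install.sh | sh
# macOS and Linux (wget)
wget -qO- https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Any platform with pip
pip install uv
# Windows (WinGet)
winget install --id=astral-sh.uv -e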

Testing and Code Analysis Tools

Templating

Database, Data Frames, and Data Modeling

3rd Party Libraries

Misc