Work Notes

Sample Patterns

Command-line Argument Parsing / Runnable Executable

  • Uses argparse
  • Runner function for use in pyproject.toml [project.scripts] section

pyproject.toml

[project.scripts]
main = "main:run"

main.py

import argparse
from datetime import datetime
from typing import TypedDict, Unpack


class MainKwargs(TypedDict):
  config_file: str
  output_path: str
  loglevel: str
  filter: list[str]


def main(**kwargs: Unpack[MainKwargs]):
  ...


def run() -> None:
    """
    Runner Function
    """
    parser = argparse.ArgumentParser(prog='ProgramName',
                                     description='What the program does',
                                     epilog='Text at the bottom of help')
    parser.add_argument('-c', '--config_file', required=True, help="Config file path")
    parser.add_argument('-o', '--output_path',
                        default=f'output-{datetime.now():%Y%m%d%H%M%S}.txt',
                        help="Output path for the generated files")
    parser.add_argument('-l', '--loglevel',
                        help='Specifies the level of verbosity for logging.',
                        choices=['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'],
                        default='INFO')
    parser.add_argument('-f', '--filter', required=False,
                        help='Filter the sources to run against.', nargs='*', default=[])
    args = parser.parse_args()

    main(config_file=args.config_file,
         output_path=args.output_path,
         loglevel=args.loglevel,
         filter=args.filter)


if __name__ == '__main__':
    run()
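
Example invocations (the config path, output file, and filter values are illustrative; the first form assumes the package is installed so the main entry point from [project.scripts] is on the PATH):

main --config_file config.json --loglevel DEBUG --filter source_a source_b
python main.py -c config.json -o results.txt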

Test for List of Elements in a List

  • Uses a generator expression inside the any() or all() functions.
any(x in test_list for x in list_of_values)
all(x in test_list for x in list_of_values)
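
For example (the values are illustrative):

test_list = ['a', 'b', 'c', 'd']
any(x in test_list for x in ['c', 'z'])   # True  - 'c' is in test_list
all(x in test_list for x in ['c', 'z'])   # False - 'z' is not in test_list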

Multithreaded Processor

  • Creates the worker threads as Thread objects.
  • Uses Queue objects to get data into and out of the worker threads – a work queue for the inputs and a result queue for the outputs.
  • Note: Boto3 Sessions are not threadsafe!
import logging
import threading
from queue import Queue
from typing import Any

def worker(work_queue: Queue[Any], result_queue: Queue[Any]) -> None:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread()}')
    while not work_queue.empty():
        work_payload = work_queue.get()

        # all incoming data needs to be in queue object
        # do something to populate work_result
        logger.info("Info about this thread's work: %s", something)

        result_queue.put(work_result)

def main():
    logger = logging.getLogger(__name__)

    # ...
    payloads: list[Any] = []  # some data

    work_queue = Queue()
    for payload in payloads:
        work_queue.put(payload)

    result_queue: Queue[dict] = Queue()
    threads = []
    for _ in range(max_threads):
        logger.info('Creating thread')
        thread = threading.Thread(target=worker, args=(work_queue, result_queue))
        threads.append(thread)
        thread.start()

    logger.info('Waiting for workers to complete.')
    for thread in threads:
        thread.join()

    results = []
    logger.info('Collecting results.')
    while not result_queue.empty():
        results.append(result_queue.get())

Multithreaded Processor using ThreadPoolExecutor

  • The worker function is written like a normal function: it takes normal inputs and returns its result.
  • Uses concurrent.futures.ThreadPoolExecutor to handle creating the threads, getting parameters to the function, and getting results back to the caller.
  • Submitting a payload returns a Future object which is similar to a JavaScript Promise.
  • Iterate through the collection of future objects with concurrent.futures.as_completed to retrieve each result as it completes.
  • Note: Boto3 Sessions are not threadsafe! (See the sketch after this example.)
import concurrent.futures
import logging
import threading
from typing import Any


def worker(payload: Any, *args, **kwargs) -> Any:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread()}')

    # use args/kwargs like a normal function
    # do something to populate work_result
    logger.info("Info about this thread's work: %s", something)

    return work_result


def main():
    logger = logging.getLogger(__name__)

    # ...
    payloads: list[Any] = []  # some data
    results = []

    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        result_futures = []
        for payload in payloads:
            # Submit the payload to the executor and append the resulting "future" to a list.
            result_futures.append(executor.submit(worker, payload, *args, **kwargs))
            
        # Iterate through the "futures" until they're complete and append the results to a list.
        for future in concurrent.futures.as_completed(result_futures):
            results.append(future.result())
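
Because Boto3 Sessions are not threadsafe, a worker can build its own Session (and client) rather than sharing one across threads. A minimal sketch, assuming an S3-reading worker (the bucket and key names are illustrative):

import boto3

def s3_worker(payload: dict) -> dict:
    # Create a Session per call (or cache one per thread); never share a Session
    # object between threads.
    s3 = boto3.Session().client('s3')
    response = s3.get_object(Bucket=payload['bucket'], Key=payload['key'])
    return {'key': payload['key'], 'size': response['ContentLength']}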

Requests Session with Larger Thread Pool

from requests import Session
from requests.adapters import HTTPAdapter

MAX_THREADS = 20
MAX_POOL_SIZE = 2 * MAX_THREADS
# The pool size should be double the thread count so that each thread can get a new connection

with Session() as session:
    adapter = HTTPAdapter(pool_connections=MAX_POOL_SIZE, pool_maxsize=MAX_POOL_SIZE)
    session.mount("https://", adapter)
    session.mount("http://", adapter)

    # ...

S3 Client with Larger Thread Pool

from boto3 import Session
from botocore.config import Config

MAX_THREADS = 20
MAX_POOL_SIZE = 2 * MAX_THREADS
# The pool size should be double the thread count so that each thread can get a new connection

client = Session().client('s3', config=Config(max_pool_connections=MAX_POOL_SIZE))

Compound Statements

if

if [condition]: 
    [block to run if condition is true; skip the rest]
elif [condition]: 
    [block to run if condition is true; skip the rest]
else: 
    [block to run if no condition was true]

while

while [condition]:
    [block repeated while condition remains true]
else:
    [block to run after the condition becomes false]
  • continue will terminate the block and skip to the next repetition
  • break will terminate the entire loop without running “else”

for

for [target list] in [iterable expression]:
    [block repeated for each item yielded from iterable expression using target list]
else:
    [block to run after the iterable is consumed]
  • continue terminates the block and skips to the next iteration
  • break terminates the entire loop without running “else”
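
A concrete example of the loop else clause (the values are illustrative):

for candidate in [4, 9, 15]:
    if candidate % 7 == 0:
        print(f'found a multiple of 7: {candidate}')
        break
else:
    print('no multiple of 7 found')  # runs because the loop ended without break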

try

try:
    [block]
except[*] [expression] [as [identifier]]:
    [block to handle exception]
else:
    [block to run if no exception occurs]
finally:
    [block to run after try/except/else]
  • An expression-less except clause, if present, must be last; it matches any exception.
  • except* clause(s) specify one or more handlers for groups of exceptions (BaseExceptionGroup instances).
  • except and except* can’t be mixed.
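
A short example exercising all four clauses (the filename is illustrative):

try:
    handle = open('settings.json')
except FileNotFoundError as err:
    print(f'missing file: {err}')
else:
    data = handle.read()  # runs only if open() did not raise
    handle.close()
finally:
    print('cleanup')  # always runs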

with

with [context_manager] as [target]:
    [block]
  • “Compound with blocks” consist of multiple [expression] as [target] items separated by commas (see below)
  • The context manager’s __exit__() method is invoked regardless of whether an exception occurs in the block.

The with statement is semantically equivalent to:

context_manager = (EXPRESSION)
enter = type(context_manager).__enter__
exit = type(context_manager).__exit__
value = enter(context_manager)
hit_except = False

try:
    TARGET = value
    [block]
except:
    hit_except = True
    if not exit(context_manager, *sys.exc_info()):
        raise
finally:
    if not hit_except:
        exit(context_manager, None, None, None)

Compound with:

with A() as a, B() as b:
    [block]

is semantically equivalent to:

with A() as a:
    with B() as b:
        [block]

match
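
Structural pattern matching (Python 3.10+):

match [subject expression]:
    case [pattern] [if guard]:
        [block to run for the first case whose pattern matches (and whose guard is true)]
    case _:
        [block to run when no other case matches]
  • Cases are checked top to bottom; only the first matching case's block runs.
  • case _ is a wildcard; without it, a subject that matches no pattern simply falls through with no error.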

Python

Guides

Core Python Documentation

The Python Standard Library

Built-in Functions

Built-in Constants

Built-in Exceptions

Built-in Types

Special Method Names / Dunder Method Cheat Sheet

Text Processing Services

  • string: Common string operations
  • re: Regular expression operations
  • difflib: Helpers for computing deltas
  • textwrap: Text wrapping and filling
  • unicodedata: Unicode Character Database
  • stringprep: Internet String Preparation

Binary Data Services

  • struct: Interpret bytes as packed binary data
  • codecs: Codec registry and base classes

Data Types

Numeric and Mathematical Modules

  • numbers: Numeric abstract base classes
  • math: Mathematical functions
  • cmath: Mathematical functions for complex numbers
  • decimal: Decimal fixed-point and floating-point arithmetic
  • fractions: Rational numbers
  • random: Generate pseudo-random numbers
  • statistics: Mathematical statistics functions

Functional Programming Modules

  • itertools: Functions creating iterators for efficient looping
  • functools: Higher-order functions and operations on callable objects
  • operator: Standard operators as functions

File and Directory Access

Data Persistence

  • pickle: Python object serialization
  • copyreg: Register pickle support functions
  • shelve: Python object persistence
  • marshal: Internal Python object serialization
  • sqlite3: DB-API 2.0 interface for SQLite databases

Data Compression and Archiving

  • zlib: Compression compatible with gzip
  • gzip: Support for gzip files
  • bz2: Support for bzip2 compression
  • lzma: Compression using the LZMA algorithm
  • zipfile: Work with ZIP archives
  • tarfile: Read and write tar archive files

File Formats

Cryptographic Services

  • hashlib: Secure hashes and message digests
  • hmac: Keyed-Hashing for Message Authentication
  • secrets: Generate secure random numbers for managing secrets

Generic Operating System Services

  • os: Miscellaneous operating system interfaces
  • io: Core tools for working with streams
  • time: Time access and conversions
  • logging: Logging facility for Python
  • logging.config: Logging configuration
  • logging.handlers: Logging handlers
  • platform: Access to underlying platform’s identifying data
  • errno: Standard errno system symbols
  • ctypes: A foreign function library for Python

Command Line Interface Libraries

  • argparse: Parser for command-line options, arguments and subcommands
  • optparse: Parser for command line options
  • getpass: Portable password input
  • fileinput: Iterate over lines from multiple input streams

Concurrent Execution

Networking and Interprocess Communication

  • asyncio: Asynchronous I/O
  • socket: Low-level networking interface
  • ssl: TLS/SSL wrapper for socket objects

Internet Data Handling

  • email: An email and MIME handling package
  • json: JSON encoder and decoder
  • mailbox: Manipulate mailboxes in various formats
  • mimetypes: Map filenames to MIME types
  • base64: Base16, Base32, Base64, Base85 Data Encodings
  • binascii: Convert between binary and ASCII
  • quopri: Encode and decode MIME quoted-printable data

Structured Markup Processing Tools

Internet Protocols and Support

Multimedia Services

Internationalization

Program Frameworks

  • cmd: Support for line-oriented command interpreters
  • shlex: Simple lexical analysis

Graphical User Interfaces with Tk

Development Tools

  • typing: Support for type hints
  • pydoc: Documentation generator and online help system

Debugging and Profiling

Software Packaging and Distribution

  • ensurepip: Bootstrapping the pip installer
  • venv: Creation of virtual environments
  • zipapp: Manage executable Python zip archives

Python Runtime Services

  • sys: System-specific parameters and functions
  • sys.monitoring: Execution event monitoring
  • sysconfig: Provide access to Python’s configuration information
  • builtins: Built-in objects
  • __main__: Top-level code environment
  • warnings: Warning control
  • dataclasses: Data Classes
  • contextlib: Utilities for with-statement contexts
  • abc: Abstract Base Classes
  • atexit: Exit handlers
  • traceback: Print or retrieve a stack traceback
  • __future__: Future statement definitions
  • gc: Garbage Collector interface
  • inspect: Inspect live objects
  • site: Site-specific configuration hook

Custom Python Interpreters

Importing Modules

Python Language Services

MS Windows Specific Services

Unix Specific Services

Modules command-line interface (CLI)

Environment/Dependency Management

uv

Installing

curl -LsSf https://astral.sh/uv/install.sh | sh
wget -qO- https://astral.sh/uv/install.sh | sh
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
pip install uv
winget install --id=astral-sh.uv  -e

Testing and Code Analysis Tools

Templating

Database, Data Frames, and Data Modeling

3rd Party Libraries

Misc

UV Cheat Sheet

Check available package versions

pip index is not supported by the uv pip command. The workaround is running pip index via uvx:

uvx pip index versions <package>

Update a dependent package without adding it explicitly

The frozen environment is built from the lockfile, so you can sync a dependency to its latest version:

uv sync --upgrade-package <package>

You may have to run uv sync --all-groups to re-add any git-based packages.

Version management

Building and publishing a package / updating versions

The --bump option supports the following common version components: major, minor, patch, stable, alpha, beta, rc, post, and dev.
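
For example, bumping the patch version and then building and publishing (a sketch; assumes the build backend and publish credentials are already configured):

uv version --bump patch
uv build
uv publish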

Jupyter Lab/Notebook

uv run --with jupyter jupyter lab