Python
Sample Patterns
Command-line Argument Parsing / Runnable Executable
- Uses argparse
- Runner function for use in pyproject.toml [project.scripts] section
pyproject.toml
[project.scripts]
main = "main:run"
main.py
import argparse
from datetime import datetime
from typing import TypedDict, Unpack  # Unpack requires Python 3.11+


class MainKwargs(TypedDict):
    config_file: str
    output_path: str
    loglevel: str
    filter: list[str]


def main(**kwargs: Unpack[MainKwargs]) -> None:
    ...


def run() -> None:
    """
    Runner function: parses the command line and passes the values to main().
    """
    parser = argparse.ArgumentParser(prog='ProgramName',
                                     description='What the program does',
                                     epilog='Text at the bottom of help')
    parser.add_argument('-c', '--config_file', required=True, help="Config file path")
    parser.add_argument('-o', '--output_path', default=f'output-{datetime.now():%Y%m%d%H%M%S}.txt',
                        help="Output directory for the generated files")
    parser.add_argument('-l', '--loglevel',
                        help='Specifies the level of verbosity for logging.',
                        choices=['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'],
                        default='INFO')
    parser.add_argument('-f', '--filter', required=False,
                        help='Filter the sources to run against.', nargs='*', default=[])
    args = parser.parse_args()
    # Keyword names must match the keys declared in MainKwargs.
    main(config_file=args.config_file,
         output_path=args.output_path,
         loglevel=args.loglevel,
         filter=args.filter)


if __name__ == '__main__':
    run()
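To exercise the argument definitions without touching sys.argv, the parser can be built in a helper and given an explicit argument list. This is only a sketch: `build_parser()` is a hypothetical helper that is not part of the pattern above, and it carries a trimmed set of the options defined in run().

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: same kinds of options as run(), trimmed for brevity.
    parser = argparse.ArgumentParser(prog='ProgramName')
    parser.add_argument('-c', '--config_file', required=True, help='Config file path')
    parser.add_argument('-l', '--loglevel', default='INFO',
                        choices=['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'])
    parser.add_argument('-f', '--filter', nargs='*', default=[])
    return parser


# parse_args() accepts an explicit list instead of reading sys.argv.
args = build_parser().parse_args(['-c', 'settings.toml', '-f', 'alpha', 'beta'])
# vars() turns the Namespace into a dict that could be passed on as main(**kwargs).
kwargs = vars(args)
```

Keeping parser construction separate from parsing makes the option handling easy to check in a unit test.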
Test for List of Elements in a List
- Uses a generator expression inside the any() or all() functions.
any(x in test_list for x in list_of_values)
all(x in test_list for x in list_of_values)
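A small worked example of the two expressions above, using made-up sample data. The set conversion at the end is an optional variation, not part of the original pattern; it tends to help when test_list is large.

```python
test_list = ['apple', 'banana', 'cherry']
list_of_values = ['banana', 'kiwi']

# True if at least one value is present in test_list
has_any = any(x in test_list for x in list_of_values)
# True only if every value is present in test_list
has_all = all(x in test_list for x in list_of_values)

# Optional: a set makes each membership check O(1) on average
test_set = set(test_list)
has_all_fast = all(x in test_set for x in list_of_values)
```

Both functions short-circuit: any() stops at the first match and all() stops at the first miss.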
Multithreaded Processor
- Creates the worker threads as Thread objects.
- Uses Queue objects to get data into and out of the worker threads – a work queue for the inputs and a result queue for the outputs.
- Note: Boto3 Sessions are not threadsafe!
import logging
import threading
from queue import Empty, Queue
from typing import Any

MAX_THREADS = 4  # placeholder value; tune for the workload


def worker(work_queue: Queue[Any], result_queue: Queue[Any]) -> None:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread().name}')
    while True:
        try:
            # get_nowait() avoids the race between empty() and get() when several workers share one queue
            work_payload = work_queue.get_nowait()
        except Empty:
            break
        # all incoming data needs to be in the queue object
        work_result = work_payload  # placeholder: do something to populate work_result
        logger.info("Info about this thread's work: %s", work_result)
        result_queue.put(work_result)


def main():
    logger = logging.getLogger(__name__)
    # ...
    payloads: list[Any] = []  # some data
    work_queue: Queue[Any] = Queue()
    for payload in payloads:
        work_queue.put(payload)
    result_queue: Queue[Any] = Queue()
    threads = []
    for _ in range(MAX_THREADS):
        logger.info('Creating thread')
        thread = threading.Thread(target=worker, args=(work_queue, result_queue))
        threads.append(thread)
        thread.start()
    logger.info('Waiting for workers to complete.')
    for thread in threads:
        thread.join()
    results = []
    logger.info('Collecting results.')
    # Safe to poll empty() here: every worker has already been joined.
    while not result_queue.empty():
        results.append(result_queue.get())
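Because a session object must not be shared between threads, one common workaround is to keep one session per thread in threading.local(). The sketch below is an assumption-laden illustration: `make_session()` and `get_session()` are hypothetical names, and `make_session()` returns a plain object as a stand-in for `boto3.Session()` so the example runs without boto3 installed.

```python
import threading

_local = threading.local()


def make_session() -> object:
    # Hypothetical stand-in for boto3.Session(); swap in the real constructor.
    return object()


def get_session() -> object:
    # Create the session lazily, once per thread, and reuse it afterwards.
    if not hasattr(_local, 'session'):
        _local.session = make_session()
    return _local.session


def worker(seen: list, lock: threading.Lock) -> None:
    first = get_session()
    second = get_session()  # same object: cached for this thread
    with lock:
        seen.append((first, second))


seen: list = []
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(seen, lock)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread sees its own attribute namespace on the threading.local() object, so no session is ever handed to a second thread.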
Multithreaded Processor using ThreadPoolExecutor
- The worker function is written like a normal function with normal inputs and returning its result.
- Uses concurrent.futures.ThreadPoolExecutor to handle creating the threads, getting parameters to the function, and getting results back to the caller.
- Submitting a payload returns a Future object which is similar to a JavaScript Promise.
- Iterate over the collection of Future objects with concurrent.futures.as_completed to receive each one as soon as it finishes.
- Note: Boto3 Sessions are not threadsafe!
import concurrent.futures
import logging
import threading
from typing import Any

MAX_THREADS = 4  # placeholder value; tune for the workload


def worker(payload: Any, *args, **kwargs) -> Any:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread().name}')
    # use args/kwargs like a normal function
    work_result = payload  # placeholder: do something to populate work_result
    logger.info("Info about this thread's work: %s", work_result)
    return work_result


def main(*args, **kwargs):
    logger = logging.getLogger(__name__)
    # ...
    payloads: list[Any] = []  # some data
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        result_futures = []
        for payload in payloads:
            # Submit the payload to the executor and append the resulting "future" to a list.
            result_futures.append(executor.submit(worker, payload, *args, **kwargs))
        # Iterate through the "futures" as they complete and append the results to a list.
        for future in concurrent.futures.as_completed(result_futures):
            results.append(future.result())
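as_completed yields futures in completion order, so the link back to the originating payload is lost unless it is tracked. One way to keep it, sketched here with a hypothetical squaring workload, is a dict keyed by future; the same loop is also a natural place to catch exceptions, since future.result() re-raises whatever the worker raised.

```python
import concurrent.futures


def worker(payload: int) -> int:
    # Hypothetical workload: fail on negative input, otherwise square it.
    if payload < 0:
        raise ValueError(f'negative payload: {payload}')
    return payload * payload


payloads = [1, 2, -3, 4]
results: dict[int, int] = {}
errors: dict[int, str] = {}

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Map each future back to the payload that produced it.
    future_to_payload = {executor.submit(worker, p): p for p in payloads}
    for future in concurrent.futures.as_completed(future_to_payload):
        payload = future_to_payload[future]
        try:
            # result() re-raises any exception raised inside the worker.
            results[payload] = future.result()
        except ValueError as exc:
            errors[payload] = str(exc)
```

Collecting failures separately lets the remaining payloads finish instead of aborting on the first error.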
Requests Session with Larger Thread Pool
from requests import Session
from requests.adapters import HTTPAdapter

MAX_THREADS = 20
MAX_POOL_SIZE = 2 * MAX_THREADS
# The pool size should be double the thread count so that each thread can get a new connection

with Session() as session:
    adapter = HTTPAdapter(pool_connections=MAX_POOL_SIZE, pool_maxsize=MAX_POOL_SIZE)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    # ...
S3 Client with Larger Thread Pool
from boto3 import Session
from botocore.config import Config
MAX_THREADS = 20
MAX_POOL_SIZE = 2 * MAX_THREADS
# The pool size should be double the thread count so that each thread can get a new connection
client = Session().client('s3', config=Config(max_pool_connections=MAX_POOL_SIZE))
Guides
Core Python Documentation
- Status of Python Versions
- Python 3 Documentation (python.org)
- Python Module Index
- The Python Language Reference
The Python Standard Library
Built-in Functions
Built-in Constants
Built-in Exceptions
Built-in Types
Special Method Names Dunder Method Cheat Sheet
- boolean operations
- comparisons
- Boolean Type
- bool: boolean
- Numeric Types
- Iterator Types
- Sequence Types
- list: mutable sequences, typically of homogeneous data
- tuple: immutable sequences
- range: an immutable sequence of numbers
- Text Sequence Type
- Binary Sequence Types
- bytes: immutable sequences of single bytes
- bytearray: mutable sequences of single bytes
- memoryview: objects that allow Python code to access the internal data of an object that supports the buffer protocol without copying.
- Set Types
- Mapping Types: currently only dict
- dict: maps hashable values to arbitrary objects
- Context Manager Types
Text Processing Services
- string: Common string operations
- re: Regular expression operations
- difflib: Helpers for computing deltas
- textwrap: Text wrapping and filling
- unicodedata: Unicode Character Database
- stringprep: Internet String Preparation
Binary Data Services
Data Types
- datetime: Basic date and time types
- zoneinfo: IANA time zone support
- calendar: General calendar-related functions
- collections: Container datatypes
- collections.abc: Abstract Base Classes for Containers
- heapq: Heap queue algorithm
- bisect: Array bisection algorithm
- array: Efficient arrays of numeric values
- weakref: Weak references
- types: Dynamic type creation and names for built-in types
- copy: Shallow and deep copy operations
- pprint: Data pretty printer
- reprlib: Alternate repr() implementation
- enum: Support for enumerations
- graphlib: Functionality to operate with graph-like structures
Numeric and Mathematical Modules
- numbers: Numeric abstract base classes
- math: Mathematical functions
- cmath: Mathematical functions for complex numbers
- decimal: Decimal fixed-point and floating-point arithmetic
- fractions: Rational numbers
- random: Generate pseudo-random numbers
- statistics: Mathematical statistics functions
Functional Programming Modules
- itertools: Functions creating iterators for efficient looping
- functools: Higher-order functions and operations on callable objects
- operator: Standard operators as functions
File and Directory Access
- pathlib: Object-oriented filesystem paths
- os.path: Common pathname manipulations
- stat: Interpreting stat() results
- filecmp: File and Directory Comparisons
- tempfile: Generate temporary files and directories
- glob: Unix style pathname pattern expansion
- fnmatch: Unix filename pattern matching
- linecache: Random access to text lines
- shutil: High-level file operations
Data Persistence
- pickle: Python object serialization
- copyreg: Register pickle support functions
- shelve: Python object persistence
- marshal: Internal Python object serialization
- sqlite3: DB-API 2.0 interface for SQLite databases
Data Compression and Archiving
- zlib: Compression compatible with gzip
- gzip: Support for gzip files
- bz2: Support for bzip2 compression
- lzma: Compression using the LZMA algorithm
- zipfile: Work with ZIP archives
- tarfile: Read and write tar archive files
File Formats
- csv: CSV File Reading and Writing
- configparser: Configuration file parser
- tomllib: Parse TOML files
- netrc: netrc file processing
Cryptographic Services
- hashlib: Secure hashes and message digests
- hmac: Keyed-Hashing for Message Authentication
- secrets: Generate secure random numbers for managing secrets
Generic Operating System Services
- os: Miscellaneous operating system interfaces
- io: Core tools for working with streams
- time: Time access and conversions
- logging: Logging facility for Python
- logging.config: Logging configuration
- logging.handlers: Logging handlers
- platform: Access to underlying platform’s identifying data
- errno: Standard errno system symbols
- ctypes: A foreign function library for Python
Command Line Interface Libraries
- argparse: Parser for command-line options, arguments and subcommands
- optparse: Parser for command line options
- getpass: Portable password input
- fileinput: Iterate over lines from multiple input streams
Concurrent Execution
- threading: Thread-based parallelism
- multiprocessing: Process-based parallelism
- multiprocessing.shared_memory: Shared memory for direct access across processes
- concurrent.futures: Launching parallel tasks
- subprocess: Subprocess management
- sched: Event scheduler
- queue: A synchronized queue class
- contextvars: Context Variables
- _thread: Low-level threading API
Networking and Interprocess Communication
- asyncio: Asynchronous I/O
- socket: Low-level networking interface
- ssl: TLS/SSL wrapper for socket objects
Internet Data Handling
- email: An email and MIME handling package
- json: JSON encoder and decoder
- mailbox: Manipulate mailboxes in various formats
- mimetypes: Map filenames to MIME types
- base64: Base16, Base32, Base64, Base85 Data Encodings
- binascii: Convert between binary and ASCII
- quopri: Encode and decode MIME quoted-printable data
Structured Markup Processing Tools
- html: HyperText Markup Language support
- html.parser: Simple HTML and XHTML parser
- html.entities: Definitions of HTML general entities
- XML Processing Modules
- xml.etree.ElementTree: The ElementTree XML API
Internet Protocols and Support
- webbrowser: Convenient web-browser controller
- wsgiref: WSGI Utilities and Reference Implementation
- urllib: URL handling modules
- urllib.request: Extensible library for opening URLs
- urllib.response: Response classes used by urllib
- urllib.parse: Parse URLs into components
- urllib.error: Exception classes raised by urllib.request
- urllib.robotparser: Parser for robots.txt
- http: HTTP modules
- http.client: HTTP protocol client
- ftplib: FTP protocol client
- uuid: UUID objects according to RFC 4122
- socketserver: A framework for network servers
- http.server: HTTP servers
- http.cookies: HTTP state management
- http.cookiejar: Cookie handling for HTTP clients
- xmlrpc: XMLRPC server and client modules
- xmlrpc.client: XML-RPC client access
- xmlrpc.server: Basic XML-RPC servers
- ipaddress: IPv4/IPv6 manipulation library
Multimedia Services
Internationalization
Program Frameworks
Graphical User Interfaces with Tk
Development Tools
Debugging and Profiling
Software Packaging and Distribution
- ensurepip: Bootstrapping the pip installer
- venv: Creation of virtual environments
- zipapp: Manage executable Python zip archives
Python Runtime Services
- sys: System-specific parameters and functions
- sys.monitoring: Execution event monitoring
- sysconfig: Provide access to Python’s configuration information
- builtins: Built-in objects
- __main__: Top-level code environment
- warnings: Warning control
- dataclasses: Data Classes
- contextlib: Utilities for with-statement contexts
- abc: Abstract Base Classes
- atexit: Exit handlers
- traceback: Print or retrieve a stack traceback
- __future__: Future statement definitions
- gc: Garbage Collector interface
- inspect: Inspect live objects
- site: Site-specific configuration hook
Custom Python Interpreters
Importing Modules
- zipimport: Import modules from Zip archives
- pkgutil: Package extension utility
- modulefinder: Find modules used by a script
- runpy: Locating and executing Python modules
- importlib: The implementation of import
- importlib.resources: Package resource reading, opening and access
- importlib.resources.abc: Abstract base classes for resources
- importlib.metadata: Accessing package metadata
- The initialization of the sys.path module search path
Python Language Services
MS Windows Specific Services
Unix Specific Services
Modules command-line interface (CLI)
Environment/Dependency Management
uv
Installing
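# macOS and Linux (curl)
curl -LsSf https://astral.sh/uv/install.sh | sh
# macOS and Linux (wget)
wget -qO- https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Any platform, from PyPI
pip install uv
# Windows (WinGet)
winget install --id=astral-sh.uv -e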