Python
Work Notes
Sample Patterns
Command-line Argument Parsing / Runnable Executable
- Uses argparse
- Runner function for use in pyproject.toml [project.scripts] section
pyproject.toml
[project.scripts]
main = "main:run"
main.py
import argparse
from datetime import datetime
from typing import TypedDict, Unpack

class MainKwargs(TypedDict):
    config_file: str
    output_path: str
    loglevel: str
    filter: list[str]

def main(**kwargs: Unpack[MainKwargs]):
    ...

def run() -> None:
    """
    Runner Function
    """
    parser = argparse.ArgumentParser(prog='ProgramName',
                                     description='What the program does',
                                     epilog='Text at the bottom of help')
    parser.add_argument('-c', '--config_file', required=True, help="Config file path")
    parser.add_argument('-o', '--output_path', default=f'output-{datetime.now():%Y%m%d%H%M%S}.txt',
                        help="Output directory for the generated files")
    parser.add_argument('-l', '--loglevel',
                        help='Specifies the level of verbosity for logging.',
                        choices=['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'],
                        default='INFO')
    parser.add_argument('-f', '--filter', required=False,
                        help='Filter the sources to run against.', nargs='*', default=[])
    args = parser.parse_args()
    main(config_file=args.config_file,
         output_path=args.output_path,
         loglevel=args.loglevel,
         filter=args.filter)

if __name__ == '__main__':
    run()
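With the package installed into the project environment, the [project.scripts] entry exposes a main command; an illustrative invocation from a uv project (the argument values are made up):
uv run main -c config.yaml -l DEBUG -f source1 source2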
Test for List of Elements in a List
- Uses a generator expression inside the any() or all() functions.
any(x in test_list for x in list_of_values)
all(x in test_list for x in list_of_values)
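For example, with made-up lists:
test_list = ['a', 'b', 'c', 'd']
list_of_values = ['a', 'c']
any(x in test_list for x in list_of_values)  # True: at least one value is in test_list
all(x in test_list for x in list_of_values)  # True: every value is in test_list
all(x in test_list for x in ['a', 'z'])      # False: 'z' is not in test_list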
Multithreaded Processor
- Creates the worker threads as Thread objects.
- Uses Queue objects to get data into and out of the worker threads – a work queue for the inputs and a result queue for the outputs.
- Note: Boto3 Sessions are not threadsafe!
import logging
import threading
from queue import Queue
from typing import Any

def worker(work_queue: Queue[Any], result_queue: Queue[Any]) -> None:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread().name}')
    while not work_queue.empty():
        work_payload = work_queue.get()
        # all incoming data needs to be in the work queue
        # do something with work_payload to populate work_result
        logger.info("Info about this thread's work: %s", something)
        result_queue.put(work_result)

def main():
    logger = logging.getLogger(__name__)
    # ...
    payloads: list[Any] = ...  # some data
    work_queue: Queue[Any] = Queue()
    for payload in payloads:
        work_queue.put(payload)
    result_queue: Queue[dict] = Queue()
    threads = []
    for _ in range(max_threads):
        logger.info('Creating thread')
        thread = threading.Thread(target=worker, args=(work_queue, result_queue))
        threads.append(thread)
        thread.start()
    logger.info('Waiting for workers to complete.')
    for thread in threads:
        thread.join()
    results = []
    logger.info('Collecting results.')
    while not result_queue.empty():
        results.append(result_queue.get())
Multithreaded Processor using ThreadPoolExecutor
- The worker function is written like a normal function with normal inputs and returning its result.
- Uses concurrent.futures.ThreadPoolExecutor to handle creating the threads, getting parameters to the function, and getting results back to the caller.
- Submitting a payload returns a Future object which is similar to a JavaScript Promise.
- Iterate through the collection of Future objects with concurrent.futures.as_completed to process each result as it completes.
- Note: Boto3 Sessions are not threadsafe! (See the per-thread Session sketch after this example.)
import concurrent.futures
import logging
import threading
from typing import Any

def worker(payload: Any, *args, **kwargs) -> Any:
    logger = logging.getLogger(__name__).getChild(f'thread {threading.current_thread().name}')
    # use args/kwargs like a normal function
    # do something with the payload to populate work_result
    logger.info("Info about this thread's work: %s", something)
    return work_result

def main():
    logger = logging.getLogger(__name__)
    # ...
    payloads: list[Any] = ...  # some data
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        result_futures = []
        for payload in payloads:
            # Submit the payload to the executor and append the resulting "future" to a list.
            result_futures.append(executor.submit(worker, payload, *args, **kwargs))
        # Iterate through the "futures" as they complete and append the results to a list.
        for future in concurrent.futures.as_completed(result_futures):
            results.append(future.result())
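Because Sessions are not threadsafe, one option (a sketch, not from these notes) is to build a new Session inside each worker so nothing boto3-related is shared between threads; the bucket/key parameters here are hypothetical:
import boto3

def worker(bucket: str, key: str) -> dict:
    # Each thread creates its own Session and client; only the payload
    # (bucket/key) crosses the thread boundary.
    session = boto3.Session()
    s3 = session.client('s3')
    return s3.head_object(Bucket=bucket, Key=key)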
Requests Session with Larger Thread Pool
from requests import Session
from requests.adapters import HTTPAdapter
MAX_THREADS = 20
MAX_POOL_SIZE = 2 * MAX_THREADS
# The pool size should be double the thread count so that each thread can get a new connection
with Session() as session:
    adapter = HTTPAdapter(pool_connections=MAX_POOL_SIZE, pool_maxsize=MAX_POOL_SIZE)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    # ...
S3 Client with Larger Thread Pool
from boto3 import Session
from botocore.config import Config
MAX_THREADS = 20
MAX_POOL_SIZE = 2 * MAX_THREADS
# The pool size should be double the thread count so that each thread can get a new connection
client = Session().client('s3', config=Config(max_pool_connections=MAX_POOL_SIZE))
Compound Statements
if
if [condition]:
    [block to run if condition is true; skip the rest]
elif [condition]:
    [block to run if condition is true; skip the rest]
else:
    [block to run if no condition was true]
while
while [condition]:
    [block repeated while condition remains true]
else:
    [block to run once the condition becomes false]
- continue will terminate the block and skip to the next repetition
- break will terminate the entire loop without running “else”
for
for [target list] in [iterable expression]:
    [block repeated for each item yielded from the iterable expression, using the target list]
else:
    [block to run after the iterable is consumed]
- continue terminates the block and skips to the next iteration
- break terminates the entire loop without running “else”
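A small example of the break/else interaction (the values are arbitrary):
for candidate in [1, 3, 5, 8]:
    if candidate % 2 == 0:
        print(f'found an even number: {candidate}')
        break
else:
    print('no even number found')  # only runs if the loop never hit break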
try
try:
    [block]
except[*] [expression] [as [identifier]]:
    [block to handle exception]
else:
    [block to run if no exception occurs]
finally:
    [block to run after try/except/else]
- An expression-less except clause, if present, must be last; it matches any exception.
- except* clause(s) specify one or more handlers for groups of exceptions (BaseExceptionGroup instances).
- except and except* can’t be mixed.
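An illustrative except* handler (requires Python 3.11+; the group contents are made up):
try:
    raise ExceptionGroup('batch failures', [ValueError('bad input'), KeyError('missing key')])
except* ValueError as eg:
    print('value errors:', eg.exceptions)  # eg is an ExceptionGroup holding only the ValueErrors
except* KeyError as eg:
    print('key errors:', eg.exceptions)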
with
with [context_manager] as [target]:
    [block]
- “Compound” with statements consist of multiple [expression] as [target] items separated by commas (see below).
- The context manager’s __exit__() method is invoked regardless of whether an exception occurs in the block.
A with statement of the form with EXPRESSION as TARGET: is semantically equivalent to:
context_manager = (EXPRESSION)
enter = type(context_manager).__enter__
exit = type(context_manager).__exit__
value = enter(context_manager)
hit_except = False
try:
    TARGET = value
    [block]
except:
    hit_except = True
    if not exit(context_manager, *sys.exc_info()):
        raise
finally:
    if not hit_except:
        exit(context_manager, None, None, None)
Compound with:
with A() as a, B() as b:
    [block]
is semantically equivalent to:
with A() as a:
    with B() as b:
        [block]
match
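Structural pattern matching (Python 3.10+); a minimal template:
match [subject expression]:
    case [pattern] [if guard]:
        [block to run for the first case whose pattern matches and whose guard is true]
    case _:
        [block to run if no other case matched]
- The wildcard pattern _ matches anything, so a final case _ acts as a default.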
Python
Check available package versions:
uvx pip index versions <package>
Guides
Core Python Documentation
- Status of Python Versions
- Python 3 Documentation (python.org)
- Python Module Index
- The Python Language Reference
The Python Standard Library
Built-in Functions
Built-in Constants
Built-in Exceptions
Built-in Types
Special Method Names
Dunder Method Cheat Sheet
- boolean operations
- comparisons
- Boolean Type
- bool: boolean
- Numeric Types
- Iterator Types
- Sequence Types
- list: mutable sequences typically of homogeneous data
- tuple: an immutable sequence
- range: an immutable sequence of numbers
- Text Sequence Type
- Binary Sequence Types
- bytes: immutable sequences of single bytes
- bytearray: mutable sequences of single bytes
- memoryview: objects that allow Python code to access the internal data of an object that supports the buffer protocol without copying.
- Set Types
- Mapping Types: currently only dict
- dict: maps hashable values to arbitrary objects
- Context Manager Types
Text Processing Services
- string: Common string operations
- re: Regular expression operations
- difflib: Helpers for computing deltas
- textwrap: Text wrapping and filling
- unicodedata: Unicode Character Database
- stringprep: Internet String Preparation
Binary Data Services
Data Types
- datetime: Basic date and time types
- zoneinfo: IANA time zone support
- calendar: General calendar-related functions
- collections: Container datatypes
- collections.abc: Abstract Base Classes for Containers
- heapq: Heap queue algorithm
- bisect: Array bisection algorithm
- array: Efficient arrays of numeric values
- weakref: Weak references
- types: Dynamic type creation and names for built-in types
- copy: Shallow and deep copy operations
- pprint: Data pretty printer
- reprlib: Alternate repr() implementation
- enum: Support for enumerations
- graphlib: Functionality to operate with graph-like structures
Numeric and Mathematical Modules
- numbers: Numeric abstract base classes
- math: Mathematical functions
- cmath: Mathematical functions for complex numbers
- decimal: Decimal fixed-point and floating-point arithmetic
- fractions: Rational numbers
- random: Generate pseudo-random numbers
- statistics: Mathematical statistics functions
Functional Programming Modules
- itertools: Functions creating iterators for efficient looping
- functools: Higher-order functions and operations on callable objects
- operator: Standard operators as functions
File and Directory Access
- pathlib: Object-oriented filesystem paths
- os.path: Common pathname manipulations
- stat: Interpreting stat() results
- filecmp: File and Directory Comparisons
- tempfile: Generate temporary files and directories
- glob: Unix style pathname pattern expansion
- fnmatch: Unix filename pattern matching
- linecache: Random access to text lines
- shutil: High-level file operations
Data Persistence
- pickle: Python object serialization
- copyreg: Register pickle support functions
- shelve: Python object persistence
- marshal: Internal Python object serialization
- sqlite3: DB-API 2.0 interface for SQLite databases
Data Compression and Archiving
- zlib: Compression compatible with gzip
- gzip: Support for gzip files
- bz2: Support for bzip2 compression
- lzma: Compression using the LZMA algorithm
- zipfile: Work with ZIP archives
- tarfile: Read and write tar archive files
File Formats
- csv: CSV File Reading and Writing
- configparser: Configuration file parser
- tomllib: Parse TOML files
- netrc: netrc file processing
Cryptographic Services
- hashlib: Secure hashes and message digests
- hmac: Keyed-Hashing for Message Authentication
- secrets: Generate secure random numbers for managing secrets
Generic Operating System Services
- os: Miscellaneous operating system interfaces
- io: Core tools for working with streams
- time: Time access and conversions
- logging: Logging facility for Python
- logging.config: Logging configuration
- logging.handlers: Logging handlers
- platform: Access to underlying platform’s identifying data
- errno: Standard errno system symbols
- ctypes: A foreign function library for Python
Command Line Interface Libraries
- argparse: Parser for command-line options, arguments and subcommands
- optparse: Parser for command line options
- getpass: Portable password input
- fileinput: Iterate over lines from multiple input streams
Concurrent Execution
- threading: Thread-based parallelism
- multiprocessing: Process-based parallelism
- multiprocessing.shared_memory: Shared memory for direct access across processes
- concurrent.futures: Launching parallel tasks
- subprocess: Subprocess management
- sched: Event scheduler
- queue: A synchronized queue class
- contextvars: Context Variables
- _thread: Low-level threading API
Networking and Interprocess Communication
- asyncio: Asynchronous I/O
- socket: Low-level networking interface
- ssl: TLS/SSL wrapper for socket objects
Internet Data Handling
- email: An email and MIME handling package
- json: JSON encoder and decoder
- mailbox: Manipulate mailboxes in various formats
- mimetypes: Map filenames to MIME types
- base64: Base16, Base32, Base64, Base85 Data Encodings
- binascii: Convert between binary and ASCII
- quopri: Encode and decode MIME quoted-printable data
Structured Markup Processing Tools
- html: HyperText Markup Language support
- html.parser: Simple HTML and XHTML parser
- html.entities: Definitions of HTML general entities
- XML Processing Modules
- xml.etree.ElementTree: The ElementTree XML API
Internet Protocols and Support
- webbrowser: Convenient web-browser controller
- wsgiref: WSGI Utilities and Reference Implementation
- urllib: URL handling modules
- urllib.request: Extensible library for opening URLs
- urllib.response: Response classes used by urllib
- urllib.parse: Parse URLs into components
- urllib.error: Exception classes raised by urllib.request
- urllib.robotparser: Parser for robots.txt
- http: HTTP modules
- http.client: HTTP protocol client
- ftplib: FTP protocol client
- uuid: UUID objects according to RFC 4122
- socketserver: A framework for network servers
- http.server: HTTP servers
- http.cookies: HTTP state management
- http.cookiejar: Cookie handling for HTTP clients
- xmlrpc: XMLRPC server and client modules
- xmlrpc.client: XML-RPC client access
- xmlrpc.server: Basic XML-RPC servers
- ipaddress: IPv4/IPv6 manipulation library
Multimedia Services
Internationalization
Program Frameworks
Graphical User Interfaces with Tk
Development Tools
Debugging and Profiling
Software Packaging and Distribution
- ensurepip: Bootstrapping the pip installer
- venv: Creation of virtual environments
- zipapp: Manage executable Python zip archives
Python Runtime Services
- sys: System-specific parameters and functions
- sys.monitoring: Execution event monitoring
- sysconfig: Provide access to Python’s configuration information
- builtins: Built-in objects
- __main__: Top-level code environment
- warnings: Warning control
- dataclasses: Data Classes
- contextlib: Utilities for with-statement contexts
- abc: Abstract Base Classes
- atexit: Exit handlers
- traceback: Print or retrieve a stack traceback
- __future__: Future statement definitions
- gc: Garbage Collector interface
- inspect: Inspect live objects
- site: Site-specific configuration hook
Custom Python Interpreters
Importing Modules
- zipimport: Import modules from Zip archives
- pkgutil: Package extension utility
- modulefinder: Find modules used by a script
- runpy: Locating and executing Python modules
- importlib: The implementation of import
- importlib.resources: Package resource reading, opening and access
- importlib.resources.abc: Abstract base classes for resources
- importlib.metadata: Accessing package metadata
- The initialization of the sys.path module search path
Python Language Services
MS Windows Specific Services
Unix Specific Services
Modules command-line interface (CLI)
Environment/Dependency Management
uv
Installing
# macOS/Linux (curl)
curl -LsSf https://astral.sh/uv/install.sh | sh
# macOS/Linux (wget)
wget -qO- https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Any platform, via PyPI
pip install uv
# Windows (winget)
winget install --id=astral-sh.uv -e
Testing and Code Analysis Tools
Templating
Database, Data Frames, and Data Modeling
3rd Party Libraries
Misc
UV Cheat Sheet
Check available package versions
pip index is not supported by the uv pip command. The workaround is running pip index via uvx:
uvx pip index versions <package>
Update a dependent package without adding it explicitly
The frozen environment is built from the lockfile, so you can sync a dependency to its latest version:
uv sync --upgrade-package <package>
You may have to run uv sync --all-groups to re-add any git-based packages.
Version management
Building and publishing a package / updating versions
The --bump option supports the following common version components: major, minor, patch, stable, alpha, beta, rc, post, and dev.
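For example, to bump the patch component (assuming the installed uv release includes the version command):
uv version --bump patch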
Jupyter Lab/Notebook
uv run --with jupyter jupyter lab