
Logging

the_data_packet.core.logging

Centralized logging configuration for The Data Packet.

This module provides unified logging setup for the entire application. It configures structured logging with proper formatters, reduces noise from third-party libraries, and provides a consistent interface for obtaining logger instances throughout the codebase.

Features
  • Structured logging with timestamps and module names
  • Configurable log levels via environment variables
  • Noise reduction from third-party libraries
  • Consistent format across all modules
  • Console output optimized for Docker containers
Usage

In the main application entry point:

setup_logging()

In any module:

from the_data_packet.core.logging import get_logger
logger = get_logger(__name__)
logger.info("Processing started")

Log Levels

  • DEBUG: Detailed debugging information
  • INFO: General operational messages
  • WARNING: Warning messages for recoverable issues
  • ERROR: Error messages for serious problems
  • CRITICAL: Critical errors that may cause shutdown

JSONLHandler

Custom logging handler that writes log entries to JSONL files.

Features:
  • Writes structured JSON logs to .jsonl files
  • Automatically rotates files daily
  • Includes metadata such as timestamp, module, and level
  • Thread-safe file operations

log_dir = Path(log_dir)  (instance attribute)

__init__(log_dir: str = 'output/logs')

emit(record: logging.LogRecord) -> None

Write log record as JSON line to daily log file.
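The handler's described behavior (one JSON object per line, one file per day, thread-safe writes) can be approximated with the standard library alone. The following is a minimal sketch, not the module's implementation: the class name `JsonlDailyHandler`, the `YYYY-MM-DD.jsonl` file naming, the use of UTC dates, and the JSON field names are all assumptions.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path


class JsonlDailyHandler(logging.Handler):
    """Hypothetical sketch: append each record as one JSON object per line."""

    def __init__(self, log_dir: str = "output/logs") -> None:
        super().__init__()
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def emit(self, record: logging.LogRecord) -> None:
        try:
            created = datetime.fromtimestamp(record.created, tz=timezone.utc)
            entry = {
                "timestamp": created.isoformat(),
                "level": record.levelname,
                "logger": record.name,
                "module": record.module,
                "message": record.getMessage(),
            }
            # One file per (assumed UTC) day gives daily rotation without renaming files.
            path = self.log_dir / f"{created:%Y-%m-%d}.jsonl"
            # logging.Handler.handle() holds self.lock while calling emit(), so
            # threads sharing this handler do not interleave partial lines.
            with path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")
        except Exception:
            self.handleError(record)
```

Opening the file per record keeps the sketch simple and makes day rollover automatic; a production handler might keep the file open and reopen it only when the date changes.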

S3LogUploader

Background service to upload JSONL log files to S3.

Features:
  • Monitors the log directory for completed daily logs
  • Uploads files to S3 with structured naming
  • Optionally removes local files after upload
  • Runs in a background thread

log_dir = Path(log_dir)  (instance attribute)

upload_interval = upload_interval  (instance attribute)

remove_after_upload = remove_after_upload  (instance attribute)

__init__(log_dir: str = 'output/logs', upload_interval: int = 3600, remove_after_upload: bool = False)

Initialize S3 log uploader.

Parameters:

  • log_dir (str, default 'output/logs'): Directory containing the JSONL log files
  • upload_interval (int, default 3600): How often to check for files to upload, in seconds
  • remove_after_upload (bool, default False): Whether to delete local files after upload

start() -> None

Start background upload service.

stop() -> None

Stop background upload service.
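A start/stop background service like this is commonly built from a daemon thread that wakes every `upload_interval` seconds and exits as soon as it is signalled. The sketch below assumes that design. The class name `BackgroundUploader` and the injected callables `find_pending` and `upload` are hypothetical; in a real deployment `upload` would wrap an S3 call such as boto3's `S3.Client.upload_file`, which is left out here so the example runs without AWS credentials.

```python
import logging
import threading
from pathlib import Path
from typing import Callable, Iterable, Optional

log = logging.getLogger(__name__)


class BackgroundUploader:
    """Hypothetical sketch of a periodic, stoppable upload loop."""

    def __init__(
        self,
        find_pending: Callable[[], Iterable[Path]],
        upload: Callable[[Path], None],
        upload_interval: float = 3600,
        remove_after_upload: bool = False,
    ) -> None:
        self.find_pending = find_pending  # returns files that are ready to upload
        self.upload = upload  # stand-in for the real S3 upload call
        self.upload_interval = upload_interval
        self.remove_after_upload = remove_after_upload
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        if self._thread is not None and self._thread.is_alive():
            return  # already running
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        # Setting the event wakes the loop at once instead of after a full interval;
        # join() then waits for an upload already in progress to finish.
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()

    def upload_once(self) -> None:
        for path in self.find_pending():
            self.upload(path)
            if self.remove_after_upload:
                path.unlink(missing_ok=True)

    def _run(self) -> None:
        while True:
            try:
                self.upload_once()
            except Exception:
                log.exception("Log upload failed; will retry next interval")
            # wait() returns True as soon as stop() sets the event.
            if self._stop_event.wait(self.upload_interval):
                break
```

Waiting on a `threading.Event` rather than calling `time.sleep` is what lets `stop()` return promptly even with the default one-hour interval.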

setup_logging(log_level: Optional[str] = None, enable_jsonl: Optional[bool] = None, enable_s3_upload: Optional[bool] = None, log_dir: Optional[str] = None) -> None

Configure application-wide logging settings.

Sets up structured logging with consistent formatting, configurable log levels, and noise reduction from third-party libraries. Should be called once at application startup.

Parameters:

  • log_level (Optional[str], default None): Override log level (DEBUG, INFO, WARNING, ERROR, CRITICAL), case-insensitive. If None, the configured default is used.
  • enable_jsonl (Optional[bool], default None): Whether to enable JSONL file logging. If None, the configured default is used.
  • enable_s3_upload (Optional[bool], default None): Whether to enable S3 upload of log files. If None, the configured default is used.
  • log_dir (Optional[str], default None): Directory for JSONL log files. If None, the configured default is used.
Example

Use default settings from config

setup_logging()

Override to DEBUG level, disable S3 upload

setup_logging("DEBUG", enable_s3_upload=False)

Console only (no JSONL files)

setup_logging(enable_jsonl=False, enable_s3_upload=False)

Note

This function uses force=True to override any existing logging configuration, ensuring consistent behavior in all environments. JSONL logs include structured metadata for log aggregation and analysis. S3 upload runs in the background and uploads completed daily log files.
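The console part of this setup can be sketched with `logging.basicConfig`, whose `force=True` argument (Python 3.8+) removes and closes any handlers already attached to the root logger before applying the new configuration. This is a minimal sketch under stated assumptions: the environment variable name `LOG_LEVEL`, the format string, and the list of noisy third-party loggers are hypothetical, and the JSONL handler and S3 uploader wiring that the real function performs is omitted.

```python
import logging
import os
from typing import Optional

# Hypothetical list of chatty third-party loggers to quiet down.
NOISY_LOGGERS = ("urllib3", "botocore", "boto3")

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def setup_console_logging(log_level: Optional[str] = None) -> None:
    # An explicit argument wins; otherwise fall back to an (assumed) LOG_LEVEL variable.
    name = (log_level or os.environ.get("LOG_LEVEL", "INFO")).upper()
    if name not in LEVELS:
        raise ValueError(f"Unknown log level: {name!r}")
    logging.basicConfig(
        level=LEVELS[name],
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    # Noise reduction: third-party libraries only report warnings and above.
    for noisy in NOISY_LOGGERS:
        logging.getLogger(noisy).setLevel(logging.WARNING)
```

Upper-casing the name before lookup is what makes the level argument case-insensitive, matching the behavior documented for log_level.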

get_logger(name: str) -> logging.Logger

Get a named logger instance for a module.

This is the standard way to obtain logger instances throughout the application. Pass __name__ as the logger name so that logger names mirror the module structure hierarchically.

Parameters:

  • name (str, required): Logger name, typically __name__ of the calling module

Returns:

  • Logger: Configured logger instance ready for use

Example

Standard usage in any module

from the_data_packet.core.logging import get_logger
logger = get_logger(__name__)

Usage examples

logger.info("Starting article collection")
logger.warning("Article content is short: %d chars", len(content))
logger.error("Failed to generate script: %s", str(error))

With structured data (for log aggregation)

logger.info("Article processed", extra={
    "article_id": article.id,
    "processing_time": elapsed_seconds,
})

Note

Logger names follow Python's hierarchical naming convention. For example, 'the_data_packet.sources.wired' will inherit configuration from 'the_data_packet.sources' and 'the_data_packet'.
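The hierarchy and the `extra` mechanism can be checked with the standard library directly. This sketch assumes `get_logger` is a thin wrapper around `logging.getLogger` (an assumption; the module may do more), and `CaptureHandler` is a hypothetical in-memory test helper. A level set on `the_data_packet` becomes the effective level of `the_data_packet.sources.wired`, records propagate up to the parent's handlers, and each key passed in `extra` becomes an attribute of the `LogRecord`.

```python
import logging
from typing import List


class CaptureHandler(logging.Handler):
    """Hypothetical helper that keeps emitted records in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def get_logger(name: str) -> logging.Logger:
    # Assumption: a thin wrapper over the standard library call.
    return logging.getLogger(name)


package_logger = get_logger("the_data_packet")
package_logger.setLevel(logging.INFO)
capture = CaptureHandler()
package_logger.addHandler(capture)

wired = get_logger("the_data_packet.sources.wired")
wired.debug("dropped: below the INFO level inherited from the_data_packet")
wired.info("Article processed", extra={"article_id": 42, "processing_time": 1.5})

record = capture.records[0]
print(record.name, record.article_id, record.processing_time)
```

Because `extra` keys land on the record as plain attributes, a structured handler has to copy them into its output explicitly; keys that collide with built-in `LogRecord` attributes such as `message` raise a `KeyError`.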

stop_s3_uploader() -> None

Stop the S3 log uploader service gracefully.

Should be called during application shutdown to ensure any pending uploads complete properly.

upload_current_logs() -> None

Manually trigger upload of completed log files to S3.

Useful for testing or forcing immediate upload of logs. Only uploads files from previous days to avoid interfering with active log files.
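The "previous days only" rule amounts to a date comparison on the file names. This minimal sketch assumes daily files are named `YYYY-MM-DD.jsonl` (a hypothetical scheme) and skips any file whose name does not parse as a date; `completed_log_files` is a hypothetical helper name.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def completed_log_files(log_dir: Path, today: Optional[date] = None) -> List[Path]:
    """Return JSONL files dated strictly before `today` (hypothetical naming)."""
    today = today or date.today()
    completed = []
    for path in sorted(log_dir.glob("*.jsonl")):
        try:
            file_day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a dated log file; leave it alone
        # Today's file may still be receiving writes, so it is excluded.
        if file_day < today:
            completed.append(path)
    return completed
```

Under the same naming assumption, upload_current_day_log covers the complementary case: the single file whose date equals today.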

upload_current_day_log(config: Config) -> None

Upload the current day's log file to S3.

This function specifically uploads today's log file, which is useful at the end of a pipeline run to ensure the current session's logs are archived alongside generated files.