Logging
the_data_packet.core.logging
Centralized logging configuration for The Data Packet.
This module provides unified logging setup for the entire application. It configures structured logging with proper formatters, reduces noise from third-party libraries, and provides a consistent interface for obtaining logger instances throughout the codebase.
Features
- Structured logging with timestamps and module names
- Configurable log levels via environment variables
- Noise reduction from third-party libraries
- Consistent format across all modules
- Console output optimized for Docker containers
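The features above can be illustrated with the standard library alone. The following is a minimal sketch, not the module's actual implementation: the format string, the helper names `build_console_handler` and `quiet_libraries`, and the library names passed in are all assumptions chosen for illustration.

```python
import logging
import sys
from typing import Iterable

# Hypothetical format: timestamp, logger (module) name, level, message.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def build_console_handler() -> logging.Handler:
    """Create a stdout handler, which container runtimes typically capture."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    return handler


def quiet_libraries(names: Iterable[str], level: int = logging.WARNING) -> None:
    """Raise the threshold of chatty third-party loggers to cut noise."""
    for name in names:
        logging.getLogger(name).setLevel(level)
```

A caller might attach the handler to the root logger and then call `quiet_libraries(["urllib3", "botocore"])`; which libraries the real module silences is not stated here.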
Usage
In the main application entry point:
setup_logging()
In any module:
from the_data_packet.core.logging import get_logger
logger = get_logger(__name__)
logger.info("Processing started")
Log Levels
- DEBUG: Detailed debugging information
- INFO: General operational messages
- WARNING: Warning messages for recoverable issues
- ERROR: Error messages for serious problems
- CRITICAL: Critical errors that may cause shutdown
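As an illustration of how a level name could be resolved case-insensitively from an argument or the environment, here is a small sketch. The function name `resolve_log_level` and the `LOG_LEVEL` variable name are assumptions; the module's real configuration source may differ.

```python
import logging
import os
from typing import Optional


def resolve_log_level(explicit: Optional[str] = None,
                      env_var: str = "LOG_LEVEL") -> int:
    """Return a numeric level from an explicit name, else the environment, else INFO."""
    name = (explicit or os.environ.get(env_var) or "INFO").upper()
    # getLevelName returns an int for a known level name and a string otherwise.
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {name!r}")
    return level
```

Rejecting unknown names early avoids silently falling back to a level the operator did not ask for.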
JSONLHandler
Custom logging handler that writes log entries to JSONL files.
Features:
- Writes structured JSON logs to .jsonl files
- Automatically rotates files daily
- Includes metadata like timestamp, module, level
- Thread-safe file operations
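To make the listed behavior concrete, the sketch below shows one way a handler of this kind could be written. It is not the actual `JSONLHandler`: the class name `DailyJSONLHandler`, the `YYYY-MM-DD.jsonl` file naming, and the exact JSON fields are assumptions.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path


class DailyJSONLHandler(logging.Handler):
    """Append one JSON object per record to a file named after the current UTC date."""

    def __init__(self, log_dir: str) -> None:
        super().__init__()
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _current_path(self) -> Path:
        # Deriving the file name from today's date yields daily rotation.
        day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        return self.log_dir / f"{day}.jsonl"

    def emit(self, record: logging.LogRecord) -> None:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "module": record.name,
            "message": record.getMessage(),
        }
        try:
            # Handler.handle() holds the handler's lock while emit() runs,
            # so concurrent threads do not interleave their writes.
            with self._current_path().open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")
        except Exception:
            self.handleError(record)
```

Opening the file on every record is simple but slower than keeping it open; a production handler would likely cache the file object and reopen it when the date changes.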
S3LogUploader
Background service to upload JSONL log files to S3.
Features:
- Monitors log directory for completed daily logs
- Uploads files to S3 with structured naming
- Optionally removes local files after upload
- Runs in background thread
log_dir = Path(log_dir) (instance attribute)
upload_interval = upload_interval (instance attribute)
remove_after_upload = remove_after_upload (instance attribute)
__init__(log_dir: str = 'output/logs', upload_interval: int = 3600, remove_after_upload: bool = False)
Initialize S3 log uploader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_dir | str | Directory containing JSONL log files | 'output/logs' |
| upload_interval | int | How often to check for files to upload (seconds) | 3600 |
| remove_after_upload | bool | Whether to delete local files after upload | False |
start() -> None
Start background upload service.
stop() -> None
Stop background upload service.
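The start/stop lifecycle of a background service like this is commonly built on `threading.Event`. The sketch below assumes that pattern; it is not the actual `S3LogUploader`. The class name `PeriodicUploader` and the injected `upload` callable (a stand-in for the real S3 call) are hypothetical, and the check that skips the active day's file is omitted for brevity.

```python
import threading
from pathlib import Path
from typing import Callable, Optional, Set


class PeriodicUploader:
    """Call `upload` for each JSONL file in `log_dir` on a fixed interval."""

    def __init__(self, log_dir: str, upload: Callable[[Path], None],
                 upload_interval: float = 3600,
                 remove_after_upload: bool = False) -> None:
        self.log_dir = Path(log_dir)
        self.upload = upload  # hypothetical stand-in for the real S3 call
        self.upload_interval = upload_interval
        self.remove_after_upload = remove_after_upload
        self._done: Set[Path] = set()
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        if self._thread is not None and self._thread.is_alive():
            return  # already running
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()  # wakes the thread if it is waiting
        if self._thread is not None:
            self._thread.join()

    def _run(self) -> None:
        while not self._stop_event.is_set():
            self.scan_once()
            # wait() returns early as soon as stop() sets the event.
            self._stop_event.wait(self.upload_interval)

    def scan_once(self) -> None:
        for path in sorted(self.log_dir.glob("*.jsonl")):
            if path in self._done:
                continue  # avoid re-uploading files that were kept locally
            self.upload(path)
            self._done.add(path)
            if self.remove_after_upload:
                path.unlink()
```

Waiting on the event rather than calling `time.sleep` lets `stop()` return promptly instead of blocking for up to a full interval.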
setup_logging(log_level: Optional[str] = None, enable_jsonl: Optional[bool] = None, enable_s3_upload: Optional[bool] = None, log_dir: Optional[str] = None) -> None
Configure application-wide logging settings.
Sets up structured logging with consistent formatting, configurable log levels, and noise reduction from third-party libraries. Should be called once at application startup.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_level | Optional[str] | Override log level (DEBUG, INFO, WARNING, ERROR, CRITICAL). If None, uses configuration default. Case insensitive. | None |
| enable_jsonl | Optional[bool] | Whether to enable JSONL file logging (default: from config) | None |
| enable_s3_upload | Optional[bool] | Whether to enable S3 upload of log files (default: from config) | None |
| log_dir | Optional[str] | Directory for JSONL log files (default: from config) | None |
Example
Use default settings from config:
setup_logging()
Override to DEBUG level, disable S3 upload:
setup_logging("DEBUG", enable_s3_upload=False)
Console only (no JSONL files):
setup_logging(enable_jsonl=False, enable_s3_upload=False)
Note
This function uses force=True to override any existing logging configuration, ensuring consistent behavior in all environments. JSONL logs include structured metadata for log aggregation and analysis. S3 upload runs in background and uploads completed daily log files.
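The effect of `force=True` can be seen with the standard library's `logging.basicConfig`, which presumably is what this note refers to. The sketch below is an illustration of that standard behavior, not code from this module; the function name `demonstrate_force` is hypothetical.

```python
import logging
from typing import Tuple


def demonstrate_force() -> Tuple[int, int]:
    """Return the root level after a plain reconfiguration and after a forced one."""
    root = logging.getLogger()

    # Establish a known starting state: one handler, WARNING level.
    logging.basicConfig(level=logging.WARNING, force=True)

    # A plain second call is ignored because the root logger already has a handler.
    logging.basicConfig(level=logging.DEBUG)
    level_without_force = root.level

    # With force=True the existing handlers are removed and closed first,
    # so the new settings take effect.
    logging.basicConfig(level=logging.DEBUG, force=True)
    level_with_force = root.level

    return level_without_force, level_with_force
```

This is why a forced setup behaves the same whether or not an imported library configured logging earlier.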
get_logger(name: str) -> logging.Logger
Get a named logger instance for a module.
This is the standard way to obtain logger instances throughout the application. Pass __name__ as the logger name so that logger names are hierarchical and match the module structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Logger name, typically __name__ from the calling module | required |
Returns:
| Type | Description |
|---|---|
| Logger | Configured logger instance ready for use |
Example
Standard usage in any module:
from the_data_packet.core.logging import get_logger
logger = get_logger(__name__)
Usage examples:
logger.info("Starting article collection")
logger.warning("Article content is short: %d chars", len(content))
logger.error("Failed to generate script: %s", str(error))
With structured data (for log aggregation):
logger.info("Article processed", extra={"article_id": article.id, "processing_time": elapsed_seconds})
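Keys passed through `extra` become attributes on the `LogRecord`, so a handler has to pick them out itself. The sketch below shows one way to do that with the standard library; the helper name `extra_fields` is hypothetical, and whether this module's JSONL output uses the same approach is an assumption.

```python
import logging
from typing import Any, Dict

# Attribute names present on every LogRecord; anything else came from `extra`.
# "message" and "asctime" are added later by Formatter.format().
_STANDARD_ATTRS = set(
    vars(logging.LogRecord("", 0, "", 0, "", None, None))
) | {"message", "asctime"}


def extra_fields(record: logging.LogRecord) -> Dict[str, Any]:
    """Return only the caller-supplied `extra` values attached to a record."""
    return {
        key: value
        for key, value in vars(record).items()
        if key not in _STANDARD_ATTRS
    }
```

Computing the standard attribute set from a blank record keeps the filter correct across Python versions that add new record attributes.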
Note
Logger names follow Python's hierarchical naming convention. For example, 'the_data_packet.sources.wired' will inherit configuration from 'the_data_packet.sources' and 'the_data_packet'.
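The inheritance described in this note can be checked directly with the standard library. The sketch below calls `logging.getLogger` rather than `get_logger`, on the assumption that `get_logger` returns ordinary `logging.Logger` objects.

```python
import logging

parent = logging.getLogger("the_data_packet.sources")
child = logging.getLogger("the_data_packet.sources.wired")

# Configure only the parent.
parent.setLevel(logging.ERROR)

# The child has no level of its own (NOTSET), so it falls back to its
# nearest ancestor with an explicit level.
inherited_level = child.getEffectiveLevel()
is_direct_child = child.parent is parent
```

Here `inherited_level` equals `logging.ERROR`, and records from the child also propagate to handlers attached to its ancestors unless `propagate` is set to `False`.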
stop_s3_uploader() -> None
Stop the S3 log uploader service gracefully.
Should be called during application shutdown to ensure any pending uploads complete properly.
upload_current_logs() -> None
Manually trigger upload of completed log files to S3.
Useful for testing or forcing immediate upload of logs. Only uploads files from previous days to avoid interfering with active log files.
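Selecting only files from previous days could look like the sketch below. It assumes log files are named by ISO date (for example `2024-01-01.jsonl`); that naming scheme and the function name `completed_log_files` are assumptions, not details confirmed by this page.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def completed_log_files(log_dir: Path, today: Optional[date] = None) -> List[Path]:
    """Return JSONL files whose date-stamped name is strictly before today."""
    today = today or date.today()
    completed = []
    for path in sorted(log_dir.glob("*.jsonl")):
        try:
            file_day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # skip files that do not follow the assumed naming scheme
        if file_day < today:
            completed.append(path)
    return completed
```

Excluding today's file means the handler that is still appending to it is never raced by an upload or a delete.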
upload_current_day_log(config: Config) -> None
Upload the current day's log file to S3.
This function specifically uploads today's log file, which is useful at the end of a pipeline run to ensure the current session's logs are archived alongside generated files.