# Logging
The Data Packet uses structured JSONL logging: machine-readable entries written to daily files, with optional S3 archival and optional Grafana Loki forwarding.
## Log format
Each entry is a JSON object on its own line (JSONL):
```json
{
  "timestamp": "2025-12-27T14:35:02.566269",
  "level": "INFO",
  "logger": "the_data_packet.sources.techcrunch",
  "message": "Collected 5 new articles",
  "module": "techcrunch",
  "function": "collect_articles",
  "line": 123,
  "articles_found": 5,
  "source": "techcrunch",
  "collection_time": 1.2
}
```
Standard fields: `timestamp` · `level` · `logger` · `message` · `module` · `function` · `line`

Any fields passed via `extra={}` in a log call become additional top-level JSON fields.
## Configuration
| Variable | Default | Description |
|---|---|---|
| `LOG_LEVEL` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` / `CRITICAL` |
| `LOG_DIRECTORY` | `output/logs` | Directory for JSONL files |
| `ENABLE_JSONL_LOGGING` | `true` | Write JSONL files |
| `ENABLE_S3_LOG_UPLOAD` | `true` | Upload completed daily files to S3 |
| `LOG_UPLOAD_INTERVAL` | `3600` | Seconds between upload checks |
| `REMOVE_LOGS_AFTER_UPLOAD` | `false` | Delete local files after upload |
## S3 upload structure
Log files are uploaded with a date-partitioned prefix — compatible with AWS Athena and other partition-aware tools:
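The exact bucket and prefix are configuration-specific, but a Hive-style `year=/month=/day=` layout is the conventional Athena-compatible scheme. A minimal sketch of building such a key (the `logs` prefix and filename format here are assumptions, not the project's actual values):

```python
from datetime import date

def s3_log_key(log_date: date, filename: str, prefix: str = "logs") -> str:
    """Build a Hive-style date-partitioned S3 key.

    Assumed layout for illustration; the project's real prefix may differ.
    """
    return (
        f"{prefix}/year={log_date.year}/"
        f"month={log_date.month:02d}/day={log_date.day:02d}/{filename}"
    )

# s3_log_key(date(2025, 12, 27), "app-2025-12-27.jsonl")
# → "logs/year=2025/month=12/day=27/app-2025-12-27.jsonl"
```

Partition-aware tools can then prune scans to the requested date range instead of reading every object.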
## Python usage
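The project's own logger setup isn't reproduced here; the following is a minimal stdlib-only sketch of how a JSONL formatter producing the format above could look, and how `extra={}` kwargs flow through to top-level fields. The class name `JsonlFormatter` is illustrative, not the project's actual API:

```python
import json
import logging

class JsonlFormatter(logging.Formatter):
    """Minimal JSONL formatter (a sketch; the project's real formatter may differ)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        # extra={} kwargs land on the LogRecord as attributes; copy over
        # anything that is not a built-in LogRecord attribute.
        standard = logging.LogRecord("", 0, "", 0, "", None, None).__dict__
        for key, value in record.__dict__.items():
            if key not in standard and key not in entry:
                entry[key] = value
        return json.dumps(entry)

logger = logging.getLogger("the_data_packet.sources.techcrunch")
handler = logging.StreamHandler()
handler.setFormatter(JsonlFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Extra kwargs become top-level JSON fields:
logger.info(
    "Collected 5 new articles",
    extra={"articles_found": 5, "source": "techcrunch", "collection_time": 1.2},
)
```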
## Log analysis
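Because each line is a standalone JSON object, daily files can be analyzed with a few lines of Python (or loaded wholesale with `pandas.read_json(path, lines=True)`). A small sketch counting entries per level; the file path in the usage comment is hypothetical:

```python
import json
from collections import Counter

def level_counts(lines):
    """Count log entries per level in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[json.loads(line)["level"]] += 1
    return counts

# Usage over a daily file (example path):
# with open("output/logs/2025-12-27.jsonl") as fh:
#     print(level_counts(fh))
```

The same pattern extends to any structured field, e.g. averaging `collection_time` per `source`.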
## Grafana Loki
Configure these variables to forward all log entries to Loki:
```bash
GRAFANA_LOKI_URL=https://loki.your-instance.com
GRAFANA_LOKI_USERNAME=your-username
GRAFANA_LOKI_PASSWORD=your-password
```
All structured extra fields are forwarded as Loki labels, enabling LogQL queries like:
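For example, assuming the `source` and `level` fields from the entry above arrive as labels (your actual label set depends on what your log calls pass via `extra={}`):

```logql
# All TechCrunch errors
{source="techcrunch", level="ERROR"}

# Hourly error count per source
sum by (source) (count_over_time({level="ERROR"}[1h]))
```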
## Best practices
**Always use structured logging.** Pass extra context as `extra={}` kwargs rather than interpolating it into message strings; this keeps fields queryable and message text clean.

**Include timing data.** Adding `processing_time` or `duration_ms` to log entries makes it trivial to identify performance regressions in production without any additional instrumentation.