Config
the_data_packet.core.config
Unified configuration system for The Data Packet.
This module provides centralized configuration management with support for:
- Environment variable loading
- Type-safe configuration with validation
- Default values for all settings
- Global configuration singleton pattern
- Override capabilities for testing
The configuration system follows these priorities (highest to lowest):
1. Direct parameter overrides
2. Environment variables
3. Default values
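The priority order above can be illustrated with a small lookup helper. This is only a sketch of the precedence rule, not the module's actual code: `resolve_setting` and its parameters are hypothetical names introduced here for illustration.

```python
import os
from typing import Any, Mapping, Optional


def resolve_setting(
    name: str,
    env_var: str,
    default: Any,
    overrides: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Resolve one setting: direct override, then environment, then default.

    Hypothetical helper; the real module may implement this differently.
    """
    env = os.environ if environ is None else environ
    if name in overrides:      # 1. direct parameter override wins
        return overrides[name]
    if env_var in env:         # 2. then the environment variable
        return env[env_var]
    return default             # 3. finally the built-in default


# Usage: the override beats the environment, which beats the default.
value = resolve_setting(
    "show_name",
    "SHOW_NAME",
    "The Data Packet",
    overrides={"show_name": "My Custom Podcast"},
    environ={"SHOW_NAME": "Env Podcast"},
)
```

Passing the environment as an explicit mapping keeps the helper easy to exercise in tests without touching the real process environment.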
Configuration Categories
API Keys:
- Anthropic API key for Claude script generation
- ElevenLabs API key for legacy TTS audio generation (deprecated, replaced by Google Cloud TTS)
- AWS credentials for S3 storage
Podcast Settings:
- Show metadata (name, episode numbers)
- Audio preferences (voices, sample rate)
- RSS feed configuration
Processing Options:
- Which generation steps to run
- Article collection preferences
- Output and cleanup settings
Network Settings:
- HTTP timeouts and user agents
- Retry configurations
- Rate limiting settings
Usage
    # Get default configuration (loads from environment)
    config = get_config()
    # Override specific values
    config = get_config(show_name="My Custom Podcast", max_articles_per_source=3)
    # Access configuration values
    if config.anthropic_api_key:
        generator = ScriptGenerator(config.anthropic_api_key)
Environment Variables
Required for script generation:
- ANTHROPIC_API_KEY: Claude API key
Required for audio generation:
- GCS_BUCKET_NAME: Google Cloud Storage bucket for long audio synthesis
- GOOGLE_APPLICATION_CREDENTIALS: Path to service account JSON (optional if using default credentials)
Legacy (deprecated):
- ELEVENLABS_API_KEY: ElevenLabs API key (replaced by Google Cloud TTS)
Optional for S3 uploads:
- S3_BUCKET_NAME: S3 bucket for hosting
- AWS_ACCESS_KEY_ID: AWS access key
- AWS_SECRET_ACCESS_KEY: AWS secret key
- AWS_REGION: AWS region (default: us-east-1)
Optional for Grafana Loki log aggregation:
- GRAFANA_LOKI_URL: Loki endpoint URL
- GRAFANA_LOKI_USERNAME: Loki authentication username
- GRAFANA_LOKI_PASSWORD: Loki authentication password/API key
Optional customizations:
- SHOW_NAME: Podcast name override
- LOG_LEVEL: Logging level (DEBUG/INFO/WARNING/ERROR)
- MAX_ARTICLES: Max articles per source
Logging configuration:
- LOG_DIRECTORY: Directory for JSONL log files (default: output/logs)
- ENABLE_JSONL_LOGGING: Enable JSONL file logging (true/false, default: true)
- ENABLE_S3_LOG_UPLOAD: Enable S3 upload of logs (true/false, default: true)
- LOG_UPLOAD_INTERVAL: Upload interval in seconds (default: 3600)
- REMOVE_LOGS_AFTER_UPLOAD: Remove local logs after S3 upload (true/false, default: false)
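Several of the variables above carry boolean (true/false) or integer values, so they need parsing before use. The sketch below shows one way this could be done. The helper names `env_bool` and `env_int` are hypothetical, and treating any value other than "true" (case-insensitive) as false is an assumption, not documented behavior.

```python
import os
from typing import Mapping, Optional


def env_bool(name: str, default: bool, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a true/false environment variable (hypothetical helper)."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default  # unset or empty: keep the documented default
    # Assumption: only "true" (any letter case) counts as enabled.
    return raw.strip().lower() == "true"


def env_int(name: str, default: int, environ: Optional[Mapping[str, str]] = None) -> int:
    """Parse an integer environment variable (hypothetical helper)."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)  # raises ValueError on non-numeric input


# Usage with an explicit mapping standing in for the process environment.
fake_env = {"ENABLE_S3_LOG_UPLOAD": "false", "LOG_UPLOAD_INTERVAL": "600"}
upload_enabled = env_bool("ENABLE_S3_LOG_UPLOAD", True, fake_env)
upload_interval = env_int("LOG_UPLOAD_INTERVAL", 3600, fake_env)
```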
Config
dataclass
Unified configuration for The Data Packet with environment variable support.
This class provides type-safe configuration management with automatic environment variable loading and validation. All fields have sensible defaults and can be overridden via environment variables or direct parameter passing.
Attributes:
API Keys:
- anthropic_api_key: Anthropic API key for Claude script generation. Required for script generation. Loaded from ANTHROPIC_API_KEY.
- elevenlabs_api_key: [DEPRECATED] ElevenLabs API key for legacy TTS. Replaced by Google Cloud TTS. Loaded from ELEVENLABS_API_KEY.
- mongodb_username: MongoDB username for episode tracking and article deduplication. Optional. Loaded from MONGODB_USERNAME.
- mongodb_password: MongoDB password for episode tracking and article deduplication. Optional. Loaded from MONGODB_PASSWORD.
Google Cloud Configuration:
- google_credentials_path: Path to Google Cloud service account JSON file. Optional if using default application credentials. Loaded from GOOGLE_APPLICATION_CREDENTIALS.
- gcs_bucket_name: Google Cloud Storage bucket for long audio synthesis output. Required for audio generation. Loaded from GCS_BUCKET_NAME.
AWS Configuration:
- aws_access_key_id: AWS access key for S3 uploads. Loaded from AWS_ACCESS_KEY_ID.
- aws_secret_access_key: AWS secret key for S3 uploads. Loaded from AWS_SECRET_ACCESS_KEY.
- aws_region: AWS region for S3 operations. Default: us-east-1.
- s3_bucket_name: S3 bucket name for hosting files. Loaded from S3_BUCKET_NAME.
Grafana Loki Configuration:
- grafana_loki_url: Loki endpoint URL for log aggregation. Loaded from GRAFANA_LOKI_URL.
- grafana_loki_username: Username for Loki authentication. Loaded from GRAFANA_LOKI_USERNAME.
- grafana_loki_password: Password/API key for Loki authentication. Loaded from GRAFANA_LOKI_PASSWORD.
Podcast Configuration:
- show_name: Podcast show name. Used in RSS feeds and file names.
- episode_number: Episode number for RSS feeds. Auto-generated if None.
- output_directory: Local directory for generated files.
Article Collection:
- max_articles_per_source: Maximum articles to collect per source.
- article_sources: List of news sources to use (wired, techcrunch).
- article_categories: List of categories to fetch from each source.
- source_category_mapping: Maps each source to its supported categories.
AI Generation Settings:
- claude_model: Claude model name for script generation.
- tts_model: Text-to-speech service type (now "google_cloud_tts").
- max_tokens: Maximum tokens for Claude API calls.
- temperature: AI generation temperature (0.0-1.0, lower = more consistent).
Audio Settings (Google Cloud Studio Multi-speaker voices):
- male_voice: First speaker voice name (Alex - male narrator). Default: Puck.
- female_voice: Second speaker voice name (Sam - female narrator). Default: Kore.
- audio_sample_rate: Audio sample rate in Hz.
Processing Options:
- generate_script: Whether to generate podcast scripts.
- generate_audio: Whether to generate audio files.
- generate_rss: Whether to generate RSS feeds.
- save_intermediate_files: Whether to keep intermediate processing files.
- cleanup_temp_files: Whether to clean up temporary files after processing.
RSS Feed Configuration:
- rss_channel_title: RSS channel title.
- rss_channel_description: RSS channel description.
- rss_channel_link: RSS channel website link.
- rss_channel_image_url: RSS channel artwork URL.
- rss_channel_email: Contact email for podcast.
- max_rss_episodes: Maximum episodes to keep in RSS feed.
Network Settings:
- http_timeout: HTTP request timeout in seconds.
- user_agent: User agent string for HTTP requests.
- log_level: Logging level (DEBUG/INFO/WARNING/ERROR/CRITICAL).
Example
    # Default configuration with environment variables
    config = Config()
    # Custom configuration
    config = Config(show_name="Tech News Daily", max_articles_per_source=3, male_voice="charon", female_voice="aoede")
    # Validate before use
    config.validate_for_script_generation()
    config.validate_for_audio_generation()
Each field below is both a class attribute and an instance attribute:
anthropic_api_key: Optional[str] = None
elevenlabs_api_key: Optional[str] = None
mongodb_username: Optional[str] = None
mongodb_password: Optional[str] = None
google_credentials_path: Optional[str] = None
gcp_secret_name: Optional[str] = None
gcs_bucket_name: Optional[str] = None
aws_access_key_id: Optional[str] = None
aws_secret_access_key: Optional[str] = None
aws_region: str = 'us-east-1'
s3_bucket_name: Optional[str] = None
grafana_loki_url: Optional[str] = None
grafana_loki_username: Optional[str] = None
grafana_loki_password: Optional[str] = None
show_name: str = 'The Data Packet'
episode_number: Optional[int] = None
output_directory: Path = Path('./output')
max_articles_per_source: int = 1
article_sources: List[str] = field(default_factory=(lambda: ['wired', 'techcrunch']))
article_categories: List[str] = field(default_factory=(lambda: ['security', 'ai']))
source_category_mapping: Dict[str, List[str]] = field(default_factory=(lambda: {'wired': ['security', 'science', 'ai'], 'techcrunch': ['ai', 'security']}))
claude_model: str = 'claude-sonnet-4-5-20250929'
tts_model: str = 'google_cloud_tts'
max_tokens: int = 3000
temperature: float = 0.7
male_voice: str = 'Puck'
female_voice: str = 'Kore'
audio_sample_rate: int = 24000
google_cloud_project: str = 'gen-lang-client-0429374219'
generate_script: bool = True
generate_audio: bool = True
generate_rss: bool = True
save_intermediate_files: bool = False
cleanup_temp_files: bool = True
rss_channel_title: Optional[str] = 'The Data Packet'
rss_channel_description: Optional[str] = None
rss_channel_link: Optional[str] = None
rss_channel_image_url: Optional[str] = 'https://the-data-packet.s3.us-west-2.amazonaws.com/the-data-packet/the_data_packet.png'
rss_channel_email: Optional[str] = 'contact@thewintershadow.com'
max_rss_episodes: int = 500
http_timeout: int = 30
user_agent: str = 'The Data Packet/1.0 (+https://github.com/TheWinterShadow/The-Data-Packet)'
log_level: str = 'INFO'
log_dir: str = 'output/logs'
enable_jsonl_logging: bool = True
enable_s3_log_upload: bool = True
log_upload_interval: int = 3600
remove_logs_after_upload: bool = False
__init__(
    anthropic_api_key: Optional[str] = None,
    elevenlabs_api_key: Optional[str] = None,
    mongodb_username: Optional[str] = None,
    mongodb_password: Optional[str] = None,
    google_credentials_path: Optional[str] = None,
    gcp_secret_name: Optional[str] = None,
    gcs_bucket_name: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_region: str = 'us-east-1',
    s3_bucket_name: Optional[str] = None,
    grafana_loki_url: Optional[str] = None,
    grafana_loki_username: Optional[str] = None,
    grafana_loki_password: Optional[str] = None,
    show_name: str = 'The Data Packet',
    episode_number: Optional[int] = None,
    output_directory: Path = Path('./output'),
    max_articles_per_source: int = 1,
    article_sources: List[str] = (lambda: ['wired', 'techcrunch'])(),
    article_categories: List[str] = (lambda: ['security', 'ai'])(),
    source_category_mapping: Dict[str, List[str]] = (lambda: {'wired': ['security', 'science', 'ai'], 'techcrunch': ['ai', 'security']})(),
    claude_model: str = 'claude-sonnet-4-5-20250929',
    tts_model: str = 'google_cloud_tts',
    max_tokens: int = 3000,
    temperature: float = 0.7,
    male_voice: str = 'Puck',
    female_voice: str = 'Kore',
    audio_sample_rate: int = 24000,
    google_cloud_project: str = 'gen-lang-client-0429374219',
    generate_script: bool = True,
    generate_audio: bool = True,
    generate_rss: bool = True,
    save_intermediate_files: bool = False,
    cleanup_temp_files: bool = True,
    rss_channel_title: Optional[str] = 'The Data Packet',
    rss_channel_description: Optional[str] = None,
    rss_channel_link: Optional[str] = None,
    rss_channel_image_url: Optional[str] = 'https://the-data-packet.s3.us-west-2.amazonaws.com/the-data-packet/the_data_packet.png',
    rss_channel_email: Optional[str] = 'contact@thewintershadow.com',
    max_rss_episodes: int = 500,
    http_timeout: int = 30,
    user_agent: str = 'The Data Packet/1.0 (+https://github.com/TheWinterShadow/The-Data-Packet)',
    log_level: str = 'INFO',
    log_dir: str = 'output/logs',
    enable_jsonl_logging: bool = True,
    enable_s3_log_upload: bool = True,
    log_upload_interval: int = 3600,
    remove_logs_after_upload: bool = False,
) -> None
__post_init__() -> None
Load configuration from environment variables.
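To illustrate how a dataclass can pull values from the environment in `__post_init__`, here is a minimal sketch. `EnvBackedConfig` is a hypothetical stand-in with only two fields. The rule that an explicitly passed value is never overwritten follows the documented priority order, but the exact implementation in `Config` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnvBackedConfig:
    """Hypothetical two-field stand-in for the real Config dataclass."""

    anthropic_api_key: Optional[str] = None
    s3_bucket_name: Optional[str] = None

    def __post_init__(self) -> None:
        # Fall back to the environment only when no explicit value was given,
        # so direct parameters keep priority over environment variables.
        if self.anthropic_api_key is None:
            self.anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
        if self.s3_bucket_name is None:
            self.s3_bucket_name = os.getenv("S3_BUCKET_NAME")
```

This pattern only works cleanly for fields whose default is None. A field with a real default, such as aws_region, would need a sentinel or similar device to tell "not passed" apart from "passed the default".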
validate_for_script_generation() -> None
Validate configuration for script generation.
validate_for_audio_generation() -> None
Validate configuration for audio generation.
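The two validation methods presumably check that the settings each step requires are present (ANTHROPIC_API_KEY for scripts, GCS_BUCKET_NAME for audio, per the environment variable list). The sketch below shows that idea with a hypothetical `require_settings` helper. The exception type and message format are assumptions, since this page does not state what the real methods raise.

```python
from typing import Optional


def require_settings(purpose: str, **values: Optional[str]) -> None:
    """Raise if any named setting is missing or empty (hypothetical helper).

    Assumption: ValueError is used here only for illustration.
    """
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError(
            f"Missing configuration for {purpose}: {', '.join(missing)}"
        )


# Usage: passes silently when the key is set, raises when it is not.
require_settings("script generation", anthropic_api_key="example-key")
try:
    require_settings("audio generation", gcs_bucket_name=None)
except ValueError as exc:
    error_message = str(exc)
```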
get_sources_for_category(category: str) -> List[str]
Get list of sources that support a given category.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | str | Category name to check | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of source names that support the category |
get_categories_for_source(source: str) -> List[str]
Get list of categories supported by a given source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | Source name to check | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of category names supported by the source |
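Both lookups, by category and by source, can be expressed directly over `source_category_mapping`. The sketch below uses the default mapping shown on this page with standalone functions. The function names are hypothetical, and returning an empty list for an unknown source is an assumption about the real method's behavior.

```python
from typing import Dict, List

# Default mapping as documented for source_category_mapping.
SOURCE_CATEGORY_MAPPING: Dict[str, List[str]] = {
    "wired": ["security", "science", "ai"],
    "techcrunch": ["ai", "security"],
}


def sources_for_category(mapping: Dict[str, List[str]], category: str) -> List[str]:
    """Return every source whose category list contains `category`."""
    return [source for source, categories in mapping.items() if category in categories]


def categories_for_source(mapping: Dict[str, List[str]], source: str) -> List[str]:
    """Return a copy of the categories for `source` (empty if unknown, by assumption)."""
    return list(mapping.get(source, []))


science_sources = sources_for_category(SOURCE_CATEGORY_MAPPING, "science")
techcrunch_categories = categories_for_source(SOURCE_CATEGORY_MAPPING, "techcrunch")
```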
to_dict() -> Dict
Convert configuration to dictionary.
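For a dataclass, a dictionary conversion is commonly built on `dataclasses.asdict`. The sketch below uses a hypothetical `SerializableConfig` with three of the documented fields. Converting `Path` values to strings is an assumption made so the result is JSON-friendly, and the real `to_dict` may differ.

```python
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class SerializableConfig:
    """Hypothetical subset of Config used to illustrate to_dict()."""

    show_name: str = "The Data Packet"
    output_directory: Path = Path("./output")
    max_articles_per_source: int = 1

    def to_dict(self) -> Dict[str, Any]:
        # asdict() walks the dataclass fields; Path objects are turned into
        # plain strings so the result can be passed to json.dumps().
        return {
            key: str(value) if isinstance(value, Path) else value
            for key, value in asdict(self).items()
        }


config_dict = SerializableConfig().to_dict()
```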
get_config(**overrides: Any) -> Config
Get the global configuration instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `**overrides` | Any | Configuration values to override | {} |
Returns:
| Type | Description |
|---|---|
| Config | Config instance |
reset_config() -> None
Reset the global configuration instance.
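The global singleton pattern behind a get/reset pair can be sketched as follows. The names `DemoConfig`, `get_demo_config`, and `reset_demo_config` are hypothetical. In particular, rebuilding the cached instance whenever overrides are passed is an assumption, as this page does not describe how `get_config` treats overrides once an instance exists.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DemoConfig:
    """Hypothetical stand-in for Config with two documented fields."""

    show_name: str = "The Data Packet"
    max_articles_per_source: int = 1


_config: Optional[DemoConfig] = None  # module-level cached instance


def get_demo_config(**overrides: Any) -> DemoConfig:
    """Return the cached instance, creating it on first use.

    Assumption: passing overrides replaces the cached instance.
    """
    global _config
    if _config is None or overrides:
        _config = DemoConfig(**overrides)
    return _config


def reset_demo_config() -> None:
    """Drop the cached instance so the next call builds a fresh one."""
    global _config
    _config = None
```

A reset function like this is mainly useful in tests, where each test case wants a configuration that is not affected by overrides applied in an earlier case.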