Config

the_data_packet.core.config

Unified configuration system for The Data Packet.

This module provides centralized configuration management with support for:

- Environment variable loading
- Type-safe configuration with validation
- Default values for all settings
- Global configuration singleton pattern
- Override capabilities for testing

The configuration system follows these priorities (highest to lowest):

1. Direct parameter overrides
2. Environment variables
3. Default values
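The priority order can be illustrated with the standard library's `ChainMap`, which returns a key from the first mapping that contains it. This is only a sketch: the module's actual resolution logic is not shown on this page, and `resolve_settings` and the two sample settings are hypothetical.

```python
from collections import ChainMap

# Hypothetical defaults, for illustration only.
DEFAULTS = {"show_name": "The Data Packet", "log_level": "INFO"}


def resolve_settings(overrides, environ):
    """Merge settings so overrides beat environment values, which beat defaults."""
    # Map upper-case environment variable names onto lower-case setting names.
    from_env = {
        key.lower(): value
        for key, value in environ.items()
        if key.lower() in DEFAULTS
    }
    # ChainMap looks keys up left to right, which encodes the priority order.
    return dict(ChainMap(overrides, from_env, DEFAULTS))


settings = resolve_settings(
    overrides={"show_name": "My Custom Podcast"},
    environ={"SHOW_NAME": "Env Podcast", "LOG_LEVEL": "DEBUG"},
)
print(settings["show_name"])  # My Custom Podcast
print(settings["log_level"])  # DEBUG
```

Here the override wins for `show_name`, while `log_level` falls through to the environment value because no override was given.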

Configuration Categories

API Keys:

- Anthropic API key for Claude script generation
- ElevenLabs API key for TTS audio generation (legacy; replaced by Google Cloud TTS, see Environment Variables)
- AWS credentials for S3 storage

Podcast Settings:

- Show metadata (name, episode numbers)
- Audio preferences (voices, sample rate)
- RSS feed configuration

Processing Options:

- Which generation steps to run
- Article collection preferences
- Output and cleanup settings

Network Settings:

- HTTP timeouts and user agents
- Retry configurations
- Rate limiting settings

Usage

    from the_data_packet.core.config import get_config

    # Get default configuration (loads from environment)
    config = get_config()

    # Override specific values
    config = get_config(
        show_name="My Custom Podcast",
        max_articles_per_source=3,
    )

    # Access configuration values
    if config.anthropic_api_key:
        generator = ScriptGenerator(config.anthropic_api_key)

Environment Variables

Required for script generation:

- ANTHROPIC_API_KEY: Claude API key

Required for audio generation:

- GCS_BUCKET_NAME: Google Cloud Storage bucket for long audio synthesis
- GOOGLE_APPLICATION_CREDENTIALS: Path to service account JSON (optional if using default credentials)

Legacy (deprecated):

- ELEVENLABS_API_KEY: ElevenLabs API key (replaced by Google Cloud TTS)

Optional for S3 uploads:

- S3_BUCKET_NAME: S3 bucket for hosting
- AWS_ACCESS_KEY_ID: AWS access key
- AWS_SECRET_ACCESS_KEY: AWS secret key
- AWS_REGION: AWS region (default: us-east-1)

Optional for Grafana Loki log aggregation:

- GRAFANA_LOKI_URL: Loki endpoint URL
- GRAFANA_LOKI_USERNAME: Loki authentication username
- GRAFANA_LOKI_PASSWORD: Loki authentication password/API key

Optional customizations:

- SHOW_NAME: Podcast name override
- LOG_LEVEL: Logging level (DEBUG/INFO/WARNING/ERROR)
- MAX_ARTICLES: Max articles per source

Logging configuration:

- LOG_DIRECTORY: Directory for JSONL log files (default: output/logs)
- ENABLE_JSONL_LOGGING: Enable JSONL file logging (true/false, default: true)
- ENABLE_S3_LOG_UPLOAD: Enable S3 upload of logs (true/false, default: true)
- LOG_UPLOAD_INTERVAL: Upload interval in seconds (default: 3600)
- REMOVE_LOGS_AFTER_UPLOAD: Remove local logs after S3 upload (true/false, default: false)
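Several of these variables carry true/false flags or integers, which arrive from the environment as strings. The following is a minimal sketch of how such values could be parsed; `env_bool` and `env_int` are hypothetical helpers, and accepting `1`, `yes`, and `on` as truthy is an assumption rather than documented behavior of this module.

```python
import os


def env_bool(name, default, environ=os.environ):
    """Read a true/false environment variable, falling back to a default."""
    raw = environ.get(name)
    if raw is None:
        return default
    # Assumption: these spellings count as true; anything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name, default, environ=os.environ):
    """Read an integer environment variable, falling back to a default."""
    raw = environ.get(name)
    return default if raw is None else int(raw)


# A plain dict stands in for os.environ so the example is reproducible.
fake_env = {"ENABLE_S3_LOG_UPLOAD": "false", "LOG_UPLOAD_INTERVAL": "600"}
print(env_bool("ENABLE_S3_LOG_UPLOAD", True, fake_env))  # False
print(env_bool("ENABLE_JSONL_LOGGING", True, fake_env))  # True
print(env_int("LOG_UPLOAD_INTERVAL", 3600, fake_env))    # 600
```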

Config dataclass

Unified configuration for The Data Packet with environment variable support.

This class provides type-safe configuration management with automatic environment variable loading and validation. All fields have sensible defaults and can be overridden via environment variables or direct parameter passing.

Attributes:

API Keys

- anthropic_api_key: Anthropic API key for Claude script generation. Required for script generation. Loaded from ANTHROPIC_API_KEY.
- elevenlabs_api_key: [DEPRECATED] ElevenLabs API key for legacy TTS. Replaced by Google Cloud TTS. Loaded from ELEVENLABS_API_KEY.
- mongodb_username: MongoDB username for episode tracking and article deduplication. Optional. Loaded from MONGODB_USERNAME.
- mongodb_password: MongoDB password for episode tracking and article deduplication. Optional. Loaded from MONGODB_PASSWORD.

Google Cloud Configuration

- google_credentials_path: Path to Google Cloud service account JSON file. Optional if using default application credentials. Loaded from GOOGLE_APPLICATION_CREDENTIALS.
- gcs_bucket_name: Google Cloud Storage bucket for long audio synthesis output. Required for audio generation. Loaded from GCS_BUCKET_NAME.

AWS Configuration

- aws_access_key_id: AWS access key for S3 uploads. Loaded from AWS_ACCESS_KEY_ID.
- aws_secret_access_key: AWS secret key for S3 uploads. Loaded from AWS_SECRET_ACCESS_KEY.
- aws_region: AWS region for S3 operations. Default: us-east-1.
- s3_bucket_name: S3 bucket name for hosting files. Loaded from S3_BUCKET_NAME.

Grafana Loki Configuration

- grafana_loki_url: Loki endpoint URL for log aggregation. Loaded from GRAFANA_LOKI_URL.
- grafana_loki_username: Username for Loki authentication. Loaded from GRAFANA_LOKI_USERNAME.
- grafana_loki_password: Password/API key for Loki authentication. Loaded from GRAFANA_LOKI_PASSWORD.

Podcast Configuration

- show_name: Podcast show name. Used in RSS feeds and file names.
- episode_number: Episode number for RSS feeds. Auto-generated if None.
- output_directory: Local directory for generated files.

Article Collection

- max_articles_per_source: Maximum articles to collect per source.
- article_sources: List of news sources to use (wired, techcrunch).
- article_categories: List of categories to fetch from each source.
- source_category_mapping: Maps each source to its supported categories.

AI Generation Settings

- claude_model: Claude model name for script generation.
- tts_model: Text-to-speech service type (now "google_cloud_tts").
- max_tokens: Maximum tokens for Claude API calls.
- temperature: AI generation temperature (0.0-1.0, lower = more consistent).

Audio Settings (Google Cloud Studio multi-speaker voices)

- male_voice: First speaker voice name (Alex - male narrator).
- female_voice: Second speaker voice name (Sam - female narrator).
- audio_sample_rate: Audio sample rate in Hz.

Processing Options

- generate_script: Whether to generate podcast scripts.
- generate_audio: Whether to generate audio files.
- generate_rss: Whether to generate RSS feeds.
- save_intermediate_files: Whether to keep intermediate processing files.
- cleanup_temp_files: Whether to clean up temporary files after processing.

RSS Feed Configuration

- rss_channel_title: RSS channel title.
- rss_channel_description: RSS channel description.
- rss_channel_link: RSS channel website link.
- rss_channel_image_url: RSS channel artwork URL.
- rss_channel_email: Contact email for podcast.
- max_rss_episodes: Maximum episodes to keep in RSS feed.

Network Settings

- http_timeout: HTTP request timeout in seconds.
- user_agent: User agent string for HTTP requests.
- log_level: Logging level (DEBUG/INFO/WARNING/ERROR/CRITICAL).

Example

    # Default configuration with environment variables
    config = Config()

    # Custom configuration
    config = Config(
        show_name="Tech News Daily",
        max_articles_per_source=3,
        male_voice="charon",
        female_voice="aoede",
    )

    # Validate before use
    config.validate_for_script_generation()
    config.validate_for_audio_generation()

anthropic_api_key: Optional[str] = None class-attribute instance-attribute

elevenlabs_api_key: Optional[str] = None class-attribute instance-attribute

mongodb_username: Optional[str] = None class-attribute instance-attribute

mongodb_password: Optional[str] = None class-attribute instance-attribute

google_credentials_path: Optional[str] = None class-attribute instance-attribute

gcp_secret_name: Optional[str] = None class-attribute instance-attribute

gcs_bucket_name: Optional[str] = None class-attribute instance-attribute

aws_access_key_id: Optional[str] = None class-attribute instance-attribute

aws_secret_access_key: Optional[str] = None class-attribute instance-attribute

aws_region: str = 'us-east-1' class-attribute instance-attribute

s3_bucket_name: Optional[str] = None class-attribute instance-attribute

grafana_loki_url: Optional[str] = None class-attribute instance-attribute

grafana_loki_username: Optional[str] = None class-attribute instance-attribute

grafana_loki_password: Optional[str] = None class-attribute instance-attribute

show_name: str = 'The Data Packet' class-attribute instance-attribute

episode_number: Optional[int] = None class-attribute instance-attribute

output_directory: Path = Path('./output') class-attribute instance-attribute

max_articles_per_source: int = 1 class-attribute instance-attribute

article_sources: List[str] = field(default_factory=(lambda: ['wired', 'techcrunch'])) class-attribute instance-attribute

article_categories: List[str] = field(default_factory=(lambda: ['security', 'ai'])) class-attribute instance-attribute

source_category_mapping: Dict[str, List[str]] = field(default_factory=(lambda: {'wired': ['security', 'science', 'ai'], 'techcrunch': ['ai', 'security']})) class-attribute instance-attribute

claude_model: str = 'claude-sonnet-4-5-20250929' class-attribute instance-attribute

tts_model: str = 'google_cloud_tts' class-attribute instance-attribute

max_tokens: int = 3000 class-attribute instance-attribute

temperature: float = 0.7 class-attribute instance-attribute

male_voice: str = 'Puck' class-attribute instance-attribute

female_voice: str = 'Kore' class-attribute instance-attribute

audio_sample_rate: int = 24000 class-attribute instance-attribute

google_cloud_project: str = 'gen-lang-client-0429374219' class-attribute instance-attribute

generate_script: bool = True class-attribute instance-attribute

generate_audio: bool = True class-attribute instance-attribute

generate_rss: bool = True class-attribute instance-attribute

save_intermediate_files: bool = False class-attribute instance-attribute

cleanup_temp_files: bool = True class-attribute instance-attribute

rss_channel_title: Optional[str] = 'The Data Packet' class-attribute instance-attribute

rss_channel_description: Optional[str] = None class-attribute instance-attribute

rss_channel_link: Optional[str] = None class-attribute instance-attribute

rss_channel_image_url: Optional[str] = 'https://the-data-packet.s3.us-west-2.amazonaws.com/the-data-packet/the_data_packet.png' class-attribute instance-attribute

rss_channel_email: Optional[str] = 'contact@thewintershadow.com' class-attribute instance-attribute

max_rss_episodes: int = 500 class-attribute instance-attribute

http_timeout: int = 30 class-attribute instance-attribute

user_agent: str = 'The Data Packet/1.0 (+https://github.com/TheWinterShadow/The-Data-Packet)' class-attribute instance-attribute

log_level: str = 'INFO' class-attribute instance-attribute

log_dir: str = 'output/logs' class-attribute instance-attribute

enable_jsonl_logging: bool = True class-attribute instance-attribute

enable_s3_log_upload: bool = True class-attribute instance-attribute

log_upload_interval: int = 3600 class-attribute instance-attribute

remove_logs_after_upload: bool = False class-attribute instance-attribute

__init__(
    anthropic_api_key: Optional[str] = None,
    elevenlabs_api_key: Optional[str] = None,
    mongodb_username: Optional[str] = None,
    mongodb_password: Optional[str] = None,
    google_credentials_path: Optional[str] = None,
    gcp_secret_name: Optional[str] = None,
    gcs_bucket_name: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_region: str = 'us-east-1',
    s3_bucket_name: Optional[str] = None,
    grafana_loki_url: Optional[str] = None,
    grafana_loki_username: Optional[str] = None,
    grafana_loki_password: Optional[str] = None,
    show_name: str = 'The Data Packet',
    episode_number: Optional[int] = None,
    output_directory: Path = Path('./output'),
    max_articles_per_source: int = 1,
    article_sources: List[str] = (lambda: ['wired', 'techcrunch'])(),
    article_categories: List[str] = (lambda: ['security', 'ai'])(),
    source_category_mapping: Dict[str, List[str]] = (lambda: {'wired': ['security', 'science', 'ai'], 'techcrunch': ['ai', 'security']})(),
    claude_model: str = 'claude-sonnet-4-5-20250929',
    tts_model: str = 'google_cloud_tts',
    max_tokens: int = 3000,
    temperature: float = 0.7,
    male_voice: str = 'Puck',
    female_voice: str = 'Kore',
    audio_sample_rate: int = 24000,
    google_cloud_project: str = 'gen-lang-client-0429374219',
    generate_script: bool = True,
    generate_audio: bool = True,
    generate_rss: bool = True,
    save_intermediate_files: bool = False,
    cleanup_temp_files: bool = True,
    rss_channel_title: Optional[str] = 'The Data Packet',
    rss_channel_description: Optional[str] = None,
    rss_channel_link: Optional[str] = None,
    rss_channel_image_url: Optional[str] = 'https://the-data-packet.s3.us-west-2.amazonaws.com/the-data-packet/the_data_packet.png',
    rss_channel_email: Optional[str] = 'contact@thewintershadow.com',
    max_rss_episodes: int = 500,
    http_timeout: int = 30,
    user_agent: str = 'The Data Packet/1.0 (+https://github.com/TheWinterShadow/The-Data-Packet)',
    log_level: str = 'INFO',
    log_dir: str = 'output/logs',
    enable_jsonl_logging: bool = True,
    enable_s3_log_upload: bool = True,
    log_upload_interval: int = 3600,
    remove_logs_after_upload: bool = False,
) -> None

__post_init__() -> None

Load configuration from environment variables.
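The general pattern of filling dataclass fields from the environment in `__post_init__` can be sketched as follows. `MiniConfig` is a hypothetical two-field stand-in, not the real `Config`, and the rule that an explicitly passed value is never overwritten is an assumption based on the priority order stated at the top of this page.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniConfig:
    """Hypothetical stand-in for Config with two optional fields."""

    anthropic_api_key: Optional[str] = None
    s3_bucket_name: Optional[str] = None

    def __post_init__(self) -> None:
        # Only consult the environment when the caller passed nothing,
        # so direct parameters keep the highest priority.
        if self.anthropic_api_key is None:
            self.anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY")
        if self.s3_bucket_name is None:
            self.s3_bucket_name = os.environ.get("S3_BUCKET_NAME")


# Simulate an exported variable for the purpose of the example.
os.environ["ANTHROPIC_API_KEY"] = "key-from-env"

print(MiniConfig().anthropic_api_key)                              # key-from-env
print(MiniConfig(anthropic_api_key="explicit").anthropic_api_key)  # explicit
```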

validate_for_script_generation() -> None

Validate configuration for script generation.

validate_for_audio_generation() -> None

Validate configuration for audio generation.
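This page does not say what the validation methods check or which exception they raise. Based on the "Required for" notes in the Environment Variables section, a check of this kind might look like the sketch below; the function names, the use of `ValueError`, and the message wording are all assumptions.

```python
from typing import Optional


def check_script_settings(anthropic_api_key: Optional[str]) -> None:
    """Hypothetical check: script generation needs an Anthropic API key."""
    if not anthropic_api_key:
        raise ValueError("ANTHROPIC_API_KEY is required for script generation")


def check_audio_settings(gcs_bucket_name: Optional[str]) -> None:
    """Hypothetical check: audio generation needs a GCS bucket name."""
    if not gcs_bucket_name:
        raise ValueError("GCS_BUCKET_NAME is required for audio generation")


# Passing a value succeeds silently; a missing value fails early.
check_script_settings("example-key")
try:
    check_audio_settings(None)
except ValueError as exc:
    print(f"Configuration error: {exc}")
```

Validating before the pipeline starts surfaces a missing credential up front rather than partway through a run.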

get_sources_for_category(category: str) -> List[str]

Get list of sources that support a given category.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| category | str | Category name to check | required |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of source names that support the category |

get_categories_for_source(source: str) -> List[str]

Get list of categories supported by a given source.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | str | Source name to check | required |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of category names supported by the source |
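Both lookups can be expressed directly against the documented default value of `source_category_mapping`. The sketch below uses standalone functions with hypothetical names; returning an empty list for an unknown source is an assumption, since the real method's behavior in that case is not documented here.

```python
from typing import Dict, List

# Default mapping as listed for source_category_mapping above.
SOURCE_CATEGORY_MAPPING: Dict[str, List[str]] = {
    "wired": ["security", "science", "ai"],
    "techcrunch": ["ai", "security"],
}


def sources_for_category(category: str) -> List[str]:
    """Return every source whose category list contains the given category."""
    return [
        source
        for source, categories in SOURCE_CATEGORY_MAPPING.items()
        if category in categories
    ]


def categories_for_source(source: str) -> List[str]:
    """Return a copy of the categories for a source (empty if unknown)."""
    return list(SOURCE_CATEGORY_MAPPING.get(source, []))


print(sources_for_category("science"))      # ['wired']
print(sources_for_category("ai"))           # ['wired', 'techcrunch']
print(categories_for_source("techcrunch"))  # ['ai', 'security']
print(categories_for_source("unknown"))     # []
```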

to_dict() -> Dict

Convert configuration to dictionary.

get_config(**overrides: Any) -> Config

Get the global configuration instance.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| **overrides | Any | Configuration values to override | {} |

Returns:

| Type | Description |
| --- | --- |
| Config | Config instance |

reset_config() -> None

Reset the global configuration instance.
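The global singleton pattern behind `get_config` and `reset_config` can be sketched with a module-level variable. Everything here is illustrative: `MiniConfig`, `get_mini_config`, and `reset_mini_config` are hypothetical names, and rebuilding the instance whenever overrides are passed is an assumption about behavior this page does not specify.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MiniConfig:
    """Hypothetical stand-in for Config."""

    show_name: str = "The Data Packet"
    max_articles_per_source: int = 1


_config: Optional[MiniConfig] = None  # module-level singleton slot


def get_mini_config(**overrides: Any) -> MiniConfig:
    """Return the shared instance, creating or rebuilding it as needed."""
    global _config
    # Assumption: passing overrides replaces the shared instance.
    if _config is None or overrides:
        _config = MiniConfig(**overrides)
    return _config


def reset_mini_config() -> None:
    """Drop the shared instance so the next call starts from defaults."""
    global _config
    _config = None


first = get_mini_config()
print(first is get_mini_config())                       # True
print(get_mini_config(show_name="Test Show").show_name)  # Test Show
reset_mini_config()
print(get_mini_config().show_name)                      # The Data Packet
```

In tests, calling the reset function in setup or teardown keeps overrides made by one test from leaking into the next.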