# The Data Packet
AI-powered automated podcast generation — transform tech news articles into engaging, multi-speaker podcast episodes with a single command.
- **Collect articles**: Scrapes the latest tech news from Wired and TechCrunch RSS feeds across the security, AI, and science categories, automatically on every run.
- **Generate scripts**: Uses Anthropic Claude to write natural, engaging two-host dialogue from the collected articles. No templates, no fill-in-the-blanks: real AI writing.
- **Produce audio**: Synthesizes professional multi-speaker audio with Google Cloud TTS Long Audio Synthesis. Studio voices, 44.1 kHz, no timeouts, no length caps.
- **Distribute**: Generates RSS feeds and uploads audio, script, and feed to AWS S3 for immediate podcast hosting. Live on Spotify and Apple Podcasts.
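The four stages above chain into a single pass. A minimal sketch of that flow, where every function body is a stand-in stub and none of the names are the project's actual API:

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    url: str
    summary: str

def collect_articles(feeds: list[str]) -> list[Article]:
    # Stage 1 stand-in: the real pipeline fetches RSS from Wired/TechCrunch;
    # here we just fabricate one placeholder article per feed.
    return [Article(title=f"Story from {f}", url=f, summary="...") for f in feeds]

def generate_script(articles: list[Article]) -> str:
    # Stage 2 stand-in: the real pipeline asks Claude to write the dialogue;
    # this stub just alternates lines between two hosts.
    lines = []
    for i, a in enumerate(articles):
        host = "HOST A" if i % 2 == 0 else "HOST B"
        lines.append(f"{host}: Next up, {a.title}.")
    return "\n".join(lines)

def synthesize_audio(script: str) -> bytes:
    # Stage 3 stand-in for Google Cloud TTS Long Audio Synthesis.
    return script.encode("utf-8")

def distribute(audio: bytes, script: str) -> dict:
    # Stage 4 stand-in for the S3 upload and RSS feed generation.
    return {"audio_bytes": len(audio), "script_chars": len(script)}

feeds = ["https://www.wired.com/feed/rss", "https://techcrunch.com/feed/"]
articles = collect_articles(feeds)
script = generate_script(articles)
result = distribute(synthesize_audio(script), script)
```

Each stage consumes the previous stage's output, which is why the real pipeline can run end to end from one command.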
## Quick start
```bash
docker pull ghcr.io/thewintershadow/the-data-packet:latest
docker run --rm \
  -e ANTHROPIC_API_KEY="your-claude-key" \  # (1)!
  -e GCS_BUCKET_NAME="your-gcs-bucket" \  # (2)!
  -v "$(pwd)/output:/app/output" \
  -v "$(pwd)/service-account-key.json:/credentials.json:ro" \
  ghcr.io/thewintershadow/the-data-packet:latest
```

1. Get a key at console.anthropic.com.
2. GCS bucket for intermediate TTS audio. Provisioned by the Terraform setup.
You can also drive the pipeline from Python:

```python
from the_data_packet import PodcastPipeline, get_config

config = get_config(show_name="Tech Brief", max_articles_per_source=1)
pipeline = PodcastPipeline(config)
result = pipeline.run()

if result.success:
    print(f"Audio: {result.audio_path}")
    print(f"Script: {result.script_path}")
    print(f"Articles collected: {result.number_of_articles_collected}")
```
## Features at a glance
- **Docker-first deployment**: Pull and run; no Python environment or system dependencies needed locally.
- **AI-written scripts**: Claude writes natural two-host dialogue from real articles. Every episode is unique.
- **Professional audio**: Google Cloud TTS Studio voices at 44.1 kHz. No length limits, no quality caps.
- **Smart deduplication**: Optional MongoDB integration ensures each episode contains fresh, unseen articles.
- **One-step distribution**: Audio, script, and RSS feed auto-uploaded to S3, ready for podcast directories.
- **Full observability**: Structured JSONL logging with optional S3 archival and Grafana Loki forwarding.
- **Production quality**: Full mypy type coverage, 231+ tests, CI matrix across Python 3.10–3.13.
- **Multi-architecture**: Docker images for `linux/amd64` and `linux/arm64` (Raspberry Pi).
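The deduplication idea can be illustrated without MongoDB: key each article by a stable hash of its URL and skip anything already seen. In this sketch the `seen` set is an in-memory stand-in for the MongoDB collection the real feature uses:

```python
import hashlib

def article_key(url: str) -> str:
    # Stable per-article key; normalize the URL before hashing so the
    # same story fetched twice produces the same key.
    return hashlib.sha256(url.strip().lower().encode("utf-8")).hexdigest()

def fresh_articles(urls: list[str], seen: set[str]) -> list[str]:
    # Keep only URLs not used in earlier episodes, recording each new key.
    out = []
    for url in urls:
        key = article_key(url)
        if key not in seen:
            seen.add(key)
            out.append(url)
    return out

seen: set[str] = set()
first = fresh_articles(["https://a.example/1", "https://a.example/2"], seen)
second = fresh_articles(["https://a.example/2", "https://a.example/3"], seen)
```

Persisting `seen` (in the real project, in MongoDB) is what lets consecutive runs produce episodes with no repeated stories.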
## Where to go next
- Learn what The Data Packet is, how it works, and the design decisions behind it.
- Run your first episode in 5 minutes with Docker or pip.
- Every environment variable and CLI option, fully documented.
- Docker, Terraform, cloud setup, and production logging.
- Sphinx-generated code documentation for every module, class, and function.