
The Data Packet

AI-powered automated podcast generation — transform tech news articles into engaging, multi-speaker podcast episodes with a single command.

CI Docker Python 3.9+ License: MIT Spotify Apple Podcasts


  • Collect articles


Fetches the latest tech news from Wired and TechCrunch via RSS feeds, across security, AI, and science categories — automatically, on every run.

  • Generate scripts


    Uses Anthropic Claude to write natural, engaging two-host dialogue from the collected articles. No templates, no fill-in-the-blanks — real AI writing.

  • Produce audio


    Synthesizes professional multi-speaker audio with Google Cloud TTS Long Audio Synthesis. Studio voices, 44.1 kHz, no timeouts, no length caps.

  • Distribute


    Generates RSS feeds and uploads audio, script, and feed to AWS S3 for immediate podcast hosting. Live on Spotify and Apple Podcasts.
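Under the hood, the four stages above run in strict sequence, each consuming the previous stage's output. A minimal sketch of that flow (function and class names here are illustrative stand-ins, not the package's real API):

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    url: str
    body: str

@dataclass
class EpisodeResult:
    success: bool
    audio_path: str = ""
    script_path: str = ""
    number_of_articles_collected: int = 0

def collect_articles() -> list[Article]:
    # Stage 1: fetch RSS entries from Wired and TechCrunch.
    return [Article("Example headline", "https://example.com/story", "…")]

def generate_script(articles: list[Article]) -> str:
    # Stage 2: Claude turns the collected articles into two-host dialogue.
    return "HOST A: …\nHOST B: …"

def produce_audio(script: str) -> str:
    # Stage 3: Google Cloud TTS Long Audio Synthesis renders the script.
    return "output/episode.mp3"

def distribute(audio_path: str, script: str) -> None:
    # Stage 4: upload audio, script, and RSS feed to S3.
    pass

def run_pipeline() -> EpisodeResult:
    articles = collect_articles()
    script = generate_script(articles)
    audio_path = produce_audio(script)
    distribute(audio_path, script)
    return EpisodeResult(True, audio_path, "output/script.txt", len(articles))
```

The real pipeline is exposed as `PodcastPipeline` (see the Quick start below); this sketch only shows the stage ordering.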


Quick start

Pull and run

docker pull ghcr.io/thewintershadow/the-data-packet:latest

docker run --rm \
  -e ANTHROPIC_API_KEY="your-claude-key" \
  -e GCS_BUCKET_NAME="your-gcs-bucket" \
  -v "$(pwd)/output:/app/output" \
  -v "$(pwd)/service-account-key.json:/credentials.json:ro" \
  ghcr.io/thewintershadow/the-data-packet:latest

  1. ANTHROPIC_API_KEY: get a key at console.anthropic.com.
  2. GCS_BUCKET_NAME: GCS bucket for intermediate TTS audio, provisioned by the Terraform setup.
Install with pip

pip install the-data-packet
the-data-packet --output ./episode

Use the Python API

from the_data_packet import PodcastPipeline, get_config

config = get_config(show_name="Tech Brief", max_articles_per_source=1)
pipeline = PodcastPipeline(config)
result = pipeline.run()

if result.success:
    print(f"Audio:  {result.audio_path}")
    print(f"Script: {result.script_path}")
    print(f"Articles collected: {result.number_of_articles_collected}")

Features at a glance

  • Docker-first deployment


    Pull and run — no Python environment or system dependencies needed locally.

  • AI-written scripts


    Claude writes natural two-host dialogue from real articles. Every episode is unique.

  • Professional audio


    Google Cloud TTS Studio voices at 44.1 kHz. No length limits, no quality caps.

  • Smart deduplication


    Optional MongoDB integration ensures each episode contains fresh, unseen articles.
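The dedup check boils down to fingerprinting each article and skipping fingerprints already on record. A sketch of that idea with a plain in-memory set standing in for the MongoDB collection (names are illustrative, not the package's API):

```python
import hashlib

# Stand-in for the MongoDB collection of previously used articles.
seen_fingerprints: set[str] = set()

def article_fingerprint(url: str) -> str:
    # A stable hash of the article URL identifies it across runs.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

def filter_fresh(urls: list[str]) -> list[str]:
    # Keep only articles not seen in any previous episode,
    # recording each new one as we go.
    fresh = []
    for url in urls:
        fp = article_fingerprint(url)
        if fp not in seen_fingerprints:
            seen_fingerprints.add(fp)
            fresh.append(url)
    return fresh
```

With MongoDB enabled, the set above becomes a persistent collection, so "fresh" means unseen across all prior episodes, not just the current run.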

  • One-step distribution


    Audio, script, and RSS feed auto-uploaded to S3. Ready for podcast directories.
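The generated feed is ordinary RSS 2.0 XML, where each episode is an `<item>` whose `<enclosure>` points at the uploaded audio. A minimal sketch with the standard library (titles and URLs are placeholders):

```python
import xml.etree.ElementTree as ET

def build_feed(show_name: str, episode_title: str, audio_url: str) -> str:
    # Minimal RSS 2.0 skeleton: one channel, one episode item.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = show_name
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = episode_title
    # The enclosure tag is what podcast apps read to locate the audio file.
    ET.SubElement(item, "enclosure", url=audio_url, type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```

Once this feed and the audio file are on S3, any directory (Spotify, Apple Podcasts) can poll the feed URL directly.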

  • Full observability


    Structured JSONL logging with optional S3 archival and Grafana Loki forwarding.
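JSONL logging means one self-contained JSON object per line, which S3 archival and Loki can ingest without any custom parsing. A sketch of emitting and reading back such records (field names are illustrative):

```python
import io
import json

def log_event(stream, level: str, event: str, **fields) -> None:
    # One JSON object per line: grep-able, and trivially machine-parseable.
    record = {"level": level, "event": event, **fields}
    stream.write(json.dumps(record) + "\n")

def parse_log(stream) -> list[dict]:
    # Each non-empty line is an independent JSON record.
    return [json.loads(line) for line in stream if line.strip()]
```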

  • Production quality


    Full mypy type coverage, 231+ tests, CI matrix across Python 3.10–3.13.

  • Multi-architecture


    Docker images for linux/amd64 and linux/arm64 (Raspberry Pi).


Where to go next

  • Overview

    Learn what The Data Packet is, how it works, and the design decisions behind it.

  • Getting Started

    Run your first episode in 5 minutes with Docker or pip.

  • Configuration

    Every environment variable and CLI option, fully documented.

  • Infrastructure

    Docker, Terraform, cloud setup, and production logging.

  • API Reference

    Sphinx-generated code documentation for every module, class, and function.