
The Data Packet

AI-powered automated podcast generation — transform tech news articles into engaging, multi-speaker podcast episodes with a single command.

CI Docker Python 3.9+ License: MIT Spotify Apple Podcasts


  • Collect articles


Fetches the latest tech news from Wired and TechCrunch via RSS feeds, across security, AI, and science categories — automatically, on every run.

  • Generate scripts


    Uses Anthropic Claude to write natural, engaging two-host dialogue from the collected articles. No templates, no fill-in-the-blanks — real AI writing.

  • Produce audio


    Synthesizes professional multi-speaker audio with Google Cloud TTS Long Audio Synthesis. Studio voices, 44.1 kHz, no timeouts, no length caps.

  • Distribute


    Generates RSS feeds and uploads audio, script, and feed to AWS S3 for immediate podcast hosting. Live on Spotify and Apple Podcasts.
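Under the hood, the four stages above run in strict sequence, each consuming the previous stage's output. A minimal sketch of that flow (function and class names here are illustrative stand-ins, not the package's real API):

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    url: str
    body: str

@dataclass
class EpisodeResult:
    success: bool
    audio_path: str = ""
    script_path: str = ""
    number_of_articles_collected: int = 0

def collect_articles() -> list[Article]:
    # Stage 1: fetch RSS entries from Wired and TechCrunch.
    return [Article("Example headline", "https://example.com/story", "…")]

def generate_script(articles: list[Article]) -> str:
    # Stage 2: Claude turns the collected articles into two-host dialogue.
    return "HOST A: …\nHOST B: …"

def produce_audio(script: str) -> str:
    # Stage 3: Google Cloud TTS Long Audio Synthesis renders the script.
    return "output/episode.mp3"

def distribute(audio_path: str, script: str) -> None:
    # Stage 4: upload audio, script, and RSS feed to S3.
    pass

def run_pipeline() -> EpisodeResult:
    articles = collect_articles()
    script = generate_script(articles)
    audio_path = produce_audio(script)
    distribute(audio_path, script)
    return EpisodeResult(True, audio_path, "output/script.txt", len(articles))
```

The real pipeline is exposed as `PodcastPipeline` (see the Quick start below); this sketch only shows the stage ordering.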


Quick start

Pull and run

docker pull ghcr.io/thewintershadow/the-data-packet:latest

docker run --rm \
  -e ANTHROPIC_API_KEY="your-claude-key" \
  -e GCS_BUCKET_NAME="your-gcs-bucket" \
  -v "$(pwd)/output:/app/output" \
  -v "$(pwd)/service-account-key.json:/credentials.json:ro" \
  ghcr.io/thewintershadow/the-data-packet:latest

  1. ANTHROPIC_API_KEY: get a key at console.anthropic.com.
  2. GCS_BUCKET_NAME: GCS bucket for intermediate TTS audio, provisioned by the Terraform setup.
Install with pip

pip install the-data-packet
the-data-packet --output ./episode

Use the Python API

from the_data_packet import PodcastPipeline, get_config

config = get_config(show_name="Tech Brief", max_articles_per_source=1)
pipeline = PodcastPipeline(config)
result = pipeline.run()

if result.success:
    print(f"Audio:  {result.audio_path}")
    print(f"Script: {result.script_path}")
    print(f"Articles collected: {result.number_of_articles_collected}")

Features at a glance

  • Docker-first deployment


    Pull and run — no Python environment or system dependencies needed locally.

  • AI-written scripts


    Claude writes natural two-host dialogue from real articles. Every episode is unique.

  • Professional audio


    Google Cloud TTS Studio voices at 44.1 kHz. No length limits, no quality caps.

  • Smart deduplication


    Optional MongoDB integration ensures each episode contains fresh, unseen articles.
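The dedup check boils down to fingerprinting each article and skipping fingerprints already on record. A sketch of that idea with a plain in-memory set standing in for the MongoDB collection (names are illustrative, not the package's API):

```python
import hashlib

# Stand-in for the MongoDB collection of previously used articles.
seen_fingerprints: set[str] = set()

def article_fingerprint(url: str) -> str:
    # A stable hash of the article URL identifies it across runs.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

def filter_fresh(urls: list[str]) -> list[str]:
    # Keep only articles not seen in any previous episode,
    # recording each new one as we go.
    fresh = []
    for url in urls:
        fp = article_fingerprint(url)
        if fp not in seen_fingerprints:
            seen_fingerprints.add(fp)
            fresh.append(url)
    return fresh
```

With MongoDB enabled, the set above becomes a persistent collection, so "fresh" means unseen across all prior episodes, not just the current run.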

  • One-step distribution


    Audio, script, and RSS feed auto-uploaded to S3. Ready for podcast directories.
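The generated feed is ordinary RSS 2.0 XML, where each episode is an `<item>` whose `<enclosure>` points at the uploaded audio. A minimal sketch with the standard library (titles and URLs are placeholders):

```python
import xml.etree.ElementTree as ET

def build_feed(show_name: str, episode_title: str, audio_url: str) -> str:
    # Minimal RSS 2.0 skeleton: one channel, one episode item.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = show_name
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = episode_title
    # The enclosure tag is what podcast apps read to locate the audio file.
    ET.SubElement(item, "enclosure", url=audio_url, type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```

Once this feed and the audio file are on S3, any directory (Spotify, Apple Podcasts) can poll the feed URL directly.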

  • Full observability


    Structured JSONL logging with optional S3 archival and Grafana Loki forwarding.
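JSONL logging means one self-contained JSON object per line, which S3 archival and Loki can ingest without any custom parsing. A sketch of emitting and reading back such records (field names are illustrative):

```python
import io
import json

def log_event(stream, level: str, event: str, **fields) -> None:
    # One JSON object per line: grep-able, and trivially machine-parseable.
    record = {"level": level, "event": event, **fields}
    stream.write(json.dumps(record) + "\n")

def parse_log(stream) -> list[dict]:
    # Each non-empty line is an independent JSON record.
    return [json.loads(line) for line in stream if line.strip()]
```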

  • Production quality


    Full mypy type coverage, 231+ tests, CI matrix across Python 3.10–3.13.

  • Multi-architecture


    Docker images for linux/amd64 and linux/arm64 (Raspberry Pi).


Where to go next

  • Overview

    Learn what The Data Packet is, how it works, and the design decisions behind it.

  • Getting Started

    Run your first episode in 5 minutes with Docker or pip.

  • Configuration

    Every environment variable and CLI option, fully documented.

  • Infrastructure

    Docker, Terraform, cloud setup, and production logging.

  • API Reference

    Sphinx-generated code documentation for every module, class, and function.