Infrastructure

The Data Packet uses a hybrid cloud architecture: Google Cloud for audio synthesis, AWS for podcast hosting, and optional MongoDB for episode tracking.


Cloud components

graph TB
    subgraph GCP ["Google Cloud Platform"]
        TTS["Cloud TTS\nLong Audio Synthesis"]
        GCS["Cloud Storage\n30-day lifecycle"]
        TTS -->|writes audio| GCS
    end

    subgraph AWS
        S3["S3 Bucket\nPodcast hosting\n(public read)"]
    end

    subgraph Optional
        MongoDB["MongoDB\nEpisode tracking"]
        Loki["Grafana Loki\nLog aggregation"]
    end

    Pipeline -->|synthesize| TTS
    GCS -->|download| Pipeline
    Pipeline -->|upload| S3
    Pipeline -->|record episode| MongoDB
    Pipeline -->|forward logs| Loki
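The download/upload legs of the diagram can be sketched with the stock cloud CLIs. The bucket names and file paths below are hypothetical, not the project's actual configuration:

```shell
# Illustrative sketch only -- bucket names and paths are placeholders.
# Long Audio Synthesis (triggered by the pipeline) writes its result
# into the GCS bucket; pull it down for post-processing:
gcloud storage cp gs://example-tts-bucket/episodes/ep-042.wav ./ep-042.wav

# After encoding, push the finished episode to the public S3 bucket:
aws s3 cp ./ep-042.mp3 s3://example-podcast-bucket/episodes/ep-042.mp3
```

Both commands require authenticated `gcloud` and `aws` CLIs; in the real pipeline these steps happen via the SDKs rather than the CLIs.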

What you need to set up

  • Anthropic API key


    Required for script generation. Get one at console.anthropic.com.

    No infrastructure to provision.

  • GCS bucket + TTS API


    Required for audio synthesis. The Terraform setup provisions everything in one command.

  • AWS S3 bucket


    Optional but recommended for podcast distribution. Terraform provisions this alongside the GCP resources.

  • MongoDB


    Optional. Prevents article reuse across episodes. Use the included mongodb.sh script for a local Docker instance.
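Once these pieces exist, the pipeline needs their coordinates. A minimal sketch as environment variables — the variable names here are assumptions for illustration, not the project's documented configuration:

```shell
# Hypothetical variable names -- check the project's own configuration docs.
export ANTHROPIC_API_KEY="sk-ant-..."           # script generation (required)
export GCS_BUCKET="example-tts-audio"           # audio synthesis staging (required)
export S3_BUCKET="example-podcast"              # podcast hosting (optional)
export MONGODB_URI="mongodb://localhost:27017"  # episode tracking (optional)
```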


Provisioning with Terraform

The infra/ directory contains Terraform configuration that provisions all required GCP and AWS resources in a single apply. See the Terraform guide for step-by-step instructions.

One command provisions everything

cd infra/
terraform init && terraform apply
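After a successful apply, Terraform can report the names of the resources it created. A sketch, assuming the configuration defines outputs called gcs_bucket and s3_bucket (check infra/outputs.tf for the real output names):

```shell
# Output names are assumptions for illustration; requires prior terraform apply.
cd infra/
terraform output gcs_bucket   # GCS bucket used for synthesized audio
terraform output s3_bucket    # S3 bucket used for podcast hosting
```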

In this section

  • Docker — Docker deployment, compose, and production hardening
  • Terraform — GCP + AWS infrastructure provisioning
  • Logging — Structured JSONL logging, S3 upload, and Grafana Loki