Infrastructure

The Data Packet uses a hybrid cloud architecture: Google Cloud for audio synthesis, AWS for podcast hosting, and optional MongoDB for episode tracking.


Cloud components

graph TB
    subgraph GCP ["Google Cloud Platform"]
        TTS["Cloud TTS\nLong Audio Synthesis"]
        GCS["Cloud Storage\n30-day lifecycle"]
        TTS -->|writes audio| GCS
    end

    subgraph AWS
        S3["S3 Bucket\nPodcast hosting\n(public read)"]
    end

    subgraph Optional
        MongoDB["MongoDB\nEpisode tracking"]
        Loki["Grafana Loki\nLog aggregation"]
    end

    Pipeline -->|synthesize| TTS
    GCS -->|download| Pipeline
    Pipeline -->|upload| S3
    Pipeline -->|record episode| MongoDB
    Pipeline -->|forward logs| Loki
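The download/upload legs of the diagram can be sketched with the stock cloud CLIs. The bucket names and file paths below are hypothetical, not the project's actual configuration:

```shell
# Illustrative sketch only -- bucket names and paths are placeholders.
# Long Audio Synthesis (triggered by the pipeline) writes its result
# into the GCS bucket; pull it down for post-processing:
gcloud storage cp gs://example-tts-bucket/episodes/ep-042.wav ./ep-042.wav

# After encoding, push the finished episode to the public S3 bucket:
aws s3 cp ./ep-042.mp3 s3://example-podcast-bucket/episodes/ep-042.mp3
```

Both commands require authenticated `gcloud` and `aws` CLIs; in the real pipeline these steps happen via the SDKs rather than the CLIs.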

What you need to set up

  • Anthropic API key


    Required for script generation. Get one at console.anthropic.com.

    No infrastructure to provision.

  • GCS bucket + TTS API


    Required for audio synthesis. The Terraform setup provisions everything in one command.

  • AWS S3 bucket


    Optional but recommended for podcast distribution. Terraform provisions this alongside the GCP resources.

  • MongoDB


    Optional. Prevents article reuse across episodes. Use the included mongodb.sh script for a local Docker instance.
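Once these pieces exist, the pipeline needs their coordinates. A minimal sketch as environment variables — the variable names here are assumptions for illustration, not the project's documented configuration:

```shell
# Hypothetical variable names -- check the project's own configuration docs.
export ANTHROPIC_API_KEY="sk-ant-..."           # script generation (required)
export GCS_BUCKET="example-tts-audio"           # audio synthesis staging (required)
export S3_BUCKET="example-podcast"              # podcast hosting (optional)
export MONGODB_URI="mongodb://localhost:27017"  # episode tracking (optional)
```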


Provisioning with Terraform

The infra/ directory contains Terraform configuration that provisions all required GCP and AWS resources in a single apply. See the Terraform guide for step-by-step instructions.

One command provisions everything

cd infra/
terraform init && terraform apply
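After a successful apply, Terraform can report the names of the resources it created. A sketch, assuming the configuration defines outputs called gcs_bucket and s3_bucket (check infra/outputs.tf for the real output names):

```shell
# Output names are assumptions for illustration; requires prior terraform apply.
cd infra/
terraform output gcs_bucket   # GCS bucket used for synthesized audio
terraform output s3_bucket    # S3 bucket used for podcast hosting
```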

In this section

  • Docker — Docker deployment, compose, and production hardening
  • Terraform — GCP + AWS infrastructure provisioning
  • Logging — Structured JSONL logging, S3 upload, and Grafana Loki