Infrastructure
The Data Packet uses a hybrid cloud architecture: Google Cloud (Cloud TTS and Cloud Storage) for audio synthesis, AWS S3 for podcast hosting, and two optional services, MongoDB for episode tracking and Grafana Loki for log aggregation.
Cloud components
```mermaid
graph TB
    subgraph GCP ["Google Cloud Platform"]
        TTS["Cloud TTS\nLong Audio Synthesis"]
        GCS["Cloud Storage\n30-day lifecycle"]
        TTS -->|writes audio| GCS
    end
    subgraph AWS
        S3["S3 Bucket\nPodcast hosting\n(public read)"]
    end
    subgraph Optional
        MongoDB["MongoDB\nEpisode tracking"]
        Loki["Grafana Loki\nLog aggregation"]
    end
    Pipeline -->|synthesize| TTS
    GCS -->|download| Pipeline
    Pipeline -->|upload| S3
    Pipeline -->|record episode| MongoDB
    Pipeline -->|forward logs| Loki
```
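The data flow in the diagram can be sketched as a single orchestration function. This is a minimal sketch and not the project's actual pipeline code. `PipelineClients`, `publish_episode`, and the `episodes/<id>.wav` object layout are all hypothetical. Each service is a plain callable, so you could plug in real adapters (for example around the Cloud TTS long audio API, Cloud Storage, boto3, and pymongo) or in-memory fakes. Optional services are modeled as `None`, matching the Optional subgraph.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

# Records from this logger could be shipped to Grafana Loki by attaching a
# suitable handler; no such handler is configured in this sketch.
log = logging.getLogger("pipeline")


@dataclass
class PipelineClients:
    """Hypothetical adapters around the cloud services in the diagram."""
    synthesize_to_gcs: Callable[[str, str], None]   # (script, gcs_uri): TTS writes audio to GCS
    download_from_gcs: Callable[[str], bytes]       # gcs_uri -> audio bytes
    upload_to_s3: Optional[Callable[[str, bytes], str]] = None  # (key, audio) -> public URL
    record_episode: Optional[Callable[[dict], None]] = None     # e.g. a MongoDB insert


def publish_episode(clients: PipelineClients, episode_id: str, script: str, gcs_bucket: str) -> dict:
    # Hypothetical object layout; the real pipeline may name objects differently.
    key = f"episodes/{episode_id}.wav"
    gcs_uri = f"gs://{gcs_bucket}/{key}"

    # 1. Cloud TTS synthesizes the script and writes the audio to Cloud Storage.
    clients.synthesize_to_gcs(script, gcs_uri)
    log.info("synthesized %s", gcs_uri)

    # 2. The pipeline downloads the finished audio from Cloud Storage.
    audio = clients.download_from_gcs(gcs_uri)

    result = {"episode_id": episode_id, "gcs_uri": gcs_uri,
              "size_bytes": len(audio), "public_url": None}

    # 3. Optional: publish to the public S3 bucket.
    if clients.upload_to_s3 is not None:
        result["public_url"] = clients.upload_to_s3(key, audio)
        log.info("uploaded %s", result["public_url"])

    # 4. Optional: record the episode (pass a copy so the stored record is independent).
    if clients.record_episode is not None:
        clients.record_episode(dict(result))

    return result


if __name__ == "__main__":
    # In-memory fakes stand in for Cloud Storage and S3.
    fake_gcs: dict[str, bytes] = {}
    clients = PipelineClients(
        synthesize_to_gcs=lambda script, uri: fake_gcs.__setitem__(uri, script.encode()),
        download_from_gcs=lambda uri: fake_gcs[uri],
        upload_to_s3=lambda key, audio: f"https://example-bucket.s3.amazonaws.com/{key}",
    )
    print(publish_episode(clients, "ep-001", "Hello, listeners!", "example-audio-bucket"))
```

The sketch keeps each service behind a plain callable. That way, the ordering (synthesize, download, upload, record) can be exercised without cloud credentials.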
What you need to set up
- Anthropic API key: required for script generation. Get one at console.anthropic.com. There is no infrastructure to provision.
- GCS bucket + TTS API: required for audio synthesis. The Terraform setup provisions both in one command.
- AWS S3 bucket: optional but recommended for podcast distribution. Terraform provisions it alongside the GCP resources.
- MongoDB: optional. It prevents article reuse across episodes. Use the included mongodb.sh script to run a local Docker instance.
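MongoDB's role in preventing article reuse can be illustrated with a small filter. This is a sketch under stated assumptions. `select_fresh_articles` and the `url` field are hypothetical, and the sketch assumes that previously used article URLs have already been loaded from MongoDB. With pymongo, calling `Collection.distinct()` over a stored URL field is one way to load them. Passing `None` represents running without MongoDB, in which case only duplicates within the current batch are dropped.

```python
from typing import Optional


def select_fresh_articles(candidates: list[dict], used_urls: Optional[set[str]], limit: int) -> list[dict]:
    """Return up to `limit` candidates whose URL has not been used before.

    used_urls: URLs recorded for earlier episodes (e.g. loaded from MongoDB),
    or None when episode tracking is disabled.
    """
    seen = set(used_urls or ())
    fresh: list[dict] = []
    for article in candidates:
        if len(fresh) >= limit:
            break
        url = article["url"]
        if url in seen:
            continue  # used in an earlier episode, or earlier in this batch
        seen.add(url)
        fresh.append(article)
    return fresh


if __name__ == "__main__":
    candidates = [{"url": "https://a.example"}, {"url": "https://b.example"}, {"url": "https://a.example"}]
    print(select_fresh_articles(candidates, {"https://b.example"}, limit=5))
```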
Provisioning with Terraform
The infra/ directory contains Terraform configuration that provisions the GCP and AWS resources listed above with a single terraform apply. See the Terraform guide for step-by-step instructions.
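The "30-day lifecycle" on the Cloud Storage bucket in the diagram is a bucket lifecycle rule. The sketch below assumes that the rule deletes objects, because the text does not state the action. It builds the rule in the shape of the `lifecycle` field of the Cloud Storage JSON API bucket resource. In the Terraform Google provider, the same rule is a `lifecycle_rule` block on `google_storage_bucket` with `condition { age = 30 }` and `action { type = "Delete" }`. `gcs_lifecycle_config` is a hypothetical helper and is not part of infra/.

```python
import json


def gcs_lifecycle_config(max_age_days: int = 30) -> dict:
    """Build a lifecycle rule that deletes objects older than max_age_days.

    Assumes the bucket's 30-day lifecycle uses a Delete action; adjust if
    the Terraform in infra/ uses a different action.
    """
    if max_age_days < 0:
        raise ValueError("max_age_days must be non-negative")
    return {
        "lifecycle": {
            "rule": [
                {"action": {"type": "Delete"}, "condition": {"age": max_age_days}}
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(gcs_lifecycle_config(), indent=2))
```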