# Usage

## CLI reference

The `the-data-packet` command runs the full pipeline. All options can also be set via
environment variables; CLI flags take precedence.
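The precedence rule can be sketched in a few lines of Python (`get_setting` and the `MAX_ARTICLES` variable here are illustrative only, not part of the package):

```python
import os

def get_setting(cli_value, env_var, default=None):
    """Return the CLI value if given, else the environment variable, else the default."""
    if cli_value is not None:
        return cli_value
    return os.environ.get(env_var, default)

# Simulate MAX_ARTICLES being set in the environment but not on the command line.
os.environ["MAX_ARTICLES"] = "3"
print(get_setting(None, "MAX_ARTICLES", "1"))  # → 3 (environment wins over default)
print(get_setting("2", "MAX_ARTICLES", "1"))   # → 2 (CLI flag wins over environment)
```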
### API keys

| Option | Env var | Description |
|---|---|---|
| `--anthropic-key KEY` | `ANTHROPIC_API_KEY` | Anthropic (Claude) API key |
| `--gcs-bucket NAME` | `GCS_BUCKET_NAME` | GCS bucket for audio synthesis |
| `--google-credentials PATH` | `GOOGLE_APPLICATION_CREDENTIALS` | Path to GCP service account JSON |
| `--mongodb-username USER` | `MONGODB_USERNAME` | MongoDB username (optional) |
| `--mongodb-password PASS` | `MONGODB_PASSWORD` | MongoDB password (optional) |
| `--s3-bucket BUCKET` | `S3_BUCKET_NAME` | S3 bucket for uploads (optional) |
### Content

| Option | Default | Description |
|---|---|---|
| `--sources SOURCE ...` | `wired` | Sources: `wired`, `techcrunch` |
| `--categories CAT ...` | `security ai` | Categories to fetch |
| `--max-articles N` | `1` | Articles per source |
### Generation

| Option | Description |
|---|---|
| `--script-only` | Generate the script, skip audio synthesis |
| `--audio-only` | Generate audio from an existing script |
### Audio voices

| Option | Default | Description |
|---|---|---|
| `--male-voice VOICE` | `en-US-Studio-Q` | Google Cloud TTS voice for the first speaker (Alex) |
| `--female-voice VOICE` | `en-US-Studio-O` | Google Cloud TTS voice for the second speaker (Sam) |

**Available Studio voices.** Only `en-US-Studio-*` voices support multi-speaker Long Audio Synthesis.

| Voice ID | Character |
|---|---|
| `en-US-Studio-Q` | Male (warm, professional) |
| `en-US-Studio-O` | Female (clear, engaging) |
| `en-US-Studio-M` | Male (authoritative) |
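Since only `en-US-Studio-*` voices work for multi-speaker synthesis, it can be worth validating voice IDs up front. A sketch (`validate_voice` is a hypothetical helper, not part of the package):

```python
def validate_voice(voice_id: str) -> None:
    """Raise if a voice cannot be used for multi-speaker Long Audio Synthesis."""
    if not voice_id.startswith("en-US-Studio-"):
        raise ValueError(
            f"{voice_id!r} is not an en-US-Studio-* voice; "
            "multi-speaker Long Audio Synthesis requires one."
        )

validate_voice("en-US-Studio-Q")  # passes silently
try:
    validate_voice("en-US-Wavenet-D")
except ValueError as err:
    print(err)  # explains why the voice was rejected
```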
### Output

| Option | Default | Description |
|---|---|---|
| `--output DIR` | `./output` | Output directory |
| `--show-name NAME` | `The Data Packet` | Podcast show name |
| `--no-s3` | — | Disable S3 uploads even if configured |
| `--save-intermediate` | — | Keep intermediate files |
| `--log-level LEVEL` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
## Common examples

### Via Docker

All CLI options work identically when passed to the Docker image:

```bash
docker run --rm \
  --env-file .env \
  -v "$(pwd)/output:/app/output" \
  -v "$(pwd)/service-account-key.json:/credentials.json:ro" \
  ghcr.io/thewintershadow/the-data-packet:latest \
  --show-name "Daily Tech Brief" \
  --sources wired techcrunch \
  --categories security ai \
  --max-articles 2
```
### Python API

```python
from the_data_packet import PodcastPipeline, get_config

# With overrides
config = get_config(
    show_name="Tech Brief",
    max_articles_per_source=2,
    article_sources=["wired", "techcrunch"],
    article_categories=["security", "ai"],
    male_voice="en-US-Studio-Q",
    female_voice="en-US-Studio-O",
)

pipeline = PodcastPipeline(config)
result = pipeline.run()

if result.success:
    print(f"Script: {result.script_path}")
    print(f"Audio: {result.audio_path}")
    print(f"Time: {result.execution_time_seconds:.1f}s")
    print(f"Articles: {result.number_of_articles_collected}")
    if result.s3_audio_url:
        print(f"S3 URL: {result.s3_audio_url}")
else:
    print(f"Failed: {result.error_message}")
```
## Output files

| File | Description |
|---|---|
| `episode_script.txt` | Full two-host dialogue with `Alex:` / `Sam:` speaker labels |
| `episode.wav` | Synthesized multi-speaker audio at 44.1 kHz |
| `feed.xml` | RSS 2.0 podcast feed (only generated when S3 is configured) |
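Because the script uses plain `Alex:` / `Sam:` labels, it is easy to post-process. For example, grouping the dialogue by speaker (a sketch against the documented file format; `split_by_speaker` is not a packaged utility):

```python
from collections import defaultdict

def split_by_speaker(script_text: str) -> dict:
    """Group dialogue lines from episode_script.txt by speaker label."""
    lines = defaultdict(list)
    for line in script_text.splitlines():
        speaker, sep, dialogue = line.partition(":")
        if sep and speaker in ("Alex", "Sam"):
            lines[speaker].append(dialogue.strip())
    return dict(lines)

sample = "Alex: Welcome back to the show.\nSam: Thanks, Alex. Big story today."
print(split_by_speaker(sample))
# → {'Alex': ['Welcome back to the show.'], 'Sam': ['Thanks, Alex. Big story today.']}
```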
## Automation

A GitHub Actions workflow can generate episodes on a schedule:

```yaml
name: Generate Daily Podcast

on:
  schedule:
    - cron: '0 8 * * 1-5'  # Weekdays at 8 AM UTC

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - name: Generate Podcast
        run: |
          docker run --rm \
            -e ANTHROPIC_API_KEY="${{ secrets.ANTHROPIC_API_KEY }}" \
            -e GCS_BUCKET_NAME="${{ secrets.GCS_BUCKET_NAME }}" \
            -v "/tmp/output:/app/output" \
            ghcr.io/thewintershadow/the-data-packet:latest
      - name: Upload episode
        uses: actions/upload-artifact@v4
        with:
          name: podcast-episode
          path: /tmp/output/
```
The same schedule as a Kubernetes CronJob:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: podcast-generator
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: podcast-generator
              image: ghcr.io/thewintershadow/the-data-packet:latest
              env:
                - name: ANTHROPIC_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: podcast-secrets
                      key: anthropic-key
                - name: GCS_BUCKET_NAME
                  valueFrom:
                    secretKeyRef:
                      name: podcast-secrets
                      key: gcs-bucket
          restartPolicy: OnFailure
```
## Troubleshooting

**Permission denied on output directory.** The container runs as UID 1000; make sure the mounted output directory is writable by that UID.
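On a Linux host the usual fix is to make the mounted directory writable by UID 1000 (for example `chown -R 1000:1000 output/`). A quick Python check of whether a given UID can write to a directory (a rough sketch that only inspects the owner and world permission bits):

```python
import os
import stat

def writable_by_uid(path: str, uid: int) -> bool:
    """Rough check: can `uid` write to `path`? (Group permissions ignored.)"""
    st = os.stat(path)
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IWUSR)
    return bool(st.st_mode & stat.S_IWOTH)

os.makedirs("output", exist_ok=True)
print(writable_by_uid("output", 1000))
```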
**API key not found.** Verify that your env file exists and is actually being passed to the container (e.g. via `--env-file .env`).
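A quick way to confirm the key is reachable (a sketch; the minimal parser below ignores quoting and `export` prefixes):

```python
import os

def read_env_file(path: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments skipped."""
    values = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values

# Write a sample file for the demo; point this at your real .env in practice.
with open(".env.sample", "w") as fh:
    fh.write("# API keys\nANTHROPIC_API_KEY=sk-ant-xxxx\n")

env = read_env_file(".env.sample")
print("ANTHROPIC_API_KEY" in env or "ANTHROPIC_API_KEY" in os.environ)  # → True
```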