voicegw.yaml reference
Every top-level section and key in the VoiceGateway config file. Validated with pydantic extra=forbid so typos fail fast at startup.
The voicegw.yaml file is the central configuration for
VoiceGateway. It is validated at startup using a Pydantic schema
with extra="forbid", which means any typo or unknown key produces
a clear error message before your gateway starts.
VoiceGateway searches for the config file in this order:
1. ./voicegw.yaml (current directory)
2. ~/.config/voicegateway/voicegw.yaml
3. /etc/voicegateway/voicegw.yaml
You can override this with the VOICEGW_CONFIG environment
variable. See Environment variables.
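The lookup described above can be sketched as follows. This is a minimal illustration, not VoiceGateway's actual loader: `find_config` is a hypothetical helper, and it assumes that `VOICEGW_CONFIG`, when set, is used as-is and skips the search entirely.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Search order taken from the list above.
SEARCH_PATHS = (
    "./voicegw.yaml",
    "~/.config/voicegateway/voicegw.yaml",
    "/etc/voicegateway/voicegw.yaml",
)


def find_config(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the config path to load, or None if nothing is found.

    Hypothetical helper: assumes VOICEGW_CONFIG, when set, wins outright.
    """
    override = env.get("VOICEGW_CONFIG")
    if override:
        return Path(override).expanduser()
    for candidate in SEARCH_PATHS:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.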
Top-level sections
The config file has thirteen top-level sections. All are optional.
| Section | Purpose |
|---|---|
| providers | API keys and settings for each provider |
| models | Register custom model aliases |
| stacks | Named bundles of STT + LLM + TTS models |
| projects | Per-project tracking and budgets |
| fallbacks | Ordered fallback chains per modality |
| observability | Toggle latency, cost, and logging middleware |
| cost_tracking | SQLite database settings for cost persistence |
| latency | TTFB warning thresholds and percentile config |
| rate_limits | Per-provider request rate limits |
| ingest | Rate limits for the fleet collector ingest endpoint |
| retention | Age-out policy for collector data |
| workers | Background rollup and retention cadence |
| serve | Bind host and port for the daemon |
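To give a feel for the fail-fast behaviour, here is a standard-library stand-in for the unknown-key check. It is not the Pydantic schema the gateway uses: `unknown_sections` is a hypothetical helper, the set contains only the thirteen names from the table, and the real schema is authoritative and may accept keys this sketch does not know about.

```python
import difflib

# The thirteen section names from the table above.
KNOWN_SECTIONS = {
    "providers", "models", "stacks", "projects", "fallbacks",
    "observability", "cost_tracking", "latency", "rate_limits",
    "ingest", "retention", "workers", "serve",
}


def unknown_sections(config: dict) -> dict:
    """Map each unrecognised top-level key to its closest known name (or None)."""
    report = {}
    for key in config:
        if key not in KNOWN_SECTIONS:
            close = difflib.get_close_matches(key, sorted(KNOWN_SECTIONS), n=1)
            report[key] = close[0] if close else None
    return report
```

A misspelt `provders` section would be reported together with the suggestion `providers`, which is the kind of message that makes a typo quick to fix.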
providers
Configure credentials and settings for each provider. Keys are provider names matching VoiceGateway's built-in provider identifiers.
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
  openai:
    api_key: ${OPENAI_API_KEY}
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
  groq:
    api_key: ${GROQ_API_KEY}
  cartesia:
    api_key: ${CARTESIA_API_KEY}
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}
  assemblyai:
    api_key: ${ASSEMBLYAI_API_KEY}
  ollama:
    base_url: http://localhost:11434
  whisper:
    enabled: true
  kokoro:
    enabled: true
  piper:
    enabled: true

Each provider supports at minimum:
- api_key (string): API key, typically via ${ENV_VAR} substitution.
- base_url (string): override the default API endpoint.
- enabled (bool, default true): disable a provider without removing its config.
See Providers for per-provider details.
models
Register custom model aliases organised by modality. Each entry
maps an alias to a provider and model name, with optional
defaults.
models:
  stt:
    fast-transcription:
      provider: deepgram
      model: nova-3
    offline-transcription:
      provider: whisper
      model: large-v3
  llm:
    reasoning:
      provider: anthropic
      model: claude-sonnet-4-5
  tts:
    narrator:
      provider: cartesia
      model: sonic-3
      default_voice: narrator-male

See Models.
stacks
Named bundles that map to one STT, one LLM, and one TTS model. Use stacks to define preset quality / cost tiers.
stacks:
  premium:
    stt: deepgram/nova-3
    llm: anthropic/claude-sonnet-4-5
    tts: cartesia/sonic-3
  budget:
    stt: groq/whisper-large-v3
    llm: groq/llama-3.3-70b-versatile
    tts: local/piper:en_US-lessac-medium
  local:
    stt: local/whisper-large-v3
    llm: ollama/llama3.2:3b
    tts: local/kokoro

See Stacks.
projects
Define projects for cost attribution and budget enforcement. Each project can override providers per-key.
projects:
  customer-support:
    name: Customer Support Bot
    description: Production support agent
    default_stack: premium
    daily_budget: 50.00
    budget_action: throttle
    tags: [prod, support]
    providers:
      deepgram:
        api_key: ${SUPPORT_DEEPGRAM_KEY}
      anthropic:
        api_key: ${SUPPORT_ANTHROPIC_KEY}
  internal-qa:
    name: Internal QA Bot
    description: Testing and QA agent
    default_stack: budget
    daily_budget: 10.00
    budget_action: warn
    tags: [dev, qa]
default_project: customer-support

budget_action is one of warn, throttle, or block. Project-scoped
providers override the top-level providers for that project;
otherwise the top-level keys apply.
See Projects.
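The override rule can be pictured as a merge of the project's providers over the top-level ones. The sketch below is illustrative only: `effective_providers` is a hypothetical helper, and it assumes the merge happens key by key within each provider, which is one reading of "per-key" and is not confirmed here.

```python
def effective_providers(config: dict, project_id: str) -> dict:
    """Merge a project's provider settings over the top-level ones.

    Assumption: settings merge key by key, so a project that overrides
    only api_key keeps the top-level base_url for that provider.
    """
    merged = {
        name: dict(settings)
        for name, settings in config.get("providers", {}).items()
    }
    project = config.get("projects", {}).get(project_id, {})
    for name, overrides in project.get("providers", {}).items():
        merged.setdefault(name, {}).update(overrides)
    return merged
```

With the example above, `customer-support` would use its own Deepgram and Anthropic keys, while `internal-qa`, which defines no providers, would fall back to the top-level keys.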
fallbacks
Ordered lists of model ids per modality. The resolver treats each list as a hint: at startup it walks the list and picks the first model whose provider plugin imports cleanly.
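The resolver-time walk can be sketched roughly as below. `first_available` and the `plugin_modules` mapping are hypothetical names: the real mapping from provider to plugin module is not documented on this page. The sketch also assumes the provider is the part of the model id before the first slash, and it treats only `ImportError` as a failed import.

```python
import importlib
from typing import Iterable, Mapping, Optional


def first_available(
    model_ids: Iterable[str], plugin_modules: Mapping[str, str]
) -> Optional[str]:
    """Return the first model id whose provider plugin module imports cleanly."""
    for model_id in model_ids:
        # Assumption: the provider is the part before the first "/".
        provider = model_id.split("/", 1)[0]
        module_name = plugin_modules.get(provider)
        if module_name is None:
            continue
        try:
            importlib.import_module(module_name)
        except ImportError:
            continue  # plugin not installed, try the next entry
        return model_id
    return None
```

Because the check runs once at startup, it reflects which plugins are installed rather than whether a provider is reachable at request time.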
fallbacks:
  stt:
    - deepgram/nova-3
    - openai/whisper-1
    - local/whisper-large-v3
  llm:
    - anthropic/claude-sonnet-4-5
    - openai/gpt-4.1-mini
    - ollama/llama3.2:3b
  tts:
    - cartesia/sonic-3
    - elevenlabs/eleven_multilingual_v2
    - local/kokoro

observability
Three boolean flags that control which middleware runs. All default
to true.
observability:
  latency_tracking: true
  cost_tracking: true
  request_logging: true

See Observability.
cost_tracking
Configure the SQLite storage backend for cost persistence.
cost_tracking:
  enabled: true
  db_path: ~/.config/voicegateway/voicegw.db
  daily_budget_alert: 100.00

- enabled (bool, default false): enable cost persistence. Also enabled automatically if VOICEGW_DB_PATH is set.
- db_path (string): path to the SQLite database file.
- daily_budget_alert (float, optional): global daily budget alert threshold.
latency
Configure latency monitoring thresholds.
latency:
  ttfb_warning_ms: 500.0
  percentiles: [50.0, 95.0, 99.0]

- ttfb_warning_ms (float, default 500.0): time-to-first-byte warning threshold in milliseconds.
- percentiles (list of floats): which percentiles to track and report.
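As a rough illustration of how the two settings relate, the sketch below summarises a list of TTFB samples. The helper names are hypothetical, the nearest-rank percentile method is an assumption (this page does not say which method the gateway uses), and so is treating a sample as slow only when it is strictly above the threshold.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (assumed method)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize_ttfb(samples_ms, percentiles=(50.0, 95.0, 99.0), ttfb_warning_ms=500.0):
    """Report the configured percentiles and the samples above the threshold."""
    return {
        "percentiles": {p: percentile(samples_ms, p) for p in percentiles},
        # Assumption: a warning fires only when the sample exceeds the threshold.
        "slow": [s for s in samples_ms if s > ttfb_warning_ms],
    }
```

With few samples the high percentiles collapse onto the maximum, so p95 and p99 become informative only once a reasonable number of requests has been recorded.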
rate_limits
Per-provider rate limiting.
rate_limits:
  deepgram:
    requests_per_minute: 100
  openai:
    requests_per_minute: 60

- requests_per_minute (int): maximum requests per minute for the given provider.
ingest
Rate limiting for the fleet collector ingest endpoint (POST /v1/ingest),
where remote agents push telemetry. Limiting is a per-caller token bucket
keyed by virtual key (then static API key, then client IP).
ingest:
  enabled: true
  requests_per_minute: 120
  burst: 240
  max_batch_size: 1000

- enabled (bool, default true): turn ingest rate limiting on or off.
- requests_per_minute (int, default 120): sustained per-caller request rate. Set to 0 to disable limiting (unlimited).
- burst (int, default 240): token-bucket ceiling, the largest burst a caller can send before being throttled.
- max_batch_size (int, default 1000): maximum records in one POST. A larger batch is rejected with 413 before any database write.
Over-limit requests get 429 with a Retry-After header (integer seconds).
The library's remote sink honors Retry-After and retries without dropping the
batch, so transient throttling never loses telemetry.
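A token bucket with these parameters can be sketched as follows. This is a generic illustration rather than the gateway's implementation: `TokenBucket` and `caller_key` are hypothetical names, and the sketch assumes that a new caller starts with a full bucket and that Retry-After is the time until one token is available, rounded up to whole seconds.

```python
import math


def caller_key(virtual_key, api_key, client_ip):
    """Pick the bucket key: virtual key, then static API key, then client IP."""
    return virtual_key or api_key or client_ip


class TokenBucket:
    """Per-caller bucket that refills at requests_per_minute, capped at burst."""

    def __init__(self, requests_per_minute=120, burst=240, now=0.0):
        self.rate = requests_per_minute / 60.0  # tokens per second
        self.capacity = float(burst)
        self.tokens = float(burst)  # assumption: new callers start full
        self.updated = now

    def try_acquire(self, now):
        """Return (allowed, retry_after_seconds) for one request at time `now`."""
        if self.rate <= 0:
            return True, 0  # requests_per_minute: 0 disables limiting
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Seconds until one whole token is available, rounded up.
        retry_after = math.ceil((1.0 - self.tokens) / self.rate)
        return False, max(1, retry_after)
```

With the defaults, a caller can send 240 requests back to back and is then held to two requests per second, which is the sustained rate of 120 per minute.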
retention
Hard-delete aged rows from the collector database. A background worker prunes,
per project, sessions and their dependent rows (replay, turns, dead-air,
guardrail) by ended_at, and requests by timestamp, in batches.
retention:
  enabled: true
  default_days: 90

- enabled (bool, default true): turn retention pruning on or off.
- default_days (int, default 90): age after which a project's rows are deleted. Applies to every project that has data.
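The batched prune can be pictured with the following sketch, which handles only the requests case. Everything about the storage layout here is an assumption made for illustration: the `requests` table, its `id`, `project`, and `timestamp` columns, and timestamps stored as ISO-8601 strings that all share one format. `prune_requests` is a hypothetical helper, not the collector's worker.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_requests(
    conn: sqlite3.Connection,
    project: str,
    now: datetime,
    default_days: int = 90,
    batch_size: int = 500,
) -> int:
    """Delete one project's rows older than the cutoff, in batches.

    Hypothetical schema: requests(id, project, timestamp), with timestamps
    stored as ISO-8601 strings so that string order matches time order.
    """
    cutoff = (now - timedelta(days=default_days)).isoformat()
    removed = 0
    while True:
        cur = conn.execute(
            "DELETE FROM requests WHERE id IN ("
            "SELECT id FROM requests "
            "WHERE project = ? AND timestamp < ? LIMIT ?)",
            (project, cutoff, batch_size),
        )
        conn.commit()  # commit per batch to keep each transaction short
        if cur.rowcount == 0:
            return removed
        removed += cur.rowcount
```

Deleting in small batches keeps each write transaction short, so a large backlog does not block other writers for long.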
workers
Cadence for the collector's background workers: the latency and agent rollups,
and the retention prune. Workers run in-process and are started by the server.
In a multi-replica deployment, set enabled: false on every replica except the
one chosen to run them (rollups and prunes are idempotent, but running them on
every replica is wasteful).
workers:
  enabled: true
  rollup_interval_seconds: 900
  retention_interval_seconds: 3600

- enabled (bool, default true): start the background workers. When false, no workers run (the rollup tables stay stale and retention does not prune).
- rollup_interval_seconds (int, default 900): how often the latency and agent rollups refresh. The Agents dashboard list serves this 24h rollup.
- retention_interval_seconds (int, default 3600): how often retention runs.
serve
Bind host and port for the daemon. The daemon serves the HTTP API
(/v1/*), the dashboard API (/api/*), and the React SPA (/)
all on this single port.
serve:
  host: 0.0.0.0
  port: 8080

- host (string, default 0.0.0.0): bind address. Use 127.0.0.1 to restrict to localhost.
- port (int, default 8080): port number. The wizard collects this as question 4 of voicegw onboard.
Environment variable substitution
Any string value in the config can use ${ENV_VAR} syntax.
VoiceGateway substitutes these at load time using os.environ.
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}

If the environment variable is not set, it resolves to an empty string.
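The substitution rule can be sketched in a few lines. This is an illustration under stated assumptions, not the loader's actual code: `substitute_env` is a hypothetical helper, the accepted variable-name pattern is a guess, and the sketch assumes substitution runs on the parsed config rather than on the raw file text.

```python
import os
import re
from typing import Any, Mapping

# Assumption: variable names follow the usual shell identifier rules.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Replace ${ENV_VAR} in every string value; unset variables become ""."""
    if isinstance(value, str):
        return _ENV_REF.sub(lambda m: env.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    return value
```

Because an unset variable resolves to an empty string rather than an error, a missing key may only surface later, for example as an authentication failure from the provider.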