voicegw.yaml reference

Every top-level section and key in the VoiceGateway config file. Validated by Pydantic with extra="forbid", so typos fail fast at startup.

The voicegw.yaml file is the central configuration for VoiceGateway. It is validated at startup using a Pydantic schema with extra="forbid", which means any typo or unknown key produces a clear error message before your gateway starts.

VoiceGateway searches for the config file in this order:

  1. ./voicegw.yaml (current directory)
  2. ~/.config/voicegateway/voicegw.yaml
  3. /etc/voicegateway/voicegw.yaml

You can override this with the VOICEGW_CONFIG environment variable. See Environment variables.
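A minimal Python sketch of that lookup follows. It illustrates the search order and is not VoiceGateway's loader. find_config is a hypothetical name. The sketch assumes that the first existing file wins, and that a set VOICEGW_CONFIG is used as given without consulting the search list.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional

# Documented search order; the first file that exists is used (assumed).
SEARCH_PATHS = (
    Path("./voicegw.yaml"),
    Path("~/.config/voicegateway/voicegw.yaml"),
    Path("/etc/voicegateway/voicegw.yaml"),
)


def find_config(env: Mapping[str, str] = os.environ,
                search_paths: Iterable[Path] = SEARCH_PATHS) -> Optional[Path]:
    """Return the config file to load, or None if none is found (hypothetical)."""
    override = env.get("VOICEGW_CONFIG")
    if override:
        # The environment variable short-circuits the file search.
        return Path(override).expanduser()
    for candidate in search_paths:
        path = candidate.expanduser()
        if path.is_file():
            return path
    return None
```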

Top-level sections

The config file has thirteen top-level sections, all optional. It also accepts a top-level default_project key, described under projects.

Section          Purpose
providers        API keys and settings for each provider
models           Register custom model aliases
stacks           Named bundles of STT + LLM + TTS models
projects         Per-project tracking and budgets
fallbacks        Ordered fallback chains per modality
observability    Toggle latency, cost, and logging middleware
cost_tracking    SQLite database settings for cost persistence
latency          TTFB warning thresholds and percentile config
rate_limits      Per-provider request rate limits
ingest           Rate limits for the fleet collector ingest endpoint
retention        Age-out policy for collector data
workers          Background rollup and retention cadence
serve            Bind host and port for the daemon

providers

Configure credentials and settings for each provider. Keys are provider names matching VoiceGateway's built-in provider identifiers.

voicegw.yaml
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
  openai:
    api_key: ${OPENAI_API_KEY}
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
  groq:
    api_key: ${GROQ_API_KEY}
  cartesia:
    api_key: ${CARTESIA_API_KEY}
  elevenlabs:
    api_key: ${ELEVENLABS_API_KEY}
  assemblyai:
    api_key: ${ASSEMBLYAI_API_KEY}
  ollama:
    base_url: http://localhost:11434
  whisper:
    enabled: true
  kokoro:
    enabled: true
  piper:
    enabled: true

Every provider block accepts at least these common keys:

  • api_key (string): API key, typically via ${ENV_VAR} substitution.
  • base_url (string): override the default API endpoint.
  • enabled (bool, default true): disable a provider without removing its config.
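To show what extra="forbid" means for a provider block, here is a stdlib-only sketch. VoiceGateway's real schema is Pydantic. The ProviderConfig class and its from_dict helper are hypothetical and model only the three common keys above. Unlike the real schema, they would therefore also reject provider-specific keys.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical model of the common provider keys."""
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    enabled: bool = True  # documented default

    @classmethod
    def from_dict(cls, name: str, raw: Mapping[str, Any]) -> "ProviderConfig":
        # Mimic extra="forbid": an unknown key is an error, never silently ignored.
        allowed = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - allowed)
        if unknown:
            raise ValueError(f"providers.{name}: unknown key(s) {unknown}")
        return cls(**raw)
```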

See Providers for per-provider details.


models

Register custom model aliases, organized by modality. Each entry maps an alias to a provider and model name, with optional defaults such as default_voice.

voicegw.yaml
models:
  stt:
    fast-transcription:
      provider: deepgram
      model: nova-3
    offline-transcription:
      provider: whisper
      model: large-v3
  llm:
    reasoning:
      provider: anthropic
      model: claude-sonnet-4-5
  tts:
    narrator:
      provider: cartesia
      model: sonic-3
      default_voice: narrator-male

See Models.
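An alias is essentially a lookup keyed by modality and name. The sketch below assumes the parsed models section is a nested dict. resolve_alias is a hypothetical helper, and treating every key other than provider and model as a default is an assumption.

```python
from typing import Any, Dict, Mapping, Tuple

ModelsSection = Mapping[str, Mapping[str, Mapping[str, Any]]]

# The models section from the example above, after YAML parsing.
MODELS: ModelsSection = {
    "stt": {
        "fast-transcription": {"provider": "deepgram", "model": "nova-3"},
        "offline-transcription": {"provider": "whisper", "model": "large-v3"},
    },
    "llm": {"reasoning": {"provider": "anthropic", "model": "claude-sonnet-4-5"}},
    "tts": {
        "narrator": {
            "provider": "cartesia",
            "model": "sonic-3",
            "default_voice": "narrator-male",
        },
    },
}


def resolve_alias(models: ModelsSection, modality: str,
                  alias: str) -> Tuple[str, str, Dict[str, Any]]:
    """Return (provider, model, defaults) for an alias; KeyError if unknown."""
    entry = models[modality][alias]
    defaults = {k: v for k, v in entry.items() if k not in ("provider", "model")}
    return entry["provider"], entry["model"], defaults
```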


stacks

Named bundles that pair one STT, one LLM, and one TTS model, each given as a provider/model id. Use stacks to define preset quality and cost tiers.

YAML
stacks:
  premium:
    stt: deepgram/nova-3
    llm: anthropic/claude-sonnet-4-5
    tts: cartesia/sonic-3
  budget:
    stt: groq/whisper-large-v3
    llm: groq/llama-3.3-70b-versatile
    tts: local/piper:en_US-lessac-medium
  local:
    stt: local/whisper-large-v3
    llm: ollama/llama3.2:3b
    tts: local/kokoro

See Stacks.
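Stack entries are provider/model ids, and model names may contain colons (ollama/llama3.2:3b, local/piper:en_US-lessac-medium). A parser therefore only needs to split on the first slash. This is a minimal sketch. parse_model_id and ModelRef are hypothetical names, and treating everything after the first slash as the model name is an assumption.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str
    model: str


def parse_model_id(model_id: str) -> ModelRef:
    """Split "provider/model" on the first slash only (hypothetical helper)."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return ModelRef(provider, model)


# The budget stack from the example above.
STACKS = {
    "budget": {
        "stt": "groq/whisper-large-v3",
        "llm": "groq/llama-3.3-70b-versatile",
        "tts": "local/piper:en_US-lessac-medium",
    },
}
budget = {modality: parse_model_id(mid) for modality, mid in STACKS["budget"].items()}
```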


projects

Define projects for cost attribution and budget enforcement. Each project can also override settings for individual providers, for example to use its own API keys.

voicegw.yaml
projects:
  customer-support:
    name: Customer Support Bot
    description: Production support agent
    default_stack: premium
    daily_budget: 50.00
    budget_action: throttle
    tags: [prod, support]
    providers:
      deepgram:
        api_key: ${SUPPORT_DEEPGRAM_KEY}
      anthropic:
        api_key: ${SUPPORT_ANTHROPIC_KEY}
  internal-qa:
    name: Internal QA Bot
    description: Testing and QA agent
    default_stack: budget
    daily_budget: 10.00
    budget_action: warn
    tags: [dev, qa]

default_project: customer-support

budget_action is one of warn, throttle, or block. Project-scoped providers override the top-level providers for that project; otherwise the top-level keys apply.
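The override rule can be sketched as a lookup that prefers the project's entry. This sketch assumes a field-level merge, where fields a project omits fall back to the top-level entry for the same provider. The text above only says that project-scoped providers win, so the merge granularity and the provider_settings helper are hypothetical.

```python
from typing import Any, Dict, Mapping


def provider_settings(config: Mapping[str, Any], project: str,
                      provider: str) -> Dict[str, Any]:
    """Effective settings for one provider within one project (hypothetical)."""
    base = dict(config.get("providers", {}).get(provider, {}))
    project_cfg = config.get("projects", {}).get(project, {})
    override = project_cfg.get("providers", {}).get(provider, {})
    base.update(override)  # project-scoped keys win; other keys are inherited
    return base


# A trimmed version of the example above, with env vars already substituted.
CONFIG = {
    "providers": {"deepgram": {"api_key": "global-dg"},
                  "cartesia": {"api_key": "global-ca"}},
    "projects": {
        "customer-support": {"providers": {"deepgram": {"api_key": "support-dg"}}},
        "internal-qa": {},
    },
}
```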

See Projects.


fallbacks

Ordered lists of model ids per modality. The resolver treats them as a startup-time hint: it walks each list in order and picks the first model whose provider plugin imports cleanly.

voicegw.yaml
fallbacks:
  stt:
    - deepgram/nova-3
    - openai/whisper-1
    - local/whisper-large-v3
  llm:
    - anthropic/claude-sonnet-4-5
    - openai/gpt-4.1-mini
    - ollama/llama3.2:3b
  tts:
    - cartesia/sonic-3
    - elevenlabs/eleven_multilingual_v2
    - local/kokoro
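The startup walk reduces to picking the first entry whose provider plugin imports. In this sketch, the availability check is injectable so it can be tested. The voicegateway_<provider> module naming in plugin_importable is invented for illustration and is not VoiceGateway's plugin layout.

```python
import importlib
from typing import Callable, Iterable, Optional


def plugin_importable(provider: str) -> bool:
    """Hypothetical check; the voicegateway_<provider> module name is invented."""
    try:
        importlib.import_module(f"voicegateway_{provider}")
    except Exception:  # any failure at import time counts as "not clean"
        return False
    return True


def pick_fallback(chain: Iterable[str],
                  available: Callable[[str], bool] = plugin_importable) -> Optional[str]:
    """Return the first model id in the chain whose provider is available."""
    for model_id in chain:
        provider = model_id.split("/", 1)[0]  # "openai/whisper-1" -> "openai"
        if available(provider):
            return model_id
    return None
```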

observability

Three boolean flags that control which middleware runs. All default to true.

voicegw.yaml
observability:
  latency_tracking: true
  cost_tracking: true
  request_logging: true

See Observability.


cost_tracking

Configure the SQLite storage backend for cost persistence.

voicegw.yaml
cost_tracking:
  enabled: true
  db_path: ~/.config/voicegateway/voicegw.db
  daily_budget_alert: 100.00

  • enabled (bool, default false): enable cost persistence. Also enabled automatically if VOICEGW_DB_PATH is set.
  • db_path (string): path to the SQLite database file.
  • daily_budget_alert (float, optional): global daily budget alert threshold.
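The enablement rule for cost persistence can be written as a function. This sketch assumes the parsed section is a dict. cost_tracking_active is a hypothetical name, and treating VOICEGW_DB_PATH as set whenever it is present (even if empty) is an assumption.

```python
import os
from typing import Any, Mapping


def cost_tracking_active(section: Mapping[str, Any],
                         env: Mapping[str, str] = os.environ) -> bool:
    """True if cost persistence should run, per the rules above (sketch)."""
    # enabled defaults to false when omitted from the section.
    explicitly_enabled = bool(section.get("enabled", False))
    # Setting VOICEGW_DB_PATH turns persistence on even without enabled: true.
    return explicitly_enabled or "VOICEGW_DB_PATH" in env
```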

latency

Configure latency monitoring thresholds.

YAML
latency:
  ttfb_warning_ms: 500.0
  percentiles: [50.0, 95.0, 99.0]

  • ttfb_warning_ms (float, default 500.0): time-to-first-byte warning threshold in milliseconds.
  • percentiles (list of floats): which percentiles to track and report.
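To make the two settings concrete, this sketch reports the configured percentiles and lists the samples over the TTFB threshold. It uses the nearest-rank percentile definition, which is an assumption, because this page does not say how VoiceGateway computes percentiles. latency_report is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def nearest_rank(sorted_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty sample (assumed method)."""
    rank = max(1, math.ceil(pct * len(sorted_ms) / 100.0))
    return sorted_ms[rank - 1]


def latency_report(ttfb_ms: List[float],
                   percentiles: Sequence[float] = (50.0, 95.0, 99.0),
                   ttfb_warning_ms: float = 500.0) -> Dict[str, object]:
    """Configured percentiles plus the samples slower than the warning threshold."""
    ordered = sorted(ttfb_ms)
    return {
        "percentiles": {p: nearest_rank(ordered, p) for p in percentiles},
        # Strictly greater than the threshold counts as slow (assumed).
        "warnings": [t for t in ttfb_ms if t > ttfb_warning_ms],
    }
```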

rate_limits

Per-provider rate limiting.

YAML
rate_limits:
  deepgram:
    requests_per_minute: 100
  openai:
    requests_per_minute: 60

  • requests_per_minute (int): maximum requests per minute for the given provider.
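This page does not say which algorithm enforces requests_per_minute for providers. A sliding 60-second window is one common choice. The sketch below assumes that choice, and ProviderRateLimiter is a hypothetical class. The clock is injectable so the window can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Mapping


class ProviderRateLimiter:
    """Sliding-window limiter: at most N calls per provider in any 60 s span."""

    def __init__(self, limits: Mapping[str, int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limits = dict(limits)
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = {}

    def try_acquire(self, provider: str) -> bool:
        limit = self._limits.get(provider)
        if limit is None:
            return True  # providers without an entry are not limited (assumed)
        now = self._clock()
        window = self._calls.setdefault(provider, deque())
        while window and now - window[0] >= 60.0:
            window.popleft()  # forget calls older than one minute
        if len(window) >= limit:
            return False
        window.append(now)
        return True
```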

ingest

Rate limiting for the fleet collector ingest endpoint (POST /v1/ingest), where remote agents push telemetry. Each caller gets its own token bucket, keyed by virtual key if present, otherwise by static API key, otherwise by client IP.

YAML
ingest:
  enabled: true
  requests_per_minute: 120
  burst: 240
  max_batch_size: 1000

  • enabled (bool, default true): turn ingest rate limiting on or off.
  • requests_per_minute (int, default 120): sustained per-caller request rate. Set to 0 to disable limiting.
  • burst (int, default 240): token-bucket capacity, which is the largest burst a caller can send before being throttled.
  • max_batch_size (int, default 1000): maximum records in one POST. A larger batch is rejected with 413 before any database write.
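The per-caller token bucket can be sketched as follows. Each caller's bucket refills at requests_per_minute / 60 tokens per second up to burst, and an empty bucket yields a 429 with an integer Retry-After. Everything here is an assumption for illustration, not the collector's code: the names TokenBucket, caller_key, and admit, the 200 success status, checking batch size before the rate limit, and the rule that rejected requests consume no token.

```python
import math
import time
from typing import Callable, Dict, Optional, Tuple


def caller_key(virtual_key: Optional[str], api_key: Optional[str], client_ip: str) -> str:
    """Bucket key: virtual key, then static API key, then client IP."""
    if virtual_key:
        return f"vk:{virtual_key}"
    if api_key:
        return f"key:{api_key}"
    return f"ip:{client_ip}"


class TokenBucket:
    def __init__(self, requests_per_minute: int = 120, burst: int = 240,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = requests_per_minute / 60.0  # tokens added per second
        self.capacity = float(burst)
        self.clock = clock
        self.state: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_ts)

    def check(self, key: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        if self.rate <= 0:
            return True, 0  # requests_per_minute: 0 disables limiting
        now = self.clock()
        tokens, last = self.state.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.state[key] = (tokens - 1.0, now)
            return True, 0
        self.state[key] = (tokens, now)
        # Whole seconds until one full token is available (always >= 1 here).
        return False, math.ceil((1.0 - tokens) / self.rate)


def admit(bucket: TokenBucket, key: str, batch_len: int,
          max_batch_size: int = 1000, enabled: bool = True) -> Tuple[int, Dict[str, str]]:
    """Map one ingest POST to (status, headers); 200 on success is assumed."""
    if batch_len > max_batch_size:
        return 413, {}  # rejected before any database write
    if enabled:
        allowed, retry_after = bucket.check(key)
        if not allowed:
            return 429, {"Retry-After": str(retry_after)}
    return 200, {}
```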

Over-limit requests get 429 with a Retry-After header (integer seconds). The library's remote sink honors Retry-After and retries without dropping the batch, so transient throttling never loses telemetry.
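On the client side, honoring Retry-After means waiting the advertised number of seconds and then resending the same batch. This is a minimal sketch of that loop. send_with_retry and the (status, headers) shape returned by send are hypothetical. Falling back to a 1-second wait when the header is missing is an assumption.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

Response = Tuple[int, Dict[str, str]]


def send_with_retry(batch: Sequence[dict],
                    send: Callable[[Sequence[dict]], Response],
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Send a batch, waiting out 429 responses; returns the final status."""
    while True:
        status, headers = send(batch)  # the same batch is resent every time
        if status != 429:
            return status
        # Retry-After is integer seconds; wait 1 s if it is missing (assumed).
        sleep(float(headers.get("Retry-After", "1")))
```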


retention

Hard-delete aged rows from the collector database. A background worker prunes each project in batches: sessions and their dependent rows (replay, turns, dead-air, guardrail) are aged out by ended_at, and requests by timestamp.

YAML
retention:
  enabled: true
  default_days: 90

  • enabled (bool, default true): turn retention pruning on or off.
  • default_days (int, default 90): age after which a project's rows are deleted. Applies to every project that has data.
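The batched prune can be sketched with Python's built-in sqlite3 module. The single sessions table and its columns are hypothetical. The real collector schema also has the dependent replay, turn, dead-air, guardrail, and request rows, which this sketch leaves out. It also assumes ended_at is stored as Unix seconds.

```python
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def prune_sessions(conn: sqlite3.Connection, project: str, days: int = 90,
                   batch_size: int = 500, now: Optional[float] = None) -> int:
    """Delete one project's sessions that ended more than `days` ago."""
    cutoff = (time.time() if now is None else now) - days * SECONDS_PER_DAY
    total = 0
    while True:
        # Delete at most batch_size rows per statement to keep transactions short.
        cur = conn.execute(
            "DELETE FROM sessions WHERE id IN ("
            "SELECT id FROM sessions WHERE project = ? AND ended_at < ? LIMIT ?)",
            (project, cutoff, batch_size),
        )
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total


def prune_all(conn: sqlite3.Connection, days: int = 90,
              now: Optional[float] = None) -> int:
    """Apply the same age limit to every project that has data."""
    projects = [row[0] for row in conn.execute("SELECT DISTINCT project FROM sessions")]
    return sum(prune_sessions(conn, p, days, now=now) for p in projects)
```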

workers

Cadence for the collector's background workers: the latency and agent rollups, and the retention prune. Workers run in-process and are started by the server. In a multi-replica deployment, set enabled: false on every replica except the one chosen to run them (rollups and prunes are idempotent, but running them on every replica is wasteful).

YAML
workers:
  enabled: true
  rollup_interval_seconds: 900
  retention_interval_seconds: 3600

  • enabled (bool, default true): start the background workers. When false, no workers run: the rollup tables stay stale and retention does not prune.
  • rollup_interval_seconds (int, default 900): how often the latency and agent rollups refresh. The Agents list in the dashboard is served from this 24-hour rollup.
  • retention_interval_seconds (int, default 3600): how often retention runs.
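Two jobs with different cadences can share one loop by tracking when each last ran. The due_jobs helper and the job names below are hypothetical. They only show how the two default intervals interact and are not the collector's scheduler.

```python
from typing import Dict, List, Mapping


def due_jobs(now: float, last_run: Dict[str, float],
             intervals: Mapping[str, float]) -> List[str]:
    """Return jobs whose interval has elapsed, and mark them as run at `now`."""
    due = [name for name, every in intervals.items()
           if now - last_run.get(name, float("-inf")) >= every]
    for name in due:
        last_run[name] = now
    return due


INTERVALS = {"rollup": 900, "retention": 3600}  # the defaults above
```

With the defaults and both jobs starting together, retention comes due on every fourth rollup tick.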

serve

Bind host and port for the daemon. The daemon serves the HTTP API (/v1/*), the dashboard API (/api/*), and the React SPA (/) all on this single port.

YAML
serve:
  host: 0.0.0.0
  port: 8080

  • host (string, default 0.0.0.0): bind address. Use 127.0.0.1 to restrict to localhost.
  • port (int, default 8080): port number. The voicegw onboard wizard asks for this as its fourth question.

Environment variable substitution

Any string value in the config can use ${ENV_VAR} syntax. VoiceGateway substitutes these at load time using os.environ.

voicegw.yaml
providers:
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}

If the environment variable is not set, it resolves to an empty string.
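The substitution rule, including the empty-string fallback, fits in a few lines. This sketch assumes the pattern is ${NAME} with a conventional identifier, and that substitution is applied recursively to every string in the parsed YAML. expand_env is a hypothetical name, not VoiceGateway's function.

```python
import os
import re
from typing import Any, Mapping

# ${NAME}, where NAME is a conventional identifier (assumed pattern).
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively substitute ${VAR} in every string; unset vars become ""."""
    if isinstance(value, str):
        return _ENV_REF.sub(lambda m: env.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    return value  # numbers, booleans, and None are left untouched
```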

See Environment variables.
