Fallback
Fallback provides automatic failover when a primary model fails or becomes unavailable. The gateway tries each fallback model in order until one succeeds or all are exhausted. All fallback models must be declared in the route's model list alongside the target.
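The try-in-order behavior described above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual implementation: `call_with_fallback`, `AllModelsFailed`, and the `call_model` callback are hypothetical names, and the sketch treats any exception as a failure, whereas the gateway may be more selective about which errors trigger a fallback.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when the target and every fallback have failed (hypothetical name)."""


def call_with_fallback(
    target: str,
    fallbacks: Sequence[str],
    call_model: Callable[[str], str],
) -> tuple[str, str]:
    """Try the target, then each fallback in order; return (model_id, response)."""
    failed: list[str] = []
    for model_id in [target, *fallbacks]:
        try:
            return model_id, call_model(model_id)
        except Exception:  # assumption: any error counts as a model failure
            failed.append(model_id)
    raise AllModelsFailed(f"all models failed: {failed}")


# Stand-in backend in which the primary model is unavailable.
def fake_backend(model_id: str) -> str:
    if model_id == "gpt-4o":
        raise RuntimeError("primary unavailable")
    return f"response from {model_id}"


served_by, reply = call_with_fallback(
    "gpt-4o", ["gpt-4o-mini", "claude-3-sonnet"], fake_backend
)
```

Here the request ends up served by the first fallback in the list, because the order of `fallbacks` is the order of attempts.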
Configuration
routes:
  production:
    chat_models:
      - gpt-4o
      - gpt-4o-mini
      - claude-3-sonnet
    fallback:
      - target: gpt-4o          # primary model_id
        fallbacks:
          - gpt-4o-mini         # tried first
          - claude-3-sonnet     # tried second
Fields
- target: The model_id of the primary model.
- fallbacks: List of model_ids to try in order if target fails.
- type: Use embedding for embedding fallbacks. Omit for chat (default).
Embedding Fallback
routes:
  production:
    embedding_models:
      - openai-embedding
      - local-embedding
    fallback:
      - target: openai-embedding
        fallbacks:
          - local-embedding
        type: embedding
Mixed Chat + Embedding
routes:
  production:
    chat_models:
      - gpt-4o
      - gpt-4o-mini
    embedding_models:
      - openai-embedding
      - local-embedding
    fallback:
      - target: gpt-4o
        fallbacks:
          - gpt-4o-mini
      - target: openai-embedding
        fallbacks:
          - local-embedding
        type: embedding
Validation Rules
- target and all fallbacks must be listed in the route's chat_models (or embedding_models for embedding fallbacks)
- Chat fallbacks can only reference chat models, and embedding fallbacks can only reference embedding models
- The gateway rejects configurations that violate these rules at startup
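To check a configuration before deploying it, the rules above can be approximated with a small script. This is a sketch under assumptions: `validate_fallbacks` is a hypothetical helper, the route is assumed to be already parsed from YAML into a plain dict with the keys shown on this page, and the gateway's own startup check may report errors differently.

```python
def validate_fallbacks(route: dict) -> list[str]:
    """Return rule violations for one route; an empty list means it looks valid."""
    errors: list[str] = []
    for entry in route.get("fallback", []):
        # An omitted type means a chat fallback.
        kind = entry.get("type", "chat")
        key = "embedding_models" if kind == "embedding" else "chat_models"
        declared = set(route.get(key, []))
        # The target and every fallback must appear in the matching model list.
        for model_id in [entry["target"], *entry.get("fallbacks", [])]:
            if model_id not in declared:
                errors.append(f"{model_id!r} is not listed in {key}")
    return errors


# A route that breaks the rules: an embedding fallback pointing at a chat model.
bad_route = {
    "chat_models": ["gpt-4o", "gpt-4o-mini"],
    "embedding_models": ["openai-embedding"],
    "fallback": [
        {"target": "openai-embedding", "fallbacks": ["gpt-4o-mini"], "type": "embedding"},
    ],
}
problems = validate_fallbacks(bad_route)
```

Because each entry is checked only against the list that matches its type, a chat model named in an embedding fallback is reported as undeclared, which covers both rules in one pass.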
Monitoring
Fallback activations are tracked via the gateway_fallbacks_triggered_total metric with labels route_name, target, and fallback. See Monitoring.
Next Steps
- Intelligent Routing — Proactively route to cheaper models before failures occur
- Model Configuration — Configure retry attempts per model
- Advanced Configuration — Full configuration reference