Budget Limiting
This page covers budget limiting configuration and features in the Radicalbit AI Gateway.
Overview
Budget limiting in the Radicalbit AI Gateway controls costs by capping the combined cost of input and output tokens consumed across all models in a route, helping to manage AI usage expenses.
A single time window tracks total spending. When the combined cost of input and output tokens crosses the configured threshold within the window, further requests are rejected until the window resets.
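The fixed-window behaviour described above can be sketched in a few lines of Python. This is an illustrative model only, not the gateway's implementation: the class `FixedWindowBudget`, its method names, and the choice to reject once recorded spend has reached the threshold are all assumptions made for the example.

```python
import time


class FixedWindowBudget:
    """Toy model of a fixed-window budget limit (hypothetical, for illustration)."""

    def __init__(self, max_budget: float, window_seconds: float, clock=time.monotonic):
        self.max_budget = max_budget
        self.window_seconds = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._spent = 0.0

    def _roll_window(self) -> None:
        # Once the current window has elapsed, jump to the window that
        # contains "now" and zero the recorded spend.
        elapsed = self._clock() - self._window_start
        if elapsed >= self.window_seconds:
            self._window_start += (elapsed // self.window_seconds) * self.window_seconds
            self._spent = 0.0

    def allow(self) -> bool:
        # Assumption: requests are admitted until recorded spend reaches the limit.
        self._roll_window()
        return self._spent < self.max_budget

    def record(self, cost: float) -> None:
        # Cost is recorded after a response, when input and output tokens are known.
        self._roll_window()
        self._spent += cost

    def remaining(self) -> float:
        self._roll_window()
        return max(self.max_budget - self._spent, 0.0)
```

Because the cost of a request is only known once the output tokens have been produced, this sketch checks the budget before a request and records the cost afterwards, so the last admitted request in a window may push spending slightly past the limit.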
Budget Limiting Configuration
Basic Budget Limiting
Set a combined cost limit for input and output tokens:
chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
    input_cost_per_million_tokens: 5.0
    output_cost_per_million_tokens: 15.0
routes:
  production:
    chat_models:
      - gpt-4o
    budget_limiting:
      algorithm: fixed_window
      window_size: 1 hour
      max_budget: 50.0  # $50 per hour maximum combined cost
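To get a feel for how quickly a budget is consumed, the per-request cost can be estimated from the `input_cost_per_million_tokens` and `output_cost_per_million_tokens` values configured for the model. The sketch below assumes cost scales linearly with token counts; the function name `request_cost` and the token counts are hypothetical and not part of the gateway.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_cost_per_million: float,
                 output_cost_per_million: float) -> float:
    """Return the dollar cost of one request from token counts and per-million prices."""
    return (input_tokens * input_cost_per_million
            + output_tokens * output_cost_per_million) / 1_000_000


# Prices taken from the gpt-4o entry in the configuration;
# the token counts are made up for the example.
cost = request_cost(input_tokens=2_000, output_tokens=1_000,
                    input_cost_per_million=5.0, output_cost_per_million=15.0)
print(f"${cost:.4f} per request")  # $0.0250 per request
```

At that rate, a `max_budget` of 50.0 per hour corresponds to roughly 2,000 requests of this size per window.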
Advanced Budget Configuration
chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
  - model_id: gpt-3.5-turbo
    model: openai/gpt-3.5-turbo
routes:
  enterprise:
    chat_models:
      - gpt-4o
      - gpt-3.5-turbo
    budget_limiting:
      algorithm: fixed_window
      window_size: 1 minute
      max_budget: 100
Budget Limiting Types
Cost-Based Budget Limiting
Control costs by setting a maximum budget (in dollars) per time window:
budget_limiting:
  algorithm: fixed_window
  window_size: 1 hour
  max_budget: 50.0  # $50 per hour for combined input + output tokens
Note: Budget limiting uses cost calculations based on model pricing. You can set either max_token (token-based limits) or max_budget (cost-based limits), but not both in the same configuration.
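The either/or rule in the note can be checked before a configuration is deployed. The following is a minimal sketch of such a pre-flight check, not the gateway's own validation: the function `validate_limit_config` is hypothetical, and treating a block with neither key as an error is an assumption of this example.

```python
def validate_limit_config(config: dict) -> str:
    """Return which kind of limit a config block uses, or raise if it is ambiguous."""
    has_tokens = "max_token" in config
    has_budget = "max_budget" in config
    if has_tokens and has_budget:
        # The rule stated in the note: the two limits are mutually exclusive.
        raise ValueError("set either max_token or max_budget, not both")
    if not (has_tokens or has_budget):
        # Assumption of this sketch: a limiter with no limit is a mistake.
        raise ValueError("set one of max_token or max_budget")
    return "cost-based" if has_budget else "token-based"


print(validate_limit_config(
    {"algorithm": "fixed_window", "window_size": "1 hour", "max_budget": 50.0}
))  # cost-based
```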
Budget Limiting Behavior
When Limits Are Exceeded
- HTTP 429: The request is rejected with a Too Many Requests status code
- Budget Exceeded: The response carries a clear error message
- Retry Information: The response indicates when the budget resets
Response Headers (if exposed by the gateway)
HTTP/1.1 429 Too Many Requests
X-Budget-Limit: 100
X-Budget-Remaining: 0
X-Budget-Reset: 1640995200
Header names/availability may vary depending on the deployment and gateway version.
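If these headers are present, a client can use them to decide how long to wait before sending more requests. The sketch below assumes that `X-Budget-Reset` is a Unix timestamp in seconds, which the sample value suggests but the page does not state; the helper `seconds_until_reset` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: float) -> Optional[float]:
    """Return seconds until the budget window resets, or None if the header is absent.

    Assumes X-Budget-Reset holds a Unix timestamp in seconds.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-budget-reset")
    if raw is None:
        return None
    return max(float(raw) - now, 0.0)


sample = {"X-Budget-Limit": "100", "X-Budget-Remaining": "0",
          "X-Budget-Reset": "1640995200"}
print(seconds_until_reset(sample, now=1640995140.0))  # 60.0
print(datetime.fromtimestamp(1640995200, tz=timezone.utc).isoformat())
# 2022-01-01T00:00:00+00:00
```

Returning `None` when the header is missing lets the caller fall back to its own retry policy on deployments that do not expose these headers.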
Best Practices
Budget Planning
- Start with conservative limits
- Monitor actual usage patterns
- Adjust based on business needs
- Consider peak usage times
Cost Management
- Use different limits for different routes (and therefore workloads)
- Ensure model costs are configured (manual or automatic price assignment)
- Review budgets regularly
- Apply cost optimization strategies
Error Handling
- Implement proper retry logic
- Handle budget exceeded gracefully
- Provide user feedback
- Consider fallback to cheaper models (with fallback configuration)
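The points above can be combined on the client side roughly as follows. This is a sketch under stated assumptions rather than a recommended client: `call_with_fallback` and the `send` callable are hypothetical, the route name `economy` is invented for the example, and the fallback here happens in the client, whereas the gateway's own fallback configuration is handled separately.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

# (status_code, body) pair returned by a caller-supplied send function.
Response = Tuple[int, str]


def call_with_fallback(send: Callable[[str], Response],
                       routes: Sequence[str],
                       retries_per_route: int = 2,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[Response]:
    """Try each route in order, backing off on HTTP 429 before moving to the next."""
    for route in routes:
        for attempt in range(retries_per_route):
            status, body = send(route)
            if status != 429:
                return status, body
            # Exponential backoff between attempts on the same route:
            # base_delay, 2 * base_delay, ...
            if attempt < retries_per_route - 1:
                sleep(base_delay * (2 ** attempt))
    # Every route is over budget; the caller should give the user feedback.
    return None


def fake_send(route: str) -> Response:
    # Stand-in for a real HTTP call: the first route is over budget.
    return (429, "budget exceeded") if route == "production" else (200, "ok")


print(call_with_fallback(fake_send, ["production", "economy"], sleep=lambda s: None))
# (200, 'ok')
```

With long windows such as `1 hour`, short backoff delays are unlikely to outlast the window, so waiting for the reset time or switching to another route early is usually the more useful strategy.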
Troubleshooting
Common Issues
- Budget Too Low: Increase limits based on actual usage
- Unexpected Costs: Check model pricing and usage
- Reset Issues: Verify window_size configuration
Next Steps
- Rate Limiting - Configure request limits
- Token Limiting - Set up token-based limits
- Monitoring - Set up observability and metrics
- API Reference - Complete API documentation