Token Limiting
This page covers token limiting configuration and features in the Radicalbit AI Gateway.
Overview
Token limiting controls the number of tokens consumed by requests within a specific time window, helping to manage costs and prevent abuse.
Configuration
Minimal route-level example, with separate limits for input and output tokens:
routes:
  production:
    token_limiting:
      input:
        algorithm: fixed_window
        window_size: 1 hour
        max_token: 1000
      output:
        algorithm: fixed_window
        window_size: 10 minutes
        max_token: 500
Parameters:
- algorithm: limiting algorithm (fixed_window or aligned_fixed_window)
- window_size: time window for the limit (e.g., 10 seconds, 1 minute, 1 hour)
- max_token: maximum number of tokens allowed within the window
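To make the interaction between window_size and max_token concrete, here is a minimal sketch of a fixed-window token counter. It is an illustration only, not the gateway's implementation: the class name, the `clock` parameter, and the choice to start the window at the first request are all assumptions, and it does not model how aligned_fixed_window differs.

```python
import time


class FixedWindowTokenLimiter:
    """Illustrative fixed-window token counter (hypothetical, not gateway code)."""

    def __init__(self, max_token, window_seconds, clock=time.monotonic):
        self.max_token = max_token
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the example can fake time
        self.window_start = None    # assumption: window opens on first request
        self.used = 0

    def try_consume(self, tokens):
        """Record `tokens` and return True if they fit in the current window."""
        now = self.clock()
        # Open a fresh window on first use or once the current one has expired.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.max_token:
            return False  # a gateway would reject the request at this point
        self.used += tokens
        return True


# Mirrors the input limit above: 1000 tokens per 1 hour (3600 seconds).
fake_now = [0.0]
limiter = FixedWindowTokenLimiter(1000, 3600, clock=lambda: fake_now[0])
print(limiter.try_consume(600))  # True: 600 of 1000 used
print(limiter.try_consume(600))  # False: 1200 would exceed 1000
fake_now[0] = 3600.0             # one hour later, the window resets
print(limiter.try_consume(600))  # True
```

The counter resets all at once when the window expires, which is what distinguishes a fixed window from sliding-window schemes.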
Storage Backend
Token limiting uses Redis (if configured) or in-memory storage:
- Redis: recommended for production with multiple gateway instances. Limits are shared across all instances.
- Memory: used when Redis is not configured. Limits are per-instance only.
Redis configuration example:
cache:
  redis_host: "valkey"
  redis_port: 6379
Although the section is named cache, Redis is typically reused as a shared storage backend for multiple gateway features (e.g., caching and limiting), depending on your gateway setup.
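The practical difference between the two backends can be shown with a toy simulation. Everything here is hypothetical (`CounterStore`, `GatewayInstance`, and `total_admitted` are illustrative names, and a single window is assumed); the point is only that per-instance counters let the effective limit grow with the number of instances, while a shared counter enforces one global limit.

```python
class CounterStore:
    """Hypothetical stand-in for a storage backend (in-memory or Redis)."""

    def __init__(self):
        self.used = 0


class GatewayInstance:
    """Toy gateway instance that checks a token limit against its store."""

    def __init__(self, store, max_token):
        self.store = store
        self.max_token = max_token

    def try_consume(self, tokens):
        if self.store.used + tokens > self.max_token:
            return False
        self.store.used += tokens
        return True


def total_admitted(instances, tokens_per_request, requests):
    """Send requests round-robin and return the total tokens admitted."""
    admitted = 0
    for i in range(requests):
        instance = instances[i % len(instances)]
        if instance.try_consume(tokens_per_request):
            admitted += tokens_per_request
    return admitted


# Memory-style: every instance keeps its own counter.
per_instance = [GatewayInstance(CounterStore(), 1000) for _ in range(2)]

# Redis-style: all instances read and write the same counter.
shared_store = CounterStore()
shared = [GatewayInstance(shared_store, 1000) for _ in range(2)]

print(total_admitted(per_instance, 100, 30))  # 2000: limit applies per instance
print(total_admitted(shared, 100, 30))        # 1000: limit applies globally
```

With two instances and in-memory storage, clients can consume up to twice the configured max_token in this model, which is why a shared backend is recommended for multi-instance deployments.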
Error Handling
When a token limit is exceeded, the gateway returns an HTTP 429 (Too Many Requests) error.
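On the client side, a 429 response is usually handled by waiting and retrying. The following is a minimal sketch under stated assumptions: `send` is a hypothetical callable that performs the request and returns a `(status_code, body)` tuple, and the exponential backoff schedule is an illustrative choice rather than something the gateway prescribes.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, body).
    Returns the last (status_code, body) seen.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return status, body
```

Because limits are window-based, retries only succeed once the current window has ended, so for long windows (such as 1 hour) it may be more useful to surface the error to the caller than to keep retrying.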
Next Steps
- Rate Limiting - Configure rate-based limits
- Budget Limiting - Set up cost controls
- Monitoring - Set up observability and metrics
- API Reference - Complete API documentation