
Token Limiting

This page covers token limiting configuration and features in the Radicalbit AI Gateway.

Overview

Token limiting controls the number of tokens consumed by requests within a specific time window, helping to manage costs and prevent abuse.


Configuration

Minimal route-level example:

routes:
  production:
    token_limiting:
      input:
        algorithm: fixed_window
        window_size: 1 hour
        max_token: 1000
      output:
        algorithm: fixed_window
        window_size: 10 minutes
        max_token: 500

Parameters:

  • algorithm: limiting algorithm (fixed_window or aligned_fixed_window)
  • window_size: time window for the limit (e.g., 10 seconds, 1 minute, 1 hour)
  • max_token: maximum number of tokens allowed within the window
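To make the three parameters concrete, here is a minimal Python sketch of a fixed-window token counter. It is an illustration only, not the gateway's implementation: the class name FixedWindowLimiter, the helper parse_window, and the injected clock are hypothetical, and the reading of aligned_fixed_window as "windows snap to multiples of window_size on the clock" is an assumption that this page does not confirm.

```python
import re
import time

# Units taken from the window_size examples on this page.
_UNITS = {"second": 1, "minute": 60, "hour": 3600}


def parse_window(text):
    """Convert a string such as '10 minutes' into a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+?)s?\s*", text.lower())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported window_size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


class FixedWindowLimiter:
    """Hypothetical fixed-window counter; rejected requests consume no budget."""

    def __init__(self, window_size, max_token, aligned=False, clock=time.time):
        self.window = parse_window(window_size)
        self.max_token = max_token
        self.aligned = aligned      # assumption: align windows to clock boundaries
        self.clock = clock          # injectable so the sketch can be tested
        self.window_start = None
        self.used = 0

    def allow(self, tokens):
        """Return True and record the tokens if they fit in the current window."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.window:
            # Open a new window, either at the first request seen or at the
            # most recent multiple of the window length.
            self.window_start = now - (now % self.window) if self.aligned else now
            self.used = 0
        if self.used + tokens > self.max_token:
            return False
        self.used += tokens
        return True
```

Under these assumptions, a limiter built with window_size "10 minutes" and max_token 500 admits requests until 500 tokens have been recorded, rejects further requests, and starts counting from zero again once the window has elapsed.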

Storage Backend

Token limiting uses Redis (if configured) or in-memory storage:

  • Redis: recommended for production with multiple gateway instances. Limits are shared across all instances.
  • Memory: used when Redis is not configured. Limits are per-instance only.
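The practical difference between the two backends can be sketched in a few lines of Python. The classes CounterStore and GatewayInstance below are hypothetical stand-ins (a plain dictionary plays the role of both Redis and process memory), and window expiry is left out to keep the example short.

```python
class CounterStore:
    """Hypothetical counter store standing in for Redis or process memory."""

    def __init__(self):
        self.counts = {}

    def get(self, key):
        return self.counts.get(key, 0)

    def add(self, key, tokens):
        self.counts[key] = self.get(key) + tokens


class GatewayInstance:
    """Hypothetical gateway instance that checks a token budget per route."""

    def __init__(self, store, max_token):
        self.store = store
        self.max_token = max_token

    def consume(self, route, tokens):
        # Note: check-then-add is not atomic. A real shared backend would
        # need an atomic increment to stay correct under concurrency.
        if self.store.get(route) + tokens > self.max_token:
            return False
        self.store.add(route, tokens)
        return True


# Shared store (the Redis case): both instances see the same counter.
shared = CounterStore()
a, b = GatewayInstance(shared, 1000), GatewayInstance(shared, 1000)
shared_results = (a.consume("production", 600), b.consume("production", 600))

# Separate stores (the in-memory case): each instance has its own counter.
c, d = GatewayInstance(CounterStore(), 1000), GatewayInstance(CounterStore(), 1000)
local_results = (c.consume("production", 600), d.consume("production", 600))
```

With the shared store, the second request is rejected because the combined usage would reach 1200 tokens. With separate stores, both requests pass, so N in-memory instances can together admit up to N times max_token per window.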

Redis configuration example:

cache:
redis_host: "valkey"
redis_port: 6379

Although the section is named cache, the same Redis instance is typically reused as shared storage for several gateway features (e.g., caching and limiting), depending on your gateway setup.


Error Handling

When a token limit is exceeded, the gateway returns an HTTP 429 (Too Many Requests) error.
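On the client side, a 429 response is usually handled by waiting and retrying. The following Python sketch shows one way to do that with exponential backoff. The function call_with_retry and its send callback are hypothetical, and the sketch deliberately does not rely on a Retry-After header, since this page does not say whether the gateway sends one.

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    send is a hypothetical callable returning (status_code, body).
    The delay doubles after each 429: base_delay, 2x, 4x, and so on.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

Because limits reset only when the window rolls over, short backoff delays are unlikely to help with long windows such as 1 hour. In that case it may be more sensible to surface the error to the caller than to keep retrying.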


Next Steps