# Model Configuration

This section provides comprehensive guidance on configuring AI models in the Radicalbit AI Gateway, based on the actual codebase structure.
## Overview
The Radicalbit AI Gateway supports multiple AI providers and models through a flexible configuration system. Models are defined within routes and can be configured with various parameters including credentials, retry policies, and cost information.
## Model Structure

### Basic Model Configuration
```yaml
routes:
  production:
    chat_models:
      - model_id: openai-4o
        model: openai/gpt-4o
        credentials:
          api_key: !secret OPENAI_API_KEY
        params:
          temperature: 1
          max_tokens: 20
```
## Model Fields

### Required Fields

- `model_id`: Unique identifier for the model within the route
- `model`: Model identifier in the format `provider/model_name` (e.g., `openai/gpt-4o`)
### Optional Fields

- `credentials`: API credentials for accessing the model
- `params`: Model parameters (temperature, max_tokens, etc.)
- `retry_attempts`: Number of retry attempts (default: 3)
- `prompt`: System prompt for the model
- `role`: Role for the prompt (developer, user, system, assistant)
- `input_cost_per_million_tokens`: Cost per million input tokens
- `output_cost_per_million_tokens`: Cost per million output tokens
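Taken together, a model entry that uses every field above might look like the following sketch (the `model_id`, prompt, and cost values here are illustrative, not defaults):

```yaml
- model_id: gpt-4o-full
  model: openai/gpt-4o
  credentials:
    api_key: !secret OPENAI_API_KEY
  params:
    temperature: 0.7
    max_tokens: 1000
  retry_attempts: 3
  prompt: "You are a helpful assistant."
  role: "system"
  input_cost_per_million_tokens: 2.5
  output_cost_per_million_tokens: 10.0
```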
## Supported Providers

### OpenAI
```yaml
- model_id: gpt-4o
  model: openai/gpt-4o
  credentials:
    api_key: !secret OPENAI_API_KEY
  params:
    temperature: 0.7
    max_tokens: 1000
```
### Ollama (Local Models)
```yaml
- model_id: llama3
  # Ollama is reached through its OpenAI-compatible API, hence the openai/ prefix
  model: openai/llama3.2:3b
  credentials:
    base_url: 'http://host.docker.internal:11434/v1'
  params:
    temperature: 0.7
    top_p: 0.9
  prompt: 'Do what the user asks without thinking.'
```
### Google Gemini
```yaml
- model_id: gemini-pro
  model: google-genai/gemini-1.5-pro
  credentials:
    api_key: !secret GOOGLE_API_KEY
  params:
    temperature: 0.7
    max_output_tokens: 1024
```
**Important:** The `api_key` is required for Gemini models. Unlike some other providers, Gemini does not support falling back to environment variables.
**Gemini Embedding Models:**
```yaml
- model_id: gemini-embedding
  model: google-genai/models/gemini-embedding-001
  credentials:
    api_key: !secret GOOGLE_API_KEY
  params:
    task_type: RETRIEVAL_QUERY # Optional: RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING
```
**Key differences:**

- **Provider identifier**: Use `google-genai` (consistent with other providers)
- **API key requirement**: `api_key` is mandatory (no environment variable fallback)
- **Model format for chat**: Use model names like `gemini-1.5-pro`, `gemini-1.5-flash`, or `gemini-pro`
- **Model format for embeddings**: Use `models/gemini-embedding-001` (with the `models/` prefix)
- **Multimodal support**: Gemini models support multimodal content (text, images, files)
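The points above can be combined in a single route that exposes both a Gemini chat model and a Gemini embedding model (the route name `gemini` is illustrative):

```yaml
routes:
  gemini:
    chat_models:
      - model_id: gemini-pro
        model: google-genai/gemini-1.5-pro
        credentials:
          api_key: !secret GOOGLE_API_KEY
    embedding_models:
      - model_id: gemini-embedding
        model: google-genai/models/gemini-embedding-001
        credentials:
          api_key: !secret GOOGLE_API_KEY
```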
### Mock Models (Testing)
```yaml
- model_id: mock-chat
  model: mock/gateway
  params:
    latency_ms: 150
    response_text: "mocked response"
```
## Model Types

### Chat Models

Used for conversational AI and text generation:
```yaml
chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
    credentials:
      api_key: !secret OPENAI_API_KEY
    params:
      temperature: 0.7
      max_tokens: 1000
```
### Embedding Models

Used for text embeddings and vector operations:
```yaml
embedding_models:
  - model_id: mock-embed
    model: mock/embeddings
    params:
      latency_ms: 100
      vector_size: 8
```
## Credentials Configuration

### API Key Authentication
```yaml
credentials:
  api_key: !secret OPENAI_API_KEY
```
### Custom Base URL
```yaml
credentials:
  base_url: 'http://localhost:11434/v1'
  api_key: 'dummy-api-key' # Required for OpenAI-compatible APIs
```
## Model Parameters

### Common Parameters
- `temperature`: Controls randomness (0.0-2.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Penalty for frequent tokens
- `presence_penalty`: Penalty for new tokens
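A `params` block using all of the common parameters together might look like this (the values are illustrative starting points, not recommendations):

```yaml
params:
  temperature: 0.7       # randomness, 0.0-2.0
  max_tokens: 1000       # cap on generated tokens
  top_p: 0.9             # nucleus sampling
  frequency_penalty: 0.0 # penalize frequent tokens
  presence_penalty: 0.0  # penalize already-seen tokens
```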
### Provider-Specific Parameters

Each provider may support additional parameters; refer to the provider's documentation for complete details.
## Cost Configuration

### Automatic Cost Assignment

The gateway automatically assigns costs from `model_prices.json` if they are not explicitly configured:
```yaml
- model_id: gpt-4o
  model: openai/gpt-4o
  # Costs will be automatically assigned from model_prices.json
```
### Manual Cost Configuration
```yaml
- model_id: custom-model
  model: openai/gpt-4o
  input_cost_per_million_tokens: 5.0
  output_cost_per_million_tokens: 15.0
```
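The per-million rates translate into a per-request cost in a straightforward way. A minimal sketch (the `request_cost` helper below is ours for illustration, not part of the gateway):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_cost_per_million: float,
                 output_cost_per_million: float) -> float:
    """Compute the cost of one request from per-million-token rates."""
    return ((input_tokens / 1_000_000) * input_cost_per_million
            + (output_tokens / 1_000_000) * output_cost_per_million)

# With the rates above: 1,200 input tokens and 300 output tokens
cost = request_cost(1_200, 300, 5.0, 15.0)  # 0.006 + 0.0045 = 0.0105
```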
## Retry Configuration

### Default Retry Policy
```yaml
- model_id: gpt-4o
  model: openai/gpt-4o
  retry_attempts: 3 # Default value
```
### Custom Retry Policy
```yaml
- model_id: unreliable-model
  model: openai/gpt-3.5-turbo
  retry_attempts: 5
```
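Conceptually, `retry_attempts` bounds how many extra times a failed call is re-issued. A generic sketch of such a retry loop with optional exponential backoff (illustrative only, not the gateway's actual implementation):

```python
import time


def call_with_retries(call, retry_attempts: int = 3, base_delay: float = 0.0):
    """Try the call once, then retry up to `retry_attempts` more times,
    backing off between attempts. Re-raise the last error if all fail."""
    last_error = None
    for attempt in range(retry_attempts + 1):
        try:
            return call()
        except Exception as exc:  # in practice, catch provider errors only
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_error


# A flaky call that fails twice, then succeeds:
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient provider error")
    return "ok"

print(call_with_retries(flaky, retry_attempts=3))  # prints "ok"
```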
## System Prompts

### Basic System Prompt
```yaml
- model_id: assistant
  model: openai/gpt-4o
  prompt: "You are a helpful assistant."
  role: "system"
```
### Role-Based Prompts
```yaml
- model_id: developer-assistant
  model: openai/gpt-4o
  prompt: "You are a senior software developer."
  role: "developer"
```
## Model Validation

### Unique Model IDs

All models within a route must have unique `model_id` values:
```yaml
chat_models:
  - model_id: gpt-4o # ✅ Unique
    model: openai/gpt-4o
  - model_id: gpt-3.5-turbo # ✅ Unique
    model: openai/gpt-3.5-turbo
```
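The uniqueness rule is easy to check programmatically. An illustrative validation over an already-parsed route (the gateway performs its own validation; `check_unique_model_ids` is our name, not a gateway API):

```python
def check_unique_model_ids(models: list[dict]) -> None:
    """Raise ValueError if two models in a route share a model_id."""
    seen = set()
    for model in models:
        model_id = model["model_id"]
        if model_id in seen:
            raise ValueError(f"duplicate model_id: {model_id}")
        seen.add(model_id)

chat_models = [
    {"model_id": "gpt-4o", "model": "openai/gpt-4o"},
    {"model_id": "gpt-3.5-turbo", "model": "openai/gpt-3.5-turbo"},
]
check_unique_model_ids(chat_models)  # passes: all IDs unique
```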
### Credential Validation
- OpenAI models require API keys when using standard endpoints
- Custom base URLs can use dummy API keys for OpenAI-compatible services
## Best Practices

### Model Organization
- Use descriptive `model_id` names
- Group related models in the same route
- Separate production and testing models
### Cost Management
- Configure cost information for accurate billing
- Use automatic cost assignment when possible
- Monitor token usage through metrics
### Error Handling
- Configure appropriate retry attempts
- Use fallback models for critical routes
- Monitor model availability
## Troubleshooting

### Common Issues
- **Model Not Found**: Verify that `model_id` is unique and correctly referenced
- **Authentication Errors**: Check API keys and the credentials configuration
- **Cost Assignment**: Ensure model names match those in `model_prices.json`
### Debug Configuration
```yaml
- model_id: debug-model
  model: openai/gpt-3.5-turbo
  retry_attempts: 1 # Reduce retries for faster debugging
```
## Next Steps
- Fallback Configuration - Set up automatic failover
- Advanced Configuration - Enterprise configuration options
- API Reference - Complete API documentation