# Model Configuration
This section provides comprehensive guidance on configuring AI models in the Radicalbit AI Gateway based on the actual codebase structure.
## Overview
The Radicalbit AI Gateway supports multiple AI providers and models through a flexible configuration system.
With the new configuration structure:

- Models are defined at the top level under:
  - `chat_models` (for chat/completions)
  - `embedding_models` (for embeddings)
- Routes no longer contain full model objects.
- Routes reference models by model ID (string lists).
## Model Structure

### Basic Model Configuration (Chat)
```yaml
chat_models:
  - model_id: openai-4o
    model: openai/gpt-4o
    credentials:
      api_key: !secret OPENAI_API_KEY
    params:
      temperature: 1
      max_tokens: 20

routes:
  production:
    chat_models:
      - openai-4o
```
## Model Fields

### Required Fields

- `model_id`: Unique identifier for the model (used by routes and fallbacks)
- `model`: Model identifier in the format `provider/model_name` (e.g., `openai/gpt-4o`)

### Optional Fields

- `credentials`: API credentials for accessing the model
- `params`: Model parameters (temperature, max_tokens, etc.)
- `retry_attempts`: Number of retry attempts (default: 3)
- `prompt`: Optional inline system/developer prompt (mutually exclusive with `prompt_ref`)
- `prompt_ref`: Optional reference to a Markdown file containing the prompt
- `role`: Role used when injecting `prompt`/`prompt_ref` (allowed: `system` or `developer` when `prompt` is set)
- `input_cost_per_million_tokens`: Cost per million input tokens
- `output_cost_per_million_tokens`: Cost per million output tokens
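Putting these together, a fully annotated model entry might look like the sketch below; all field names come from the lists above, and the values are purely illustrative:

```yaml
chat_models:
  - model_id: annotated-example
    model: openai/gpt-4o
    credentials:
      api_key: !secret OPENAI_API_KEY
    params:
      temperature: 0.7
      max_tokens: 1000
    retry_attempts: 3
    prompt: "You are a helpful assistant."  # mutually exclusive with prompt_ref
    role: system
    input_cost_per_million_tokens: 5.0
    output_cost_per_million_tokens: 15.0
```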
## Supported Providers

### OpenAI
```yaml
chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
    credentials:
      api_key: !secret OPENAI_API_KEY
    params:
      temperature: 0.7
      max_tokens: 1000
```
### Ollama (Local Models / OpenAI-compatible)
```yaml
chat_models:
  - model_id: llama3
    model: openai/llama3.2:3b
    credentials:
      base_url: "http://host.docker.internal:11434/v1"
    params:
      temperature: 0.7
      top_p: 0.9
    # Use either `prompt` OR `prompt_ref` (mutually exclusive)
    prompt_ref: "ollama_system.md"
    role: system
```
### Google Gemini
```yaml
chat_models:
  - model_id: gemini-pro
    model: google-genai/gemini-2.5-flash
    credentials:
      api_key: !secret GOOGLE_API_KEY
    params:
      temperature: 0.7
      max_output_tokens: 1024
    prompt: "You are a helpful assistant powered by Google Gemini."
    role: system
```
**Important:** The `api_key` is required for Gemini models.
**Gemini Embedding Models:**
```yaml
embedding_models:
  - model_id: gemini-embedding
    model: google-genai/models/gemini-embedding-001
    credentials:
      api_key: !secret GOOGLE_API_KEY
    params:
      task_type: RETRIEVAL_QUERY  # Optional: RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING
```
Key differences:

- **Provider identifier**: Use `google-genai`
- **API key requirement**: `api_key` is mandatory
- **Model format for embeddings**: Use `models/gemini-embedding-001` (with the `models/` prefix)
- **Multimodal support**: Gemini chat models support multimodal content (text, images, files)
### Mock Models (Testing)
```yaml
chat_models:
  - model_id: mock-chat
    model: mock/gateway
    params:
      latency_ms: 150
      response_text: "mocked response"

embedding_models:
  - model_id: mock-embed
    model: mock/embeddings
    params:
      latency_ms: 100
      vector_size: 8
```
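A route can reference the mock models by ID like any other model, which is handy for integration tests that must not call a real provider. A minimal sketch, using the route syntax shown elsewhere in this page:

```yaml
routes:
  test:
    chat_models:
      - mock-chat
    embedding_models:
      - mock-embed
```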
## Model Types

### Chat Models
Used for conversational AI and text generation.
**Definition:**

```yaml
chat_models:
  - model_id: assistant
    model: openai/gpt-4o
    credentials:
      api_key: !secret OPENAI_API_KEY
    params:
      temperature: 0.7
      max_tokens: 1000
```
**Usage in routes:**

```yaml
routes:
  customer-service:
    chat_models:
      - assistant
```
### Embedding Models
Used for embeddings and vector operations.
**Definition:**

```yaml
embedding_models:
  - model_id: emb-small
    model: openai/text-embedding-3-small
    credentials:
      api_key: !secret OPENAI_API_KEY
```
**Usage in routes:**

```yaml
routes:
  semantic-cache-demo:
    chat_models:
      - assistant
    embedding_models:
      - emb-small
```
## Credentials Configuration

### API Key Authentication

```yaml
credentials:
  api_key: !secret OPENAI_API_KEY
```
### Custom Base URL

```yaml
credentials:
  base_url: "http://localhost:11434/v1"
  api_key: "dummy-api-key"  # May be required by some OpenAI-compatible servers/clients
```
## Model Parameters

### Common Parameters

- `temperature`: Controls randomness (0.0-2.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Penalty for frequent tokens
- `presence_penalty`: Penalty for new tokens
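As a sketch, the common parameters above combine in a single `params` block on a model entry; the values here are illustrative, not recommendations:

```yaml
params:
  temperature: 0.7
  max_tokens: 1000
  top_p: 0.9
  frequency_penalty: 0.0
  presence_penalty: 0.0
```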
### Provider-Specific Parameters
Each provider may support additional parameters. Refer to the provider's documentation for complete details.
## Cost Configuration

### Automatic Cost Assignment

The gateway can assign costs from a price list (e.g. `model_prices.json`) if they are not explicitly configured:

```yaml
chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
    # Costs can be automatically assigned if supported
```
### Manual Cost Configuration

```yaml
chat_models:
  - model_id: custom-model
    model: openai/gpt-4o
    input_cost_per_million_tokens: 5.0
    output_cost_per_million_tokens: 15.0
```
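Because costs are expressed per million tokens, the cost of a request is linear in the token counts. A worked example with the rates above:

```yaml
# Worked example for one request against custom-model:
#   input:  1,000 tokens -> (1,000 / 1,000,000) x 5.0  = $0.005
#   output:   500 tokens -> (  500 / 1,000,000) x 15.0 = $0.0075
#   total:                                                $0.0125
```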
## Retry Configuration

### Default Retry Policy

```yaml
chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
    retry_attempts: 3  # Default value
```
### Custom Retry Policy

```yaml
chat_models:
  - model_id: unreliable-model
    model: openai/gpt-3.5-turbo
    retry_attempts: 5
```
## Prompts

### Basic Prompt

```yaml
chat_models:
  - model_id: assistant
    model: openai/gpt-4o
    prompt: "You are a helpful assistant."
    role: system
```
### File-based Prompt (prompt_ref)

You can load the default prompt from a Markdown file mounted at runtime.

```yaml
chat_models:
  - model_id: assistant
    model: openai/gpt-4o
    # Use either `prompt` OR `prompt_ref` (mutually exclusive)
    prompt_ref: "assistant.md"
    role: system
```
### Role-Based Prompt

```yaml
chat_models:
  - model_id: developer-assistant
    model: openai/gpt-4o
    prompt: "You are a senior software developer."
    role: developer
```
## Model Validation

### Unique Model IDs
Model IDs should be unique within each top-level section.
```yaml
chat_models:
  - model_id: gpt-4o          # ✅ Unique
    model: openai/gpt-4o
  - model_id: gpt-3.5-turbo   # ✅ Unique
    model: openai/gpt-3.5-turbo
```
### Route References

- Every ID listed in `routes.<route>.chat_models` must match a `chat_models[].model_id`
- Every ID listed in `routes.<route>.embedding_models` must match an `embedding_models[].model_id`
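For example, reference validation would accept the first entry below and reject the second, because only `assistant` is defined as a `model_id`:

```yaml
chat_models:
  - model_id: assistant
    model: openai/gpt-4o

routes:
  demo:
    chat_models:
      - assistant       # ✅ matches a defined model_id
      # - ghost-model   # ❌ no such model_id; this reference would fail validation
```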
### Credential Validation

- Many hosted providers require API keys
- OpenAI-compatible servers may require a dummy `api_key`, depending on the client/adapter
## Best Practices

### Model Organization

- Use descriptive `model_id` names
- Separate production and testing models
- Keep prompts short and consistent across environments
- Prefer `prompt_ref` for environment-specific prompts (mount different prompt folders per environment without changing `config.yaml`)
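One way to realize per-environment prompts is to mount a different folder into the prompts directory per deployment. A minimal sketch, assuming the gateway runs via Docker Compose and reads prompts from the directory pointed to by `PROMPTS_DIR` (the service name and host paths below are assumptions):

```yaml
# docker-compose.override.yml (sketch)
services:
  gateway:
    environment:
      PROMPTS_DIR: /prompts
    volumes:
      - ./prompts/production:/prompts:ro  # swap this folder per environment
```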
### Cost Management
- Configure cost information for accurate billing
- Use automatic cost assignment when possible
- Monitor token usage through metrics
### Error Handling
- Configure appropriate retry attempts
- Use fallback models for critical routes
- Monitor model availability
## Troubleshooting

### Common Issues
- **Model Not Found**: Verify the `model_id` exists in `chat_models`/`embedding_models` and is correctly referenced by routes/fallbacks
- **Authentication Errors**: Check API keys and `credentials` configuration
- **Prompt File Not Found**: If using `prompt_ref`, ensure the Markdown file exists in the mounted `PROMPTS_DIR` inside the container and the filename matches the configured value
- **Cost Assignment**: Ensure model names match those in the price list file (if used)
### Debug Configuration
```yaml
chat_models:
  - model_id: debug-model
    model: openai/gpt-3.5-turbo
    retry_attempts: 1  # Reduce retries for faster debugging

routes:
  debug:
    chat_models:
      - debug-model
```
## Next Steps
- **Fallback Configuration** - Set up automatic failover
- **Advanced Configuration** - Enterprise configuration options
- **API Reference** - Complete API documentation