Model Configuration

This section provides comprehensive guidance on configuring AI models in the Radicalbit AI Gateway based on the actual codebase structure.

Overview

The Radicalbit AI Gateway supports multiple AI providers and models through a flexible configuration system.

With the new configuration structure:

Models are defined at top-level under:
- chat_models (for chat/completions)
- embedding_models (for embeddings)
Routes do not contain full model objects anymore.
- Routes reference models by model ID (string lists)

Model Structure

Basic Model Configuration (Chat)

chat_models:
  - model_id: openai-4o
    model: openai/gpt-4o
    credentials:
      api_key: !secret OPENAI_API_KEY
    params:
      temperature: 1
      max_tokens: 20

routes:
  production:
    chat_models:
      - openai-4o

Model Fields

Required Fields

model_id: Unique identifier for the model (used by routes and fallbacks)
model: Model identifier in format provider/model_name (e.g., openai/gpt-4o)

Optional Fields

credentials: API credentials for accessing the model
params: Model parameters (temperature, max_tokens, etc.)
retry_attempts: Number of retry attempts (default: 3)
prompt: Optional inline system/developer prompt (mutually exclusive with prompt_ref)
prompt_ref: Optional reference to a Markdown file containing the prompt
role: Role used when injecting prompt/prompt_ref (allowed: system or developer when prompt is set)
input_cost_per_million_tokens: Cost per million input tokens
output_cost_per_million_tokens: Cost per million output tokens

Supported Providers

OpenAI

chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
    credentials:
      api_key: !secret OPENAI_API_KEY
    params:
      temperature: 0.7
      max_tokens: 1000

Ollama (Local Models / OpenAI-compatible)

chat_models:
  - model_id: llama3
    model: openai/llama3.2:3b
    credentials:
      base_url: "http://host.docker.internal:11434/v1"
    params:
      temperature: 0.7
      top_p: 0.9
    # Use either `prompt` OR `prompt_ref` (mutually exclusive)
    prompt_ref: "ollama_system.md"
    role: system

Google Gemini

chat_models:
  - model_id: gemini-pro
    model: google-genai/gemini-2.5-flash
    credentials:
      api_key: !secret GOOGLE_API_KEY
    params:
      temperature: 0.7
      max_output_tokens: 1024
    prompt: "You are a helpful assistant powered by Google Gemini."
    role: system

Important: The api_key is required for Gemini models.

Gemini Embedding Models:

embedding_models:
  - model_id: gemini-embedding
    model: google-genai/models/gemini-embedding-001
    credentials:
      api_key: !secret GOOGLE_API_KEY
    params:
      task_type: RETRIEVAL_QUERY  # Optional: RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING

Key differences:

Provider identifier: Use google-genai
API key requirement: api_key is mandatory
Model format for embeddings: Use models/gemini-embedding-001 (with models/ prefix)
Multimodal support: Gemini chat models support multimodal content (text, images, files)

Mock Models (Testing)

chat_models:
  - model_id: mock-chat
    model: mock/gateway
    params:
      latency_ms: 150
      response_text: "mocked response"

embedding_models:
  - model_id: mock-embed
    model: mock/embeddings
    params:
      latency_ms: 100
      vector_size: 8

Model Types

Chat Models

Used for conversational AI and text generation.

Definition:

chat_models:
  - model_id: assistant
    model: openai/gpt-4o
    credentials:
      api_key: !secret OPENAI_API_KEY
    params:
      temperature: 0.7
      max_tokens: 1000

Usage in routes:

routes:
  customer-service:
    chat_models:
      - assistant

Embedding Models

Used for embeddings and vector operations.

Definition:

embedding_models:
  - model_id: emb-small
    model: openai/text-embedding-3-small
    credentials:
      api_key: !secret OPENAI_API_KEY

Usage in routes:

routes:
  semantic-cache-demo:
    chat_models:
      - assistant
    embedding_models:
      - emb-small

Credentials Configuration

API Key Authentication

credentials:
  api_key: !secret OPENAI_API_KEY

Custom Base URL

credentials:
  base_url: "http://localhost:11434/v1"
  api_key: "dummy-api-key"  # May be required by some OpenAI-compatible servers/clients

Model Parameters

Common Parameters

temperature: Controls randomness (0.0-2.0)
max_tokens: Maximum tokens to generate
top_p: Nucleus sampling parameter
frequency_penalty: Penalty for frequent tokens
presence_penalty: Penalty for new tokens

Provider-Specific Parameters

Each provider may support additional parameters. Refer to the provider's documentation for complete details.

Cost Configuration

Automatic Cost Assignment

The gateway can assign costs from a price list (e.g. model_prices.json) if not explicitly configured:

chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
    # Costs can be automatically assigned if supported

Manual Cost Configuration

chat_models:
  - model_id: custom-model
    model: openai/gpt-4o
    input_cost_per_million_tokens: 5.0
    output_cost_per_million_tokens: 15.0

Retry Configuration

Default Retry Policy

chat_models:
  - model_id: gpt-4o
    model: openai/gpt-4o
    retry_attempts: 3  # Default value

Custom Retry Policy

chat_models:
  - model_id: unreliable-model
    model: openai/gpt-3.5-turbo
    retry_attempts: 5

Prompts

Basic Prompt

chat_models:
  - model_id: assistant
    model: openai/gpt-4o
    prompt: "You are a helpful assistant."
    role: system

File-based Prompt (`prompt_ref`)

You can load the default prompt from a Markdown file mounted at runtime.

chat_models:
  - model_id: assistant
    model: openai/gpt-4o
    # Use either `prompt` OR `prompt_ref` (mutually exclusive)
    prompt_ref: "assistant.md"
    role: system

### Role-Based Prompt
```yaml
chat_models:
  - model_id: developer-assistant
    model: openai/gpt-4o
    prompt: "You are a senior software developer."
    role: developer

Model Validation

Unique Model IDs

Model IDs should be unique within each top-level section.

chat_models:
  - model_id: gpt-4o        # ✅ Unique
    model: openai/gpt-4o
  - model_id: gpt-3.5-turbo # ✅ Unique
    model: openai/gpt-3.5-turbo

Route References

Every ID listed in routes.<route>.chat_models must match a chat_models[].model_id
Every ID listed in routes.<route>.embedding_models must match an embedding_models[].model_id

Credential Validation

Many hosted providers require API keys
OpenAI-compatible servers may require a dummy api_key depending on the client/adapter

Best Practices

Model Organization

Use descriptive model_id names
Separate production and testing models
Keep prompts short and consistent across environments
Prefer prompt_ref for environment-specific prompts (mount different prompt folders per environment without changing config.yaml).

Cost Management

Configure cost information for accurate billing
Use automatic cost assignment when possible
Monitor token usage through metrics

Error Handling

Configure appropriate retry attempts
Use fallback models for critical routes
Monitor model availability

Troubleshooting

Common Issues

Model Not Found: Verify the model_id exists in chat_models / embedding_models and is correctly referenced by routes/fallbacks
Authentication Errors: Check API keys and credentials configuration
Prompt File Not Found: If using prompt_ref, ensure the Markdown file exists in the mounted PROMPTS_DIR inside the container and the filename matches the configured value.
Cost Assignment: Ensure model names match those in the price list file (if used)

Debug Configuration

chat_models:
  - model_id: debug-model
    model: openai/gpt-3.5-turbo
    retry_attempts: 1  # Reduce retries for faster debugging

routes:
  debug:
    chat_models:
      - debug-model

Next Steps

Fallback Configuration - Set up automatic failover
Advanced Configuration - Enterprise configuration options
API Reference - Complete API documentation

Overview​

Model Structure​

Basic Model Configuration (Chat)​

Model Fields​

Required Fields​

Optional Fields​

Supported Providers​

OpenAI​

Ollama (Local Models / OpenAI-compatible)​

Google Gemini​

Mock Models (Testing)​

Model Types​

Chat Models​

Embedding Models​

Credentials Configuration​

API Key Authentication​

Custom Base URL​

Model Parameters​

Common Parameters​

Provider-Specific Parameters​

Cost Configuration​

Automatic Cost Assignment​

Manual Cost Configuration​

Retry Configuration​

Default Retry Policy​

Custom Retry Policy​

Prompts​

Basic Prompt​

File-based Prompt (prompt_ref)​

Model Validation​

Unique Model IDs​

Route References​

Credential Validation​

Best Practices​

Model Organization​

Cost Management​

Error Handling​

Troubleshooting​

Common Issues​

Debug Configuration​

Next Steps​