Guardrails Reference
Guardrails are rules applied to the user's input, the model's output, or both (io) to enforce content policies.
Guardrail Classes
Check Guardrails
These guardrails inspect content. If triggered, they can either:
- BLOCK the request (raising an error)
- SOFT_BLOCK the request (returning the response with the error message inside the content)
- WARN (logging a message and letting the request proceed)
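The three outcomes can be pictured with a minimal Python sketch. This is not the gateway's implementation: the names `GuardrailBlocked` and `apply_behavior`, and the exact way the message ends up in the response, are assumptions made purely for illustration.

```python
import logging

logger = logging.getLogger("guardrails_sketch")


class GuardrailBlocked(Exception):
    """Hypothetical error raised when a guardrail with behavior `block` fires."""


def apply_behavior(behavior: str, content: str, message: str) -> str:
    """Return the content a caller would see after a check guardrail has triggered."""
    if behavior == "block":
        # BLOCK: the request fails with an error.
        raise GuardrailBlocked(message)
    if behavior == "soft_block":
        # SOFT_BLOCK: a normal response whose content is the error message.
        return message
    if behavior == "warn":
        # WARN: log a message and let the original content through unchanged.
        logger.warning("guardrail triggered: %s", message)
        return content
    raise ValueError(f"unknown behavior: {behavior!r}")
```

In this sketch only `block` interrupts the flow; `soft_block` and `warn` both return normally, differing in whether the caller sees the message or the original content.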
Redact Guardrails
These guardrails modify the content by removing or masking sensitive information.
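As a rough illustration of what redaction means, the Python sketch below masks email-like substrings with a placeholder. It uses a simplified regular expression rather than Presidio, and `redact_emails` is a hypothetical helper, not part of the gateway.

```python
import re

# Simplified email pattern, used only for this illustration.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def redact_emails(text: str, placeholder: str = "<EMAIL_ADDRESS>") -> str:
    """Replace every email-like substring with a placeholder."""
    return EMAIL.sub(placeholder, text)


print(redact_emails("Contact me at jane.doe@example.com today."))
# prints: Contact me at <EMAIL_ADDRESS> today.
```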
Configuration
- name: A unique, descriptive name for the guardrail.
- type: The type of check to perform.
- where: Where to apply the guardrail (input, output, or io for both).
- behavior: The action to take if the guardrail is triggered.
  - For Check types: block, soft_block or warn.
  - Redact types do not need a behavior; they always redact.
- parameters: A dictionary of settings specific to the guardrail type.
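The field rules above can be expressed as a small validation sketch in Python. `GuardrailConfig` and its `validate` method are hypothetical, and treating `presidio_anonymizer` as the only Redact type is an assumption based on the types listed in this reference.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

VALID_WHERE = {"input", "output", "io"}
VALID_BEHAVIORS = {"block", "soft_block", "warn"}
# Assumption: the only Redact type listed in this reference.
REDACT_TYPES = {"presidio_anonymizer"}


@dataclass
class GuardrailConfig:
    """Hypothetical in-memory form of one entry under `guardrails:`."""

    name: str
    type: str
    where: str
    behavior: Optional[str] = None
    parameters: Dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if the entry breaks the rules described above."""
        if self.where not in VALID_WHERE:
            raise ValueError(f"{self.name}: where must be one of {sorted(VALID_WHERE)}")
        if self.type in REDACT_TYPES:
            return  # Redact types always redact; a behavior is not required.
        if self.behavior not in VALID_BEHAVIORS:
            raise ValueError(f"{self.name}: behavior must be one of {sorted(VALID_BEHAVIORS)}")
```

For example, `GuardrailConfig("greeting_check", "starts_with", "input", "warn", {"values": ["Hello"]}).validate()` passes silently, while the same entry without a behavior raises a `ValueError`.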
Guardrail Types & Parameters
| Type | Description | Example parameters |
|---|---|---|
| starts_with | Checks if the text starts with a specific string. | values: [ "Hello", "Hi" ] |
| ends_with | Checks if the text ends with a specific string. | values: [ "?", "!" ] |
| contains | Checks if the text contains a specific substring. | values: [ "forbidden_word" ] |
| regex | Checks if the text matches a regular expression. | values: [ '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' ] |
| presidio_analyzer | A Check guardrail that uses Microsoft Presidio to detect Personally Identifiable Information (PII). | language: it, entities: [ "EMAIL_ADDRESS", "IBAN_CODE" ] |
| presidio_anonymizer | A Redact guardrail that uses Microsoft Presidio to find and mask PII (e.g., replacing an email with <EMAIL_ADDRESS>). | language: it, entities: [ "EMAIL_ADDRESS", "IBAN_CODE", "IT_IDENTITY_CARD" ] |
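The four string-based types in the table map naturally onto standard string operations. The Python sketch below shows one plausible reading; it assumes that a guardrail triggers when any listed value matches, that matching is case-sensitive, and that regex uses a search rather than a full match. None of these details are confirmed by this reference, and `is_triggered` is a hypothetical helper.

```python
import re
from typing import Callable, Dict, List

# Assumption: each check compares the text against a single configured value.
CHECKS: Dict[str, Callable[[str, str], bool]] = {
    "starts_with": lambda text, value: text.startswith(value),
    "ends_with": lambda text, value: text.endswith(value),
    "contains": lambda text, value: value in text,
    "regex": lambda text, value: re.search(value, text) is not None,
}


def is_triggered(guardrail_type: str, text: str, values: List[str]) -> bool:
    """Return True if any value matches the text under the given check type."""
    check = CHECKS[guardrail_type]
    return any(check(text, value) for value in values)
```

Under these assumptions, `is_triggered("starts_with", "Hello there", ["Hello", "Hi"])` is true, while `is_triggered("starts_with", "Say Hello", ["Hello", "Hi"])` is false.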
Traditional Guardrails
Starts With
Checks if text starts with specific strings.
guardrails:
  - name: greeting_check
    type: starts_with
    where: input
    behavior: warn
    parameters:
      values: ["Hello", "Hi", "Good morning"]
Ends With
Checks if text ends with specific strings.
guardrails:
  - name: question_check
    type: ends_with
    where: input
    behavior: soft_block
    parameters:
      values: ["?", "??", "???"]
Contains
Checks if text contains specific substrings.
guardrails:
  - name: profanity_filter
    type: contains
    where: input
    behavior: block
    parameters:
      values: ["inappropriate", "offensive", "spam"]
    response_message: "Content blocked due to inappropriate language"
Regex
Checks text against regular expression patterns.
guardrails:
  - name: email_detector
    type: regex
    where: input
    behavior: block
    parameters:
      values: ['\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b']
    response_message: "Email addresses are not allowed"
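It can help to try a pattern against a few sample strings before deploying it. The sketch below uses Python's `re` module, which may differ in small ways from the regex engine the gateway uses (an assumption). It also illustrates one easy-to-miss detail: inside a character class, `|` is a literal pipe, not alternation.

```python
import re

# Email pattern whose top-level domain accepts letters only.
strict = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Same pattern, but with a stray `|` inside the final character class.
loose = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b")

samples = ["reach me at bob@example.com", "no address here", "odd@host.c|m"]
for text in samples:
    # Report whether each variant finds an email-like substring.
    print(f"{text!r}: strict={bool(strict.search(text))} loose={bool(loose.search(text))}")
```

The last sample is matched only by the `loose` variant, because the pipe character is accepted as part of the top-level domain.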
Presidio Guardrails
Presidio Analyzer
Detects PII entities in content.
guardrails:
  - name: pii_detector
    type: presidio_analyzer
    where: input
    behavior: block
    parameters:
      language: en
      entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"]
    response_message: "Personal information detected and blocked"
Presidio Anonymizer
Anonymizes PII entities by replacing them with placeholders.
guardrails:
  - name: pii_anonymizer
    type: presidio_anonymizer
    where: io
    behavior: warn
    parameters:
      language: en
      entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"]
LLM Judge Guardrails
Judge
Uses an LLM to evaluate content against a prompt. Suited to in-depth evaluation for critical decisions.
guardrails:
  - name: toxicity_judge
    type: judge
    where: input
    behavior: block
    response_message: "🚨 Toxic content detected and blocked"
    parameters:
      prompt_ref: "toxicity_check.md"
      model_id: "gpt-4o-mini"
      temperature: 0.0
      max_tokens: 100
Best Practices
1. Layer Defense
Use multiple guardrail types together for comprehensive protection:
guardrails:
  - name: basic_filter
    type: contains
    where: input
    behavior: warn
    parameters:
      values: ["spam", "scam"]
  - name: pii_detector
    type: presidio_analyzer
    where: input
    behavior: block
    parameters:
      language: en
      entities: ["EMAIL_ADDRESS", "PHONE_NUMBER"]
  - name: toxicity_judge
    type: judge
    where: input
    behavior: block
    parameters:
      prompt_ref: "toxicity_check.md"
      model_id: "gpt-4o-mini"
2. Appropriate Behavior Selection
- block: For critical violations (PII, toxic content)
- soft_block: For policy violations that need user feedback
- warn: For monitoring and logging purposes
3. Performance Considerations
- Use traditional guardrails for fast filtering
- Use LLM judge guardrails for complex analysis
- Apply guardrails at the appropriate where point
4. Testing
Always test guardrails with various inputs to ensure they work as expected:
# Test with clean input
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test-route",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'
# Test with blocked content
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test-route",
    "messages": [{"role": "user", "content": "This is spam content"}]
  }'
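The same two requests can be scripted so they are easy to repeat after each configuration change. The Python sketch below reuses the URL, route name, and payload shape from the curl commands; `build_request` and `send` are hypothetical helpers, and the status code or body returned for blocked content depends on the gateway, so nothing is assumed about it here.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple

# URL and route name are taken from the curl examples above.
GATEWAY_URL = "http://localhost:8000/v1/chat/completions"


def build_request(content: str, model: str = "test-route") -> urllib.request.Request:
    """Build the same POST request the curl commands send."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(content: str) -> Tuple[int, str]:
    """Send the request and return (status code, body), including error responses."""
    try:
        with urllib.request.urlopen(build_request(content), timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Error responses still carry a status code and a body worth inspecting.
        return err.code, err.read().decode("utf-8")
```

With the gateway running locally, calling `send("Hello, how are you?")` and `send("This is spam content")` and comparing the two results shows whether the guardrail reacted as configured.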
Monitoring
Track guardrail performance through metrics:
- gateway_guardrails_triggered_total: Number of times each guardrail was triggered
- gateway_guardrails_duration_milliseconds: Processing time for each guardrail
- gateway_guardrails_blocked_total: Number of requests blocked by guardrails
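If these metrics are exposed in the Prometheus text format (an assumption; this reference does not say how they are published), a short Python sketch like the one below could summarize the counters per guardrail. The sample values and the `guardrail` label name are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict

# Hypothetical sample in Prometheus text format; the label name `guardrail` is an assumption.
SAMPLE = """\
# TYPE gateway_guardrails_triggered_total counter
gateway_guardrails_triggered_total{guardrail="basic_filter"} 12
gateway_guardrails_triggered_total{guardrail="pii_detector"} 3
gateway_guardrails_blocked_total{guardrail="pii_detector"} 3
"""

LINE = re.compile(r'^(?P<metric>\w+)\{guardrail="(?P<name>[^"]+)"\}\s+(?P<value>[0-9.eE+-]+)$')


def parse_counters(text: str) -> Dict[str, Dict[str, float]]:
    """Group sample values by metric name, then by guardrail label."""
    result: Dict[str, Dict[str, float]] = defaultdict(dict)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # comment lines (# HELP / # TYPE) do not match and are skipped
            result[match["metric"]][match["name"]] = float(match["value"])
    return dict(result)
```

Comparing the triggered and blocked counts for the same guardrail gives a quick view of how often a trigger actually stops a request.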
Next Steps
- Configuration Guide - Complete configuration reference
- API Reference - API documentation
- Monitoring - Set up observability