Azure Content Safety Content Moderation¶
The Azure Content Safety guardrail integrates with the Azure Content Safety API to filter harmful content in requests and LLM-generated responses. It checks for four categories — hate speech, sexual content, self-harm, and violence — each with a configurable severity threshold.
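Under the hood, the gateway calls the Content Safety text-analysis REST endpoint. As a hedged sketch (the `text:analyze` path and `api-version` below reflect the public Azure REST API at the time of writing; check the current Azure documentation for your resource), a request to that API can be built like this:

```python
import json

def build_analyze_request(endpoint: str, key: str, text: str):
    """Build (url, headers, body) for an Azure Content Safety text:analyze call.

    `endpoint` and `key` come from your Content Safety resource's
    Keys and Endpoint page (see Prerequisites below).
    """
    url = f"{endpoint}/contentsafety/text:analyze?api-version=2023-10-01"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = json.dumps({"text": text})
    return url, headers, body
```

The response carries a per-category severity score (in the `categoriesAnalysis` field, at the time of writing), which is what the guardrail compares against the configured thresholds.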
Prerequisites¶
This guardrail requires an active Azure subscription with the Azure Content Safety service enabled.
- In the Azure Portal, search for Content Safety and create a new resource.
- After the resource is created, open it and go to Keys and Endpoint.
- Copy the Endpoint URL and one of the API Keys.
- Add them to your gateway's `config.toml` file.
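The snippet below is a hypothetical sketch of what those entries might look like; the actual section and key names depend on your gateway's configuration schema, so consult its reference documentation.

```toml
# Hypothetical key names -- check your gateway's configuration reference.
[guardrails.azure_content_safety]
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
api_key = "<your-api-key>"
```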
Configuration Parameters¶
Advanced Settings¶
Content moderation settings are configured independently for the request and response phases. Both sections have the same parameters:
| Parameter | Default | Description |
|---|---|---|
| JSON Path | — | JSONPath to extract the content to evaluate. If empty, the entire payload is evaluated as plain text. |
| Passthrough On Error | false | When true, requests continue if the Azure API call fails. When false, an error is returned on API failure. |
| Show Assessment | false | When true, the intervention response includes detailed category scores. |
| Hate Severity Threshold | 4 | Severity threshold for hate speech (0–7). Use -1 to disable. Content at or above the threshold is blocked. |
| Sexual Severity Threshold | 5 | Severity threshold for sexual content (0–7). Use -1 to disable. |
| Self Harm Severity Threshold | 3 | Severity threshold for self-harm content (0–7). Use -1 to disable. |
| Violence Severity Threshold | 4 | Severity threshold for violence (0–7). Use -1 to disable. |
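To illustrate the JSON Path parameter: a path such as `$.messages[-1].content` (a hypothetical example, not a documented default) would send only the latest message text from a chat-style payload to Azure, rather than the whole JSON body. In plain Python, the equivalent extraction looks like:

```python
# Illustrative only: the gateway evaluates the configured JSONPath itself;
# this shows what a path like "$.messages[-1].content" would select.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Tell me about volcanoes."},
    ],
}

# Equivalent of JSONPath "$.messages[-1].content":
extracted = payload["messages"][-1]["content"]
print(extracted)  # only this string is evaluated for moderation
```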
Severity Scale¶
| Score | Meaning |
|---|---|
| 0–1 | Safe or negligible |
| 2–3 | Low severity |
| 4–5 | Medium severity |
| 6–7 | High severity |
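Putting the thresholds and the severity scale together, the blocking decision can be sketched as follows. This is a simplified model of the documented behavior, not the gateway's actual code:

```python
# Simplified model of the threshold check: a category with threshold -1 is
# skipped; otherwise content is blocked when its severity score is at or
# above the configured threshold. Defaults match the table above.
DEFAULT_THRESHOLDS = {"hate": 4, "sexual": 5, "self_harm": 3, "violence": 4}

def is_blocked(severities: dict, thresholds: dict = DEFAULT_THRESHOLDS) -> bool:
    for category, threshold in thresholds.items():
        if threshold == -1:  # category disabled
            continue
        if severities.get(category, 0) >= threshold:
            return True
    return False
```

For example, a hate-speech severity of 4 blocks under the defaults, while a 3 passes; setting a category's threshold to -1 lets any score through for that category.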
Add This Policy¶
- Navigate to AI Workspace > LLM Providers or LLM Proxies.
- Click on the provider or proxy name.
- Go to the Guardrails tab.
- Click + Add Guardrail and select Azure Content Safety Content Moderation from the sidebar.
- Expand Advanced Settings to configure severity thresholds and error handling separately for request and response phases.
- Click Add (for providers) or Submit (for proxies).
- Deploy the provider or proxy to apply the changes.
Behavior¶
- When content meets or exceeds a configured severity threshold, the gateway returns `422 Unprocessable Entity` with a guardrail intervention response.
- Request and response phases are evaluated independently using their respective settings.
- If Show Assessment is enabled, the response includes the category scores that triggered the block.
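From a client's perspective, a blocked request surfaces as an HTTP 422. A minimal handling sketch follows; note that the `"assessment"` field shown is hypothetical, as the actual intervention payload depends on the gateway and the Show Assessment setting:

```python
# Hypothetical handling sketch: treat HTTP 422 from the gateway as a
# guardrail intervention rather than a retryable error.
def handle_gateway_response(status_code: int, body: dict) -> str:
    if status_code == 422:
        # With Show Assessment enabled, the body may include category
        # scores; the "assessment" key here is illustrative, not a
        # documented field.
        detail = body.get("assessment", "content blocked by guardrail")
        return f"blocked: {detail}"
    if status_code == 200:
        return "ok"
    return f"error: {status_code}"
```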
Related¶
- Guardrails Overview
- PII Masking Regex — Lightweight PII masking without external services
- Policy Hub — Full policy specification and latest version