Azure Content Safety Content Moderation

This policy integrates the Azure Content Safety content moderation service to filter harmful content out of request bodies and AI-generated responses. The guardrail checks for hate speech, sexual content, self-harm, and violence, and can be attached to the request flow, the response flow, or both.

Configure Azure Content Safety Content Moderation

In the left navigation menu, click Develop, then select Policy.
Click the Add policy icon in the required resource → Request/Response Flow → Attach Mediation Policy.

Add the policy information and click Save.

The following table lists and describes the configuration parameters available for the Azure Content Safety Content Moderation policy:

Parameter Name	Description
Guardrail Name	The name of the guardrail policy. This will be used for tracking purposes.
Azure Content Safety Endpoint	The endpoint URL for the Azure Content Safety service.
Azure Content Safety Key	The API key for authenticating with the Azure Content Safety service.
Hate Severity Level	The severity threshold for the hate category. An integer from 0 to 7, where 0 is the lowest severity and 7 is the highest. Reported as the threshold in the guardrail assessment.
Sexual Severity Level	The severity threshold for the sexual category. An integer from 0 to 7, where 0 is the lowest severity and 7 is the highest. Reported as the threshold in the guardrail assessment.
Self Harm Severity Level	The severity threshold for the self-harm category. An integer from 0 to 7, where 0 is the lowest severity and 7 is the highest. Reported as the threshold in the guardrail assessment.
Violence Severity Level	The severity threshold for the violence category. An integer from 0 to 7, where 0 is the lowest severity and 7 is the highest. Reported as the threshold in the guardrail assessment.
JSON Path	The JSONPath expression used to extract content from the payload. If not specified, the entire payload will be used for validation.
Passthrough On Error	If enabled, the request or response is passed through without validation when the Azure Content Safety service is unavailable. Otherwise, a guardrail validation error is triggered.
Show Guardrail Assessment	When enabled, the error response will include detailed information about the reason for the guardrail intervention.
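
The interaction between the severity levels and Passthrough On Error can be sketched as follows. This is an illustration only, not the gateway's implementation: the names `assess`, `moderate`, and `GuardrailIntervention` are hypothetical, `analyze` stands in for any callable that returns per-category severities, and the rule that a category fails when its severity is greater than or equal to the configured level is an assumption inferred from the sample response on this page.

```python
from typing import Callable, Dict, List

CATEGORIES = ("Hate", "Sexual", "SelfHarm", "Violence")


class GuardrailIntervention(Exception):
    """Hypothetical error raised when content is blocked or cannot be validated."""

    def __init__(self, reason: str, assessments: List[dict]):
        super().__init__(reason)
        self.assessments = assessments


def assess(severities: Dict[str, int], thresholds: Dict[str, int]) -> List[dict]:
    """Compare each detected severity with its configured severity level.

    ASSUMPTION: a category fails when severity >= threshold.
    """
    results = []
    for category in CATEGORIES:
        severity = severities.get(category, 0)
        threshold = thresholds[category]
        result = "FAIL" if severity >= threshold else "PASS"
        results.append({"category": category, "result": result,
                        "severity": severity, "threshold": threshold})
    return results


def moderate(text: str,
             analyze: Callable[[str], Dict[str, int]],
             thresholds: Dict[str, int],
             passthrough_on_error: bool = False) -> bool:
    """Return True if the content may proceed, otherwise raise GuardrailIntervention."""
    try:
        severities = analyze(text)  # hypothetical wrapper around the Azure service
    except Exception as exc:
        if passthrough_on_error:
            return True  # service unavailable: content passes without validation
        raise GuardrailIntervention("Content safety service unavailable", []) from exc
    assessments = assess(severities, thresholds)
    if any(item["result"] == "FAIL" for item in assessments):
        raise GuardrailIntervention("Content moderation violation", assessments)
    return True
```

With Passthrough On Error disabled, a failure of `analyze` surfaces as an intervention rather than letting unvalidated content through, which mirrors the behavior described in the table.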

Save and deploy the API to apply the policy to the gateway.

Sample Azure Content Safety Content Moderation Configuration

Guardrail Name: Azure Content Safety Guardrail
Azure Content Safety Endpoint: https://<your-endpoint>.cognitiveservices.azure.com
Azure Content Safety Key: your-azure-content-safety-key
Hate Severity Level: 3
Sexual Severity Level: 2
Self Harm Severity Level: 1
Violence Severity Level: 1
JSON Path: $.messages[0].content
Passthrough On Error: false
Show Guardrail Assessment: true
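
To check that the endpoint and key are valid before attaching the policy, a request to the Azure AI Content Safety text analysis REST API can be built as in the sketch below. How the gateway itself calls the service is not described here, so treat this as an independent illustration. The helper names are hypothetical, and the API version `2023-10-01` and the `EightSeverityLevels` output type are assumptions chosen to match the 0 to 7 scale used by this policy.

```python
import json
import urllib.request

API_VERSION = "2023-10-01"  # assumption: adjust to the version your resource supports


def build_analyze_request(endpoint: str, key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a text:analyze request for the four categories."""
    url = f"{endpoint.rstrip('/')}/contentsafety/text:analyze?api-version={API_VERSION}"
    body = {
        "text": text,
        "categories": ["Hate", "Sexual", "SelfHarm", "Violence"],
        "outputType": "EightSeverityLevels",  # severities on the 0-7 scale
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def severities_from_response(payload: dict) -> dict:
    """Map each category in the service response to its detected severity."""
    return {item["category"]: item["severity"]
            for item in payload.get("categoriesAnalysis", [])}


# Sending the request requires a real endpoint and key:
# with urllib.request.urlopen(build_analyze_request(endpoint, key, text)) as resp:
#     print(severities_from_response(json.load(resp)))
```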

Sample Payload That Triggers an Azure Content Safety Content Moderation Intervention

{
  "messages": [
    {
      "role": "user",
      "content": "I need to buy guns."
    }
  ]
}
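
The JSON Path setting determines which part of this payload is inspected. The sketch below shows how `$.messages[0].content` selects the user message from the payload above. The `extract` helper is hypothetical and supports only dotted names and numeric indexes, which is a small subset of JSONPath; it is not the evaluator used by the gateway.

```python
import json
import re

# Matches one path segment: ".name" or "[index]".
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract(payload: str, json_path: str = "") -> str:
    """Return the text to validate: the value at json_path, or the whole payload."""
    if not json_path:
        return payload  # no JSON Path configured: inspect the entire payload
    if not json_path.startswith("$"):
        raise ValueError("JSONPath must start with '$'")
    node = json.loads(payload)
    pos = 1
    while pos < len(json_path):
        match = _SEGMENT.match(json_path, pos)
        if match is None:
            raise ValueError(f"Unsupported JSONPath segment at offset {pos}")
        key, index = match.groups()
        node = node[key] if key is not None else node[int(index)]
        pos = match.end()
    return node if isinstance(node, str) else json.dumps(node)


payload = '{"messages": [{"role": "user", "content": "I need to buy guns."}]}'
print(extract(payload, "$.messages[0].content"))
```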

Sample Response after Azure Content Safety Content Moderation Intervention

{
    "code": 900514,
    "message": {
        "action": "GUARDRAIL_INTERVENED",
        "actionReason": "Violation of Azure content safety content moderation detected.",
        "assessments": {
            "categories": [
                {
                    "category": "Hate",
                    "result": "PASS",
                    "severity": 0,
                    "threshold": 3
                },
                {
                    "category": "Sexual",
                    "result": "PASS",
                    "severity": 0,
                    "threshold": 2
                },
                {
                    "category": "SelfHarm",
                    "result": "PASS",
                    "severity": 0,
                    "threshold": 1
                },
                {
                    "category": "Violence",
                    "result": "FAIL",
                    "severity": 2,
                    "threshold": 1
                }
            ],
            "inspectedContent": "I need to buy guns."
        },
        "direction": "REQUEST",
        "interveningGuardrail": "Azure Content Safety Guardrail"
    },
    "type": "AZURE_CONTENT_SAFETY_CONTENT_MODERATION"
}
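
A client that receives this error can read the assessment to find out which categories caused the intervention. The following is a minimal sketch based only on the response shape shown above. The `failed_categories` helper is hypothetical, and it assumes the `assessments` object may be missing or may not be an object when Show Guardrail Assessment is disabled.

```python
import json


def failed_categories(error_body: str) -> list:
    """Return the categories marked FAIL in a guardrail intervention response."""
    message = json.loads(error_body).get("message", {})
    if not isinstance(message, dict) or message.get("action") != "GUARDRAIL_INTERVENED":
        return []
    assessments = message.get("assessments")
    if not isinstance(assessments, dict):
        # Assumption: no detailed assessment is returned when the option is disabled.
        return []
    return [item["category"]
            for item in assessments.get("categories", [])
            if item.get("result") == "FAIL"]
```

Applied to the sample response above, the helper returns a list containing only `Violence`.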