Semantic Caching
AI services frequently involve repetitive queries, leading to unnecessary token usage and increased latency. API Platform’s AI Gateway introduces semantic caching, allowing responses to similar requests to be cached and reused intelligently, minimizing redundant processing, improving response times, and reducing overall costs.
Configure Semantic Caching Policy
- In the left navigation menu, click Develop, then select Policy.
- Click Add Resource Level Policy → Request flow → Attach mediation policy → Semantic Caching.
- Add the embedding provider and vector store configurations, then click Save.
- Save and deploy the API to apply the policy to the gateway.
The configurable fields of the policy are described below.
| Field | Description | Example Value |
|---|---|---|
| Embedding Provider | AI provider used for generating embeddings (Azure OpenAI or Mistral). | Azure OpenAI |
| Auth Header Name | Header name used for authentication. Use Authorization for Mistral and api-key for Azure OpenAI. | api-key |
| API Key | API key for authenticating with the embedding provider. | 49fdadxxxxxxxxxxxxxxxxxxxxxxxxxx |
| Embedding Model Name | Specific embedding model to use from the provider. | text-embedding-ada-002 |
| Embedding Upstream URL | Endpoint URL of the embedding service. | https://example.openai.azure.com/openai/deployments/xxxxx/embeddings?api-version=2025-07-21 |
| Vector Store | Type of vector database used to store embeddings. Currently only Redis is supported. | Redis |
| Host | Host address of the vector database. | redis-xxxxx.us-east.ec2.redis-cloud.com |
| Port | Network port number of the vector database. | 6379 |
| Dimensions | Dimensionality of the vectors generated by the selected embedding model. Refer to the provider's official documentation for the exact value. | 1536 |
| Threshold | Dissimilarity threshold (a decimal value) that determines how similar a request must be to a cached entry to count as a cache match. Lower values (closer to 0) enforce stricter semantic similarity, while higher values allow weaker matches. Typical range: 0.0 (exact match) up to higher values such as 0.5 or 1.2. | 0.1 |
| Username | Username for database authentication. | newuser |
| Password | Password for the specified database user. | securepassword123 |
| Database | Index of the vector database to connect to. | 0 |
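Before entering the values in the policy form, it can help to sanity-check them against the constraints listed in the table above. The following is only a sketch of such a check: the `SemanticCacheSettings` class and the `validate` function are hypothetical names invented for this example and are not part of API Platform. The rules themselves (supported providers, matching auth header, Redis as the only vector store) are taken from the table.

```python
from dataclasses import dataclass

# Auth header names per provider, as listed in the table above.
# The dictionary itself is illustrative, not a platform API.
EXPECTED_AUTH_HEADER = {"Azure OpenAI": "api-key", "Mistral": "Authorization"}


@dataclass
class SemanticCacheSettings:
    """Hypothetical container mirroring some of the policy form fields."""
    embedding_provider: str
    auth_header_name: str
    embedding_model: str
    dimensions: int
    threshold: float
    vector_store: str = "Redis"
    port: int = 6379


def validate(settings):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    expected = EXPECTED_AUTH_HEADER.get(settings.embedding_provider)
    if expected is None:
        problems.append(f"unsupported embedding provider: {settings.embedding_provider}")
    elif settings.auth_header_name.lower() != expected.lower():
        # HTTP header names are case-insensitive, so compare in lower case.
        problems.append(f"{settings.embedding_provider} expects auth header '{expected}'")
    if settings.vector_store != "Redis":
        problems.append("only Redis is currently supported as the vector store")
    if settings.dimensions <= 0:
        problems.append("dimensions must be a positive integer")
    if settings.threshold < 0:
        problems.append("threshold must not be negative")
    if not 1 <= settings.port <= 65535:
        problems.append("port must be between 1 and 65535")
    return problems


# Example values copied from the table above.
example = SemanticCacheSettings("Azure OpenAI", "api-key", "text-embedding-ada-002", 1536, 0.1)
print(validate(example))  # prints []
```

A check like this catches the most common mismatch, using the Azure OpenAI header name with a Mistral provider or the reverse, before the policy is saved and deployed.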
Sample Payloads and Responses
Response to the first request (cache miss):
< HTTP/2 200
< content-type: application/json
< x-cache-status: MISS
< server: envoy
<
{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"You can reset your Google account password by following these steps:\n\n1. Go to the Google account recovery page: https://accounts.google.com/signin/recovery.\n\n2. Enter your email address associated with your Google account and click on \"Next.\"\n\n3. You will be prompted to enter the last password you remember. If you don't remember any, click on \"Try another way.\"\n\n4. Google will send a verification code to your recovery email address or phone number. Enter the code when prompted.\n\n5. Once your identity is verified, you will be able to create a new password for your Google account.\n\n6. Enter the new password, confirm it, and click on \"Change Password.\"\n\nYour Google account password should now be reset successfully.","role":"assistant"}}],"created":1753434513,"id":"chatcmpl-Bx8hlwkW0SGNRSaR1BoVoNYpK5aTr","model":"gpt-3.5-turbo-0125","object":"chat.completion","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{}}],"system_fingerprint":"fp_0165350fbb","usage":{"completion_tokens":149,"prompt_tokens":16,"total_tokens":165}}
Response to the second request (cache hit):
< HTTP/2 200
< content-type: application/json
< x-cache-status: HIT
< server: envoy
<
{"choices":[{"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"You can reset your Google account password by following these steps:\n\n1. Go to the Google account recovery page: https://accounts.google.com/signin/recovery.\n\n2. Enter your email address associated with your Google account and click on \"Next.\"\n\n3. You will be prompted to enter the last password you remember. If you don't remember any, click on \"Try another way.\"\n\n4. Google will send a verification code to your recovery email address or phone number. Enter the code when prompted.\n\n5. Once your identity is verified, you will be able to create a new password for your Google account.\n\n6. Enter the new password, confirm it, and click on \"Change Password.\"\n\nYour Google account password should now be reset successfully.","role":"assistant"}}],"created":1753434513,"id":"chatcmpl-Bx8hlwkW0SGNRSaR1BoVoNYpK5aTr","model":"gpt-3.5-turbo-0125","object":"chat.completion","prompt_filter_results":[{"content_filter_results":{},"prompt_index":0}],"system_fingerprint":"fp_0165350fbb","usage":{"completion_tokens":149,"prompt_tokens":16,"total_tokens":165}}
Note
The effectiveness of semantic caching can vary based on the selected embedding model and configured similarity threshold. Ensure optimal performance by carefully selecting these parameters according to your specific use case.
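To make the effect of the threshold concrete, here is a minimal sketch of how a dissimilarity threshold can decide between a cache hit and a miss. It assumes cosine distance as the dissimilarity measure, which this page does not confirm for the gateway, and it uses toy three-dimensional vectors in place of real embeddings. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_cache_hit(query_vec, cached_vec, threshold):
    """Treat a cached entry as a match when its distance is within the threshold."""
    return cosine_distance(query_vec, cached_vec) <= threshold


# Toy vectors standing in for real embeddings (which would have e.g. 1536 dimensions).
cached = [0.9, 0.1, 0.0]
similar = [0.85, 0.15, 0.05]
unrelated = [0.0, 0.2, 0.9]

print(is_cache_hit(similar, cached, threshold=0.1))    # True: distance is close to 0
print(is_cache_hit(unrelated, cached, threshold=0.1))  # False: distance is close to 1
print(is_cache_hit(unrelated, cached, threshold=1.2))  # True: a loose threshold accepts weak matches
```

The last line shows why a high threshold is risky: an unrelated query is served a cached answer. Starting with a strict value such as 0.1 and loosening it gradually while checking the returned responses is a reasonable way to tune the setting.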
Next Steps
- Verify that the API has been redeployed so that the policy and its configurations are applied.
- Test the AI API to confirm that it caches responses and serves them when it receives semantically similar queries. See Test REST Endpoints via the OpenAPI Console.
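The test step above can also be scripted. The sketch below sends two similarly worded prompts and reports the `x-cache-status` response header, which appears as MISS and HIT in the sample responses. Everything specific here is an assumption to replace with your own values: the `API_URL`, the header used to authenticate against the gateway, the OpenAI-style request body, and the two prompts. The helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical values: replace with your deployed API's URL and credentials.
API_URL = "https://gateway.example.com/my-ai-api/chat/completions"
API_KEY_HEADER = "api-key"
API_KEY = "<your-api-key>"


def build_request(prompt):
    """Build a POST request with an assumed OpenAI-style chat payload."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", API_KEY_HEADER: API_KEY},
    )


def cache_status(headers):
    """Look up x-cache-status case-insensitively; return None if it is absent."""
    for name, value in headers.items():
        if name.lower() == "x-cache-status":
            return value
    return None


def run_check():
    """Send two similar prompts and print the cache status of each response."""
    prompts = (
        "How do I reset my Google account password?",
        "How can I reset the password for my Google account?",
    )
    for prompt in prompts:
        with urllib.request.urlopen(build_request(prompt), timeout=30) as resp:
            print(prompt, "->", cache_status(resp.headers))
```

Calling `run_check()` against a deployed API would be expected to report a miss for the first prompt and, if the second prompt falls within the configured threshold, a hit for the second.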
By configuring semantic caching as outlined above, you can efficiently optimize AI service usage within your API Platform environment, significantly reducing latency and operational costs.


