Model Round Robin¶
The Model Round Robin policy distributes requests across multiple AI models using round-robin selection. If a model returns a server error (5xx) or a rate limit response (429), it can be temporarily suspended so that traffic automatically shifts to the remaining healthy models.
Configuration Parameters¶
| Parameter | Required | Description |
|---|---|---|
| Models | Yes | The list of model names to balance across (e.g., gpt-4o, gpt-4-turbo). Add at least two models. |
| Suspend Duration | No | Number of seconds to suspend a model after it returns a 5xx or 429 response. While suspended, the model is skipped. Set to 0 (default) to disable suspension. |
Add This Policy¶
- Navigate to AI Workspace > LLM Providers or LLM Proxies.
- Click on the provider or proxy name.
- Go to the Guardrails tab.
- Click + Add Guardrail and select Model Round Robin from the sidebar.
- Add the model names to balance across.
- Optionally expand Advanced Settings and set a Suspend Duration to handle model failures gracefully.
- Click Add (for providers) or Submit (for proxies).
- Deploy the provider or proxy to apply the changes.
Behavior¶
- On each request, the policy selects the next available model in round-robin order and replaces the model value in the request before forwarding upstream.
- If a model returns a `5xx` or `429` response and Suspend Duration is set, the model is suspended for that duration and subsequent requests skip it.
- If all models are suspended, the gateway returns `503 Service Unavailable`.
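The behavior above can be sketched in Python. This is a minimal illustration of the selection logic, not the gateway's actual implementation; the class and method names are assumptions made for the example.

```python
import time


class ModelRoundRobin:
    """Minimal sketch of round-robin model selection with failure suspension."""

    def __init__(self, models, suspend_duration=0):
        if len(models) < 2:
            raise ValueError("add at least two models")
        self.models = models
        self.suspend_duration = suspend_duration  # seconds; 0 disables suspension
        self.next_index = 0
        # Per-model timestamp until which the model is skipped.
        self.suspended_until = {m: 0.0 for m in models}

    def select(self, now=None):
        """Return the next available model, or None if all are suspended (503)."""
        now = time.time() if now is None else now
        for _ in range(len(self.models)):
            model = self.models[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.models)
            if self.suspended_until[model] <= now:
                return model
        # Every model is suspended; the gateway would return 503.
        return None

    def report_failure(self, model, status, now=None):
        """Suspend a model after a 5xx or 429 response, if suspension is enabled."""
        now = time.time() if now is None else now
        if self.suspend_duration > 0 and (status == 429 or 500 <= status < 600):
            self.suspended_until[model] = now + self.suspend_duration
```

For example, with `["gpt-4o", "gpt-4-turbo"]` and a 60-second suspend duration, a `429` reported for `gpt-4o` causes `select` to skip it and keep returning `gpt-4-turbo` until the 60 seconds elapse; if both models fail, `select` returns `None` (the gateway's 503 case).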
Related¶
- Policies Overview
- Policy Hub — Full policy specification and latest version