Model Round Robin

The Model Round Robin policy distributes requests across multiple AI models using round-robin selection. If a model returns a server error (5xx) or a rate limit response (429), it can be temporarily suspended so that traffic automatically shifts to the remaining healthy models.

Configuration Parameters

Parameter        | Required | Description
---------------- | -------- | -----------
Models           | Yes      | The list of model names to balance across (e.g., gpt-4o, gpt-4-turbo). Add at least two models.
Suspend Duration | No       | Number of seconds to suspend a model after it returns a 5xx or 429 response. While suspended, the model is skipped. Set to 0 (default) to disable suspension.
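For reference, the two parameters map naturally onto a small configuration object. The sketch below is purely illustrative: the class name, field names, and validation rules are assumptions drawn from the descriptions above, not the gateway's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelRoundRobinConfig:
    """Hypothetical container for the policy's parameters; the gateway's
    real configuration schema may differ."""
    models: List[str] = field(default_factory=list)  # model names to balance across
    suspend_duration: int = 0                        # seconds; 0 (default) disables suspension

    def validate(self) -> None:
        # The docs call for at least two models and a non-negative duration.
        if len(self.models) < 2:
            raise ValueError("add at least two models to balance across")
        if self.suspend_duration < 0:
            raise ValueError("suspend duration must be 0 or a positive number of seconds")


config = ModelRoundRobinConfig(models=["gpt-4o", "gpt-4-turbo"], suspend_duration=30)
config.validate()
```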

Add This Policy

  1. Navigate to AI Workspace > LLM Providers or LLM Proxies.
  2. Click on the provider or proxy name.
  3. Go to the Guardrails tab.
  4. Click + Add Guardrail and select Model Round Robin from the sidebar.
  5. Add the model names to balance across.
  6. Optionally expand Advanced Settings and set a Suspend Duration to handle model failures gracefully.
  7. Click Add (for providers) or Submit (for proxies).
  8. Deploy the provider or proxy to apply the changes.

Behavior

  • On each request, the policy selects the next available model in round-robin order and replaces the model value in the request before forwarding upstream.
  • If a model returns a 5xx or 429 response and Suspend Duration is set, the model is suspended for that duration and subsequent requests skip it.
  • If all models are suspended, the gateway returns 503 Service Unavailable.
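The logic above can be summarized in a short sketch. This is not the gateway's implementation, only a minimal illustration of the described behavior: rotate through the configured models, suspend a model after a 5xx or 429 response when a suspend duration is set, and signal "no model available" (the case where the gateway returns 503) once every model is suspended.

```python
import itertools
import time


class ModelRoundRobin:
    """Illustrative sketch of the policy's behavior (not the actual implementation)."""

    def __init__(self, models, suspend_duration=0):
        self.models = models
        self.suspend_duration = suspend_duration          # seconds; 0 disables suspension
        self.suspended_until = {m: 0.0 for m in models}   # per-model suspension deadline
        self._cursor = itertools.cycle(range(len(models)))

    def pick_model(self):
        """Return the next available model in round-robin order, or None
        if every model is currently suspended (the 503 case)."""
        now = time.monotonic()
        for _ in range(len(self.models)):
            model = self.models[next(self._cursor)]
            if now >= self.suspended_until[model]:
                return model
        return None

    def report_response(self, model, status_code):
        """Suspend a model after a 5xx or 429 response, if suspension is enabled."""
        if self.suspend_duration > 0 and (status_code == 429 or 500 <= status_code < 600):
            self.suspended_until[model] = time.monotonic() + self.suspend_duration
```

For example, with two models and a 30-second suspend duration:

```python
balancer = ModelRoundRobin(["gpt-4o", "gpt-4-turbo"], suspend_duration=30)
model = balancer.pick_model()          # e.g. "gpt-4o"
balancer.report_response(model, 429)   # suspend it for 30 seconds
print(balancer.pick_model())           # "gpt-4-turbo" until the suspension expires
```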