LLM Cost-Based Rate Limit

The LLM Cost-Based Rate Limit policy enforces monetary spending limits on LLM traffic. It reads the cost of each API call calculated by the LLM Cost policy and blocks requests once a configured budget is exceeded within a time window.

Prerequisites

The LLM Cost policy must be added to the same provider or proxy, after this policy in the policy list. The gateway evaluates response-phase policies in reverse order, so the cost is calculated before the budget is checked. Without the LLM Cost policy, no cost data is available and budget enforcement is skipped.
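The ordering requirement can be illustrated with a small simulation. This is only a sketch: the handler functions, the policy names used as strings, and the fixed cost value are hypothetical stand-ins, not the gateway's actual API.

```python
# Hypothetical stand-ins for the two policies; not the gateway's real API.
def cost_policy_on_response(metadata):
    # The LLM Cost policy would derive this from the response; fixed here.
    metadata["x-llm-cost"] = 0.0125


def budget_policy_on_response(metadata, spent):
    cost = metadata.get("x-llm-cost")
    if cost is None:
        return spent  # no cost data available: enforcement is skipped
    return spent + cost


def run_response_phase(policy_list):
    """Run response handlers in reverse list order over shared metadata."""
    metadata, spent = {}, 0.0
    for name in reversed(policy_list):
        if name == "llm-cost":
            cost_policy_on_response(metadata)
        elif name == "llm-cost-based-rate-limit":
            spent = budget_policy_on_response(metadata, spent)
    return spent


# Rate limit listed first, LLM Cost after it: cost is set before it is read.
correct = run_response_phase(["llm-cost-based-rate-limit", "llm-cost"])
# Reversed list order: the budget check runs before any cost exists.
wrong_order = run_response_phase(["llm-cost", "llm-cost-based-rate-limit"])
# LLM Cost policy missing entirely.
missing = run_response_phase(["llm-cost-based-rate-limit"])
```

Only the first arrangement records any spend. In the other two, the budget handler never sees a cost value, so nothing is counted against the budget.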

Configuration Parameters

| Parameter | Required | Description |
|---|---|---|
| Budget Limits | Yes | One or more spending caps. Each limit has an `amount` (in USD) and a `duration`. |
| Budget Limits[].amount | Yes | Maximum spend in USD within the time window (e.g., `10` for $10.00). Minimum: `0.000001`. |
| Budget Limits[].duration | Yes | Time window as a Go duration string (e.g., `"1h"`, `"24h"`, `"168h"` for one week). |

When multiple budget limits are configured, all of them are enforced — the most restrictive active limit applies.
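As a rough illustration of these constraints, the sketch below validates a list of budget limits before it is submitted. The function names are hypothetical, the parser accepts only a simplified subset of Go duration syntax (the units `h`, `m`, and `s`), and the requirement that a duration be positive is an assumption rather than something the policy documents.

```python
import re

# Simplified subset of Go duration syntax: decimal values with units h, m, s.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|m|s)")

MIN_AMOUNT = 0.000001  # minimum amount from the parameter table


def parse_duration(text):
    """Return the duration in seconds for strings such as "1h" or "1h30m"."""
    pos, total = 0, 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if total <= 0:  # assumption: a zero-length window is not meaningful
        raise ValueError(f"duration must be positive: {text!r}")
    return total


def validate_limits(limits):
    """Check each {"amount", "duration"} entry; return (amount, seconds) pairs."""
    if not limits:
        raise ValueError("at least one budget limit is required")
    parsed = []
    for limit in limits:
        amount = float(limit["amount"])
        if amount < MIN_AMOUNT:
            raise ValueError(f"amount below minimum: {amount}")
        parsed.append((amount, parse_duration(limit["duration"])))
    return parsed


limits = validate_limits([
    {"amount": 10, "duration": "1h"},
    {"amount": 100, "duration": "24h"},
    {"amount": 500, "duration": "168h"},  # one week
])
```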

Add This Policy

Add this policy first, then add the LLM Cost policy after it in the policy list.
Navigate to AI Workspace > LLM Providers or LLM Proxies.
Click on the provider or proxy name.
Go to the Guardrails tab.
Click + Add Guardrail and select LLM Cost-Based Rate Limit from the sidebar.
Add one or more budget limits, each with an amount and duration.
Click Add (for providers) or Submit (for proxies).
Deploy the provider or proxy to apply the changes.

Behavior

Cost is read from `x-llm-cost` in shared metadata, set by the LLM Cost policy.
Accumulated spend is tracked per route within each time window.
When the spend reaches the configured budget, subsequent requests receive 429 Too Many Requests.
Each budget window resets automatically when its duration elapses.
Response headers include both internal-unit values and human-readable dollar amounts:

| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Budget limit in internal scaled units |
| `X-RateLimit-Remaining` | Remaining budget in internal scaled units |
| `x-ratelimit-cost-limit-dollars` | Budget limit in USD (e.g., `10.000000`) |
| `x-ratelimit-cost-remaining-dollars` | Remaining budget in USD (e.g., `7.432100`) |
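The behavior described above can be sketched as a simple fixed-window tracker. This is not the policy's implementation: the `BudgetWindow` class is hypothetical, the fixed-window reset strategy is an assumption, and the scale factor of one million internal units per dollar is a guess based on the six-decimal examples and the `0.000001` minimum amount.

```python
SCALE = 1_000_000  # assumed: internal units are millionths of a dollar


class BudgetWindow:
    """Fixed-window spend tracker for one route and one budget limit."""

    def __init__(self, amount_usd, duration_s, now=0.0):
        self.limit_units = round(amount_usd * SCALE)
        self.duration_s = duration_s
        self.window_start = now
        self.spent_units = 0

    def _roll(self, now):
        # Start a fresh window once the configured duration has elapsed.
        if now - self.window_start >= self.duration_s:
            self.window_start = now
            self.spent_units = 0

    def allow_request(self, now):
        """Return the status code a new request would receive."""
        self._roll(now)
        return 429 if self.spent_units >= self.limit_units else 200

    def record_cost(self, cost_usd, now):
        """Add a response's cost (the x-llm-cost value) to the window."""
        self._roll(now)
        self.spent_units += round(cost_usd * SCALE)

    def headers(self):
        """Build the four rate limit headers from the current state."""
        remaining = max(self.limit_units - self.spent_units, 0)
        return {
            "X-RateLimit-Limit": str(self.limit_units),
            "X-RateLimit-Remaining": str(remaining),
            "x-ratelimit-cost-limit-dollars": f"{self.limit_units / SCALE:.6f}",
            "x-ratelimit-cost-remaining-dollars": f"{remaining / SCALE:.6f}",
        }


window = BudgetWindow(amount_usd=10, duration_s=3600)
window.record_cost(2.5679, now=60)
print(window.headers())
```

Spend is kept as an integer count of scaled units so that many small costs accumulate without floating-point drift; the dollar headers are derived from those integers only when they are formatted.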

Example: $10/hour and $100/day Budget

The following configuration enforces two simultaneous spending limits. Both must be satisfied for requests to proceed — once either limit is reached, requests are blocked until that window resets.

| Budget Limit | amount | duration |
|---|---|---|
| Hourly cap | `10` | `1h` |
| Daily cap | `100` | `24h` |
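To see how the two caps interact, the sketch below replays a sequence of request costs against both limits. The `simulate` function is hypothetical, and it assumes fixed windows aligned to time zero and that a blocked request adds no cost; the actual policy may track windows differently.

```python
HOUR = 3600


def simulate(requests, limits):
    """Replay (timestamp_s, cost_usd) pairs; return the status of each request.

    limits is a list of (amount_usd, duration_s) pairs. Windows are fixed
    and aligned to t=0, which is an assumption made for this illustration.
    """
    spent = {}  # (limit index, window index) -> accumulated USD
    statuses = []
    for t, cost in requests:
        keys = [(i, int(t // dur)) for i, (_, dur) in enumerate(limits)]
        # A request is blocked as soon as any one limit is exhausted.
        if any(spent.get(k, 0.0) >= limits[k[0]][0] for k in keys):
            statuses.append(429)
            continue  # assumed: a blocked request incurs no cost
        statuses.append(200)
        for k in keys:
            spent[k] = spent.get(k, 0.0) + cost
    return statuses


limits = [(10, HOUR), (100, 24 * HOUR)]

# Three $5 calls in the first hour, then one call in the second hour.
hourly = simulate([(0, 5.0), (600, 5.0), (1200, 5.0), (3600, 5.0)], limits)

# One $10 call at the start of each of 11 consecutive hours.
daily = simulate([(h * HOUR, 10.0) for h in range(11)], limits)
```

In the first run the hourly cap blocks the third call and the fourth succeeds once a new hour begins. In the second run the hourly cap is never the blocker at the start of an hour, but the eleventh call is rejected because the daily total has already reached $100.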

Related

LLM Cost: Required. Calculates the cost stored in metadata.
Token-Based Rate Limit: Enforces token consumption limits instead of monetary budgets.
Policy Hub: Full policy specification and latest version.
