LLM Cost¶
The LLM Cost policy calculates the monetary cost of each LLM API call and makes the result available to other policies — primarily LLM Cost-Based Rate Limit. It runs in the response phase, requires no user configuration, and never exposes the cost to the caller.
Required for cost-based rate limiting
Add this policy to the same provider or proxy as LLM Cost-Based Rate Limit, and place it after it in the policy list. The gateway evaluates response-phase policies in reverse order, so the cost is calculated before the budget is checked.
How It Works¶
- When the LLM response arrives (including streaming/SSE responses), the policy reads the model name from the response body.
- It looks up the model in the built-in pricing database.
- It calculates the cost in USD based on token usage, context window tier, and service tier.
- The result is stored in
SharedContext.Metadata["x-llm-cost"]as a 10-decimal USD string (e.g.,"0.0000423100").
The cost is internal — it is never forwarded to the caller.
Supported Providers¶
| Provider | Notes |
|---|---|
| OpenAI | All models including o-series reasoning tokens, batch API, and flex/priority service tiers |
| Anthropic | Claude models including prompt caching (read/write tokens), extended thinking, and speed/geo routing |
| Google Gemini | Google AI Studio and Vertex AI, including multi-modal (audio, image), web search grounding, and thinking models |
| Mistral | All Mistral models including audio duration-based billing (Voxtral) |
Configuration Parameters¶
This policy has no user-configurable parameters. The pricing database path is a gateway-level system setting configured in config.toml.
Add This Policy¶
- Navigate to AI Workspace > LLM Providers or LLM Proxies.
- Click on the provider or proxy name.
- Go to the Guardrails tab.
- Click + Add Guardrail and select LLM Cost from the sidebar.
- Click Add (for providers) or Submit (for proxies).
- Deploy the provider or proxy to apply the changes.
Behavior¶
- Handles both streaming (SSE) and non-streaming responses without configuration.
- Supports context-window tiered pricing (>128k, >200k, >272k token tiers where applicable).
- Supports service tiers: standard, priority, flex, and batch rates.
- If the model is not found in the pricing database, cost is set to
0, a warning is logged, and the request is not blocked. - The pricing database is bundled with the gateway image and loaded at startup. A gateway restart is required to pick up pricing file updates.
Metadata Written¶
| Key | Value |
|---|---|
x-llm-cost |
USD cost as a 10-decimal string, e.g. "0.0000423100" |
x-llm-cost-status |
"calculated" on success, "not_calculated" if cost could not be determined |
Related¶
- LLM Cost-Based Rate Limit — Enforce spending budgets using the calculated cost
- Policy Hub — Full policy specification and latest version