Spend caps and abuse handling
# Spend caps and abuse handling
Evidence that AI features have per-tenant rate-limits and spend caps so a single tenant (or a misbehaving feature) can't burn through monthly model budget, and that there's a documented runbook for abuse and incidents.
Evidence to keep here
- Spend-cap configuration — per-tenant or per-org quotas at the gateway.
- Alert routing — what fires when caps are hit.
- Abuse runbook — what to do when a customer reports prompt-injection, data leakage, or hostile use.
Spend-cap configuration
The gateway in front of model endpoints should enforce:
| Tier | Per-minute rate limit | Daily token cap | Monthly spend cap |
|---|---|---|---|
| Free tier | 60 req/min | 50k tokens | $0 (hard stop) |
| Paid tier | 600 req/min | 1M tokens | per contract |
| Enterprise | 6000 req/min | 10M tokens | per contract |
Evidence to keep:
- Rate-limit / quota config (gateway / cloud-provider quotas)
- Logs showing limits actually firing for at least one tenant
- Alert rules that fire to on-call when:
- A single tenant exceeds 80% of monthly cap
- Org-wide spend exceeds budget threshold
- Unusual spike (e.g., >5σ above tenant's baseline)
Provider-level guardrails (defense in depth)
Even with your gateway, set caps at the model-provider level too — in case the gateway is bypassed:
- OpenAI: Org → Usage limits → Hard / Soft caps per project
- Vertex AI: Quotas → API requests per minute, tokens per project
- Bedrock: Service quotas + CloudWatch billing alarms
Save the config exports.
Abuse runbook
# AI Abuse Response Runbook
## Triggers
1. **Spend anomaly** — alert fires for spike or cap-approach
2. **Customer report** — "your AI said something inappropriate" / "your AI leaked another customer's
data" / "prompt injection attack succeeded"
3. **Internal review flag** — content moderation or safety review surfaces something
## Response (Sev1 — data leak or breach)
1. Acknowledge within 15 minutes
2. **Throttle or disable** the affected endpoint immediately if data leakage is suspected (better to
break the feature than leak more)
3. Pull recent inference logs for the affected tenant + the complaining customer
4. Engage security escalation + legal escalation
5. Customer notification per breach policy
## Response (Sev2 — misuse / abusive prompts)
1. Acknowledge within 1 hour
2. Identify the offending tenant / user
3. Apply rate-limit reduction or temporary block
4. Document with a per-incident write-up
## Response (Sev3 — spend anomaly only, no leakage)
1. Acknowledge within 4 hours
2. Identify the runaway feature or tenant
3. Apply quota reduction; investigate root cause (loop? compromised key?)
4. File a postmortem if the root cause is engineering-side
Sample answer for the questionnaire
AI features run behind a gateway that enforces per-tenant rate limits and monthly spend caps. Quotas are documented in
spend-caps.md. When caps are approached or anomalies are detected, alerts page on-call. Abuse and misuse responses follow the runbook inspend-caps.md(also in the abuse section).
Example filenames
gateway-quotas-2026-05-15.yaml
provider-spend-limits-2026-05-15.json
abuse-runbook-2026.md
incident-log-2026-Q2.md
Refresh
Quarterly. Confirm caps still match contract tiers; review any incident reports.