Published January 10, 2025
4 min read
A SaaS company's customer support chatbot was supposed to save money. Instead, it was hemorrhaging $500+ per day in API costs with zero visibility into why. Worse, a prompt injection vulnerability was leaking internal documentation to anyone who knew how to ask.
After a 2-week Prevention Sprint, costs dropped 90%, data leakage stopped completely, and the team finally had the monitoring they needed.
Here's how it happened.
The company had launched their RAG-powered chatbot six months earlier. It worked well—maybe too well. Usage exploded, and so did costs. But without proper monitoring, no one knew where the money was going.
The problems ran deeper than billing:
The team knew they had a problem. They just didn't know how big.
Facing similar issues with your LLM application?
Book a free Prevention Sprint discovery call to identify vulnerabilities before they become incidents.
Before implementing fixes, we needed to understand what we were dealing with. The baseline assessment revealed 11 critical findings:
The baseline painted a clear picture: this wasn't just a cost problem. It was a security incident waiting to happen.
Over two weeks, we implemented a layered defense strategy. Each layer addressed specific findings from the baseline.
```python
from guardrails import Guard
from guardrails.validators import (
    ToxicLanguage,
    DetectPII,
    RestrictToTopic,
    DetectPromptInjection,
)

input_guard = Guard().use_many(
    DetectPromptInjection(on_fail="exception"),
    ToxicLanguage(threshold=0.8, on_fail="filter"),
    DetectPII(pii_entities=["EMAIL", "PHONE", "SSN"], on_fail="fix"),
    RestrictToTopic(
        valid_topics=["product support", "billing", "technical help"],
        on_fail="refrain",
    ),
)
```

Every user input now passes through validation before reaching the model. Prompt injection attempts get blocked. PII gets redacted. Off-topic queries get redirected.
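In practice, the guard runs on the raw message before any retrieval or model call. Here's a minimal sketch of that step; the wrapper function and fallback behavior are illustrative, not lifted from the client's codebase:

```python
from typing import Optional

def sanitize_user_message(raw_message: str) -> Optional[str]:
    """Return a cleaned message to send to the model, or None to refuse."""
    # DetectPromptInjection(on_fail="exception") raises before a result object
    # exists, so injection attempts never reach the model; catch that upstream.
    outcome = input_guard.validate(raw_message)
    if not outcome.validation_passed:
        # RestrictToTopic(on_fail="refrain") ends up here: redirect the user.
        return None
    # DetectPII(on_fail="fix") hands back the text with emails/phones/SSNs redacted.
    return outcome.validated_output
```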
```python
output_guard = Guard().use_many(
    DetectPII(pii_entities=["EMAIL", "PHONE", "SSN", "CREDIT_CARD"], on_fail="fix"),
    RestrictToTopic(
        valid_topics=["product support", "billing", "technical help"],
        invalid_topics=["internal roadmap", "pricing strategy", "employee data"],
        on_fail="refrain",
    ),
)
```

Even if something slips past input validation, output guardrails catch it before the response reaches the user. Internal documents that shouldn't appear in responses get filtered.
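Taken together, the two guards bracket the model call. A rough sketch of the request path under those assumptions, with `call_rag_model` standing in for the retrieval-plus-completion step that isn't shown here:

```python
async def handle_support_query(message: str) -> str:
    checked_in = input_guard.validate(message)
    if not checked_in.validation_passed:
        return "I can help with product support, billing, or technical questions."

    raw_answer = await call_rag_model(checked_in.validated_output)  # placeholder

    checked_out = output_guard.validate(raw_answer)
    if not checked_out.validation_passed:
        # Off-limits topics (roadmap, pricing strategy, employee data) are dropped.
        return "I'm not able to share that information."
    return checked_out.validated_output  # PII already scrubbed by DetectPII
```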
```python
class SpendLimiter:
    def __init__(self, daily_limit_usd: float = 5.0):
        self.daily_limit = daily_limit_usd
        # Per-user daily spend tracker, backed by Redis (defined elsewhere).
        self.usage_store = RedisUsageStore()

    async def check_and_record(self, user_id: str, estimated_cost: float) -> bool:
        current_spend = await self.usage_store.get_daily_spend(user_id)
        if current_spend + estimated_cost > self.daily_limit:
            raise SpendLimitExceeded(
                f"Daily limit of ${self.daily_limit} exceeded"
            )
        await self.usage_store.record_spend(user_id, estimated_cost)
        return True
```

Each user now has a configurable daily budget. When they hit it, they get a friendly message instead of burning through company funds.
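Dropping the limiter in front of the guarded handler gives the budget check its "friendly message" path. A sketch under the same assumptions as above, with `estimate_cost` as a placeholder for a prompt-length-based pre-call estimate:

```python
limiter = SpendLimiter(daily_limit_usd=5.0)

async def answer_within_budget(user_id: str, message: str) -> str:
    try:
        # estimate_cost() is a stand-in for a token-count-based estimate.
        await limiter.check_and_record(user_id, estimate_cost(message))
    except SpendLimitExceeded:
        return "You've reached today's usage limit. Please try again tomorrow."
    return await handle_support_query(message)
```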
We integrated LangSmith for full observability:
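The write-up doesn't show the exact integration, but a minimal LangSmith hookup typically looks something like this, assuming the `langsmith` SDK's `traceable` decorator and the standard tracing environment variables:

```python
# Tracing is switched on via environment variables, e.g.:
#   LANGCHAIN_TRACING_V2=true
#   LANGCHAIN_API_KEY=<your LangSmith key>
from langsmith import traceable

@traceable(run_type="chain", name="support-chatbot")
async def traced_support_query(user_id: str, message: str) -> str:
    # Wrapping the top-level handler gives per-request traces: latency,
    # inputs/outputs, and nested calls show up as a trace tree in LangSmith.
    return await answer_within_budget(user_id, message)
```

Alongside the traces, every request also produces a structured audit record: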
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AuditEvent:
    timestamp: datetime
    user_id: str
    action: str
    input_hash: str  # for privacy, we hash inputs rather than store them
    guardrail_results: dict
    cost_usd: float
    blocked: bool
    block_reason: Optional[str]
```

Every interaction now creates an immutable audit record. When the security team needs to investigate, they have complete visibility.
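Populating a record after each request is mechanical. A sketch, where `audit_log` stands in for whatever append-only sink the events are written to:

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional

def record_interaction(user_id: str, raw_input: str, guard_results: dict,
                       cost: float, blocked: bool,
                       reason: Optional[str] = None) -> AuditEvent:
    event = AuditEvent(
        timestamp=datetime.now(timezone.utc),
        user_id=user_id,
        action="chat_completion",
        # Hash the prompt so investigators can correlate events without the
        # audit log itself becoming a store of user data.
        input_hash=hashlib.sha256(raw_input.encode("utf-8")).hexdigest(),
        guardrail_results=guard_results,
        cost_usd=cost,
        blocked=blocked,
        block_reason=reason,
    )
    audit_log.append(event)  # append-only sink (placeholder)
    return event
```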
Facing similar issues with your LLM application?
Book a free Prevention Sprint discovery call to identify vulnerabilities before they become incidents.
After the Prevention Sprint, the metrics told the story:
| Metric | Before | After | Change |
|---|---|---|---|
| Daily API spend | $487 | $48 | -90% |
| Data leakage incidents | Unknown (no monitoring) | 0 | — |
| Prompt injection attempts | Unknown | 47 blocked (first month) | — |
| Cost attribution | None | 100% | — |
| Mean time to detect issues | Days | Minutes | — |
The 90% reduction came from multiple sources:
In the first month post-implementation:
The team could finally answer questions like:
"We went from flying blind to having complete visibility. The cost savings paid for the entire engagement in the first week."
— Engineering Lead (anonymized)
If you're running an LLM application in production, here's what this case study teaches:
Measure before you optimize: The 48-hour baseline revealed problems the team didn't know existed. You can't fix what you can't see.
Layer your defenses: No single guardrail catches everything. Input validation, output filtering, spend limits, and monitoring work together.
Per-user limits are non-negotiable: Without them, a single user (or attacker) can drain your budget or exploit your system.
Audit trails enable incident response: When something goes wrong—and it will—you need to know what happened.
Security and cost optimization overlap: Many cost-saving measures (rate limiting, input validation, query chain limits) also improve security.
The chatbot still handles thousands of conversations daily. The difference is that now, the team knows exactly what it's doing—and can prove it.
Ready to secure your LLM application?
The Prevention Sprint identifies vulnerabilities and cost leaks in 48 hours. Implementation takes 2 weeks. Results are immediate.