Evidence kitevidence-folder-template/06-ai-workload-controls/prompt-tool-call-logging.md

AI workload controls

Prompt / tool-call logging

Evidence that prompts, tool calls, and outputs are logged with a policy that bounds retention and access — so investigations are possible without becoming a long-term data risk.

Download whole kit View on GitHub

Evidence that prompts, tool calls, and outputs are logged with a policy that bounds retention and access — so investigations are possible without becoming a long-term data risk.

Evidence to keep here

Logging policy — what's logged, retention, who can read it.
Sample log entry — sanitized.
Access controls — who can query the logs.

Logging policy template

# AI Inference Logging Policy

## What we log

For every inference request:

- Timestamp, request ID, tenant ID, user ID
- Endpoint, model, input/output token counts
- Latency, status code

For inference requests **flagged for review** (per the abuse runbook):

- Full prompt
- Full output (or hash, depending on customer DPA)
- Tool calls made and their results

For routine traffic:

- {{policy: store hashes of prompt + output, retain full content for N days, etc.}}

## What we don't log

- {{e.g. customer PII unless required for incident investigation}}
- {{e.g. authentication tokens, API keys}}

## Retention

| Log type                      | Retention                               |
| ----------------------------- | --------------------------------------- |
| Request metadata (no content) | 365 days                                |
| Sampled prompt+output content | 30 days                                 |
| Flagged-for-review content    | 90 days, with security-team access only |

## Access

- Application engineers: metadata logs only
- AI / safety team: metadata + sampled content
- Security: all logs, for active investigations

Reading the flagged-content store requires an access-review entry (who, when, why).

## Customer data alignment

This policy must align with our customer DPA terms. If a customer contract states "no AI training on
customer data" or "30-day retention," our policy must respect the stricter term.

How to gather sample evidence

# Pull a sanitized sample log entry from your logging backend
# (Datadog / Honeycomb / Loki / native)
# Replace any actual content with [REDACTED] before saving

Sample answer for the questionnaire

Inference requests are logged centrally with retention bounded per the policy in prompt-tool-call-logging.md. Metadata (timestamps, token counts, tenant IDs) is retained for 365 days; content is sampled for 30 days; flagged-for-review content is retained for 90 days with security-team-only access. Retention windows align with customer DPA terms; we do not retain customer prompts longer than contracted.

Example filenames

inference-logging-policy-2026.md
sample-log-entry-sanitized-2026-05-15.json
log-access-roles-2026-05-15.md

Refresh

Policy: annually unless changed. Sample log: quarterly.