
Anonymize Logs Before You Regret It

LLM-powered automations often rely on detailed logs for debugging and monitoring. But if you’re logging everything – including user prompts, PII, or internal messages – you may be creating a privacy risk waiting to blow up in your face.

Today’s tip: Anonymize your logs before they go to disk.

Why This Matters


When a user interacts with your AI system, their inputs often include sensitive data: names, email addresses, locations, or confidential business info. If you’re logging that data directly (e.g. prompts or full conversations), you’re opening yourself up to:

  • Compliance violations (GDPR, HIPAA, CCPA, etc.)
  • Security risks (exposed secrets or tokens)
  • Embarrassing breaches (internal data accidentally logged)

None of this means you should stop logging. Structured logs are how you spot patterns, fix bugs, and improve model performance over time. The goal is to keep that visibility without writing raw sensitive data to disk.

What to Do


Create a log wrapper that sanitizes sensitive fields from structured data like prompts and responses. You can:

  • Replace names/emails with placeholder tokens
  • Mask or hash session/user IDs (a hashing sketch follows this list)
  • Redact known PII formats (emails, phone numbers, etc.)
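
If you need to correlate entries per user without exposing raw IDs, hashing is often a better fit than plain redaction. Here's a minimal sketch, assuming an application-level salt of your own choosing (the hash_user_id helper and LOG_SALT name are illustrative, not taken from the main example below):

import hashlib

# Illustrative salt; in practice, load this from configuration or a secrets store
LOG_SALT = "replace-with-your-own-salt"

def hash_user_id(user_id: str) -> str:
    # Salted SHA-256 digest: the same user always maps to the same opaque token
    digest = hashlib.sha256((LOG_SALT + user_id).encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}"

# "user_12345" becomes a stable token, so sessions stay correlatable in the logs
print(hash_user_id("user_12345"))

Unlike a flat [REDACTED_USER_ID] placeholder, a hashed token still lets you count interactions per user or trace a session across log lines without storing the real identifier.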


This balances observability with compliance and safety, a must for any production AI stack.

Production Tip


Logging everything might help you debug, but it can also get you sued. Anonymize your logs before storing them.

Code Example



This is a modification of the logging code from the Metrics Matter article. It logs structured prompt/response data, but anonymizes emails and user IDs before writing to disk.

import logging
import json
import re

# Set up log file output
logging.basicConfig(
    filename="llm_output.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s"
)

EMAIL_RE = re.compile(r"\b[\w\.-]+@[\w\.-]+\.\w+\b")
USER_ID_RE = re.compile(r"user_\d+")

def anonymize(data: dict) -> dict:
    data_copy = json.loads(json.dumps(data))  # Deep copy

    # Redact email addresses in the prompt and response
    for field in ("prompt", "response"):
        if field in data_copy:
            data_copy[field] = EMAIL_RE.sub("[REDACTED_EMAIL]", data_copy[field])

    # Redact emails and user IDs in metadata
    if "metadata" in data_copy:
        for key, value in data_copy["metadata"].items():
            if isinstance(value, str):
                value = EMAIL_RE.sub("[REDACTED_EMAIL]", value)
                value = USER_ID_RE.sub("[REDACTED_USER_ID]", value)
                data_copy["metadata"][key] = value

    return data_copy

def log_prompt_response(prompt: str, response: str, metadata: dict):
    log_data = {
        "prompt": prompt,
        "response": response,
        "metadata": metadata
    }
    safe_data = anonymize(log_data)
    logging.info(json.dumps(safe_data, indent=2))

# Example usage
log_prompt_response(
    prompt="My email is jane.doe@example.com and I want a summary.",
    response="Summary: You asked for a summary.",
    metadata={"user_id": "user_12345", "session_id": "abc123"}
)

Example Output


Here’s an example of what one of those entries might look like inside llm_output.log (the timestamp and level prefix added by logging.basicConfig is omitted here):

{
  "prompt": "My email is [REDACTED_EMAIL] and I want a summary.",
  "response": "Summary: You asked for a summary.",
  "metadata": {
    "user_id": "[REDACTED_USER_ID]",
    "session_id": "abc123"
  }
}

Going Further

As you scale:

  • Use more advanced PII detection tools (e.g. presidio, pii-extract; see the sketch after this list)
  • Redact phone numbers, addresses, credit cards, etc.
  • Normalize log formats using JSON loggers (e.g. structlog, loguru)
  • Use log processors like Fluentd or Logstash to enforce sanitization before forwarding logs
  • Combine with filtering/alerting from your log aggregation system (see: Log Aggregation & Alerting)
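
For the PII-detection bullet, here's a minimal sketch of what swapping the hand-rolled regexes for Microsoft Presidio might look like. It assumes the presidio-analyzer and presidio-anonymizer packages are installed, along with the spaCy model Presidio uses for entity detection:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def anonymize_text(text: str) -> str:
    # Detect PII spans (emails, phone numbers, names, ...) in free text
    results = analyzer.analyze(text=text, language="en")
    # Replace each detected span with a placeholder such as <EMAIL_ADDRESS>
    return anonymizer.anonymize(text=text, analyzer_results=results).text

print(anonymize_text("My email is jane.doe@example.com, call me at 555-123-4567."))

A function like anonymize_text could replace the regex substitutions inside anonymize() above, leaving the rest of the logging wrapper unchanged.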

Final Thought

You only need one leaked email in a log file to lose user trust. You don’t need perfection, but you do need a plan. Start with anonymization to make your AI project more production-ready.
