Metrics Matter: Log What the LLM Says
AI automation may impress in a demo, but it often fails in production, not because the model is wrong, but because no one knows what it said when it mattered. If you don’t log LLM behavior, your automation is a black box.
In this article, you’ll implement a simple, powerful step toward observability: log every LLM interaction as structured JSON to a file.
Why This Matters
When a failure happens in production, these are the questions you’ll want to answer:
- What was the user asking?
- Which prompt template ran?
- What did the LLM respond with?
- Was that response slow, expensive, or weird?
- Has this happened before?
If you can’t answer those questions, you can’t fix the system. But if you log every interaction with the right metadata, you’ll see patterns, fix bugs, and even improve model performance over time.
What to Do
At minimum, log:
- Timestamp
- Prompt template name or ID
- User input
- Full prompt (after templating)
- Model response
- Session ID
- Latency
- Token usage
- Success/failure flag
Keep it structured so it’s easy to query later.
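As a minimal sketch, one such record can be a plain dict serialized to a single JSON line. The field names below mirror the list above; the values are illustrative placeholders, not real measurements:

```python
import json
import time
import uuid

# One structured record covering the minimum fields listed above
record = {
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "session_id": str(uuid.uuid4()),
    "prompt_template": "basic_question",  # template name or ID
    "input": "How do LLMs differ from traditional ML?",
    "prompt": "What's a short summary of this question: "
              "'How do LLMs differ from traditional ML?'",
    "response": "LLMs generate language; traditional ML targets narrow tasks.",
    "tokens": 78,
    "latency_seconds": 1.87,
    "status": "success",
}

# One JSON object per line ("JSON Lines") keeps the file easy to query later
line = json.dumps(record)
```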
Production Tip
Logging in structured JSON format makes it easier to send data to log analysis tools like Logstash, Fluentd, or Datadog later. Even if you’re starting small, use structured logs from day one. And instead of printing to the console, write logs to a dedicated log file in append mode.
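A hedged sketch of that dedicated-file setup, using the standard library's `RotatingFileHandler` so the log file doesn't grow without bound (the file name, size limit, and backup count here are illustrative assumptions):

```python
import logging
from logging.handlers import RotatingFileHandler

# Dedicated logger for LLM interactions, separate from the app's root logger
llm_logger = logging.getLogger("llm_interactions")
llm_logger.setLevel(logging.INFO)

# Rotate at roughly 10 MB, keeping 5 old files (both values illustrative)
handler = RotatingFileHandler(
    "llm_interactions.log",
    maxBytes=10 * 1024 * 1024,
    backupCount=5,
)
# Emit the message as-is: each record is already a JSON string
handler.setFormatter(logging.Formatter("%(message)s"))
llm_logger.addHandler(handler)
```

Rotation matters once you log full prompts and responses: those payloads are large, and a single ever-growing file eventually becomes both a disk problem and a query problem.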
Code Example: Log LLM Prompts and Responses to File
Here’s a minimal working version that writes LLM interactions to a log file.
import json
import logging
import time
import uuid

from openai import OpenAI

# Set up logging to a file; each record is one JSON object per line
logging.basicConfig(
    filename='llm_interactions.log',
    level=logging.INFO,
    format='%(message)s',
    filemode='a'  # Append mode
)

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

def call_llm(prompt_template, user_input, template_name="basic_question"):
    session_id = str(uuid.uuid4())
    start_time = time.time()
    prompt = prompt_template.format(user_input=user_input)
    try:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        latency = round(time.time() - start_time, 2)
        completion = response.choices[0].message.content
        tokens_used = response.usage.total_tokens
        log_data = {
            "timestamp": time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
            "session_id": session_id,
            "prompt_template": template_name,
            "input": user_input,
            "prompt": prompt,
            "response": completion,
            "tokens": tokens_used,
            "latency_seconds": latency,
            "status": "success"
        }
        logging.info(json.dumps(log_data))
        return completion
    except Exception as e:
        latency = round(time.time() - start_time, 2)
        error_log = {
            "timestamp": time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
            "session_id": session_id,
            "prompt_template": template_name,
            "input": user_input,
            "error": str(e),
            "latency_seconds": latency,
            "status": "failure"
        }
        logging.error(json.dumps(error_log))
        return "Something went wrong."

# Example usage
response = call_llm("What's a short summary of this question: '{user_input}'", "How do LLMs differ from traditional ML?")
print(response)
Example Output
Here’s an example of what one of those logs might look like inside llm_interactions.log:
{
  "timestamp": "2025-08-03T11:52:00Z",
  "session_id": "8b96fbe4-3925-4f85-9a8e-1f746d71f558",
  "prompt_template": "basic_question",
  "input": "How do LLMs differ from traditional ML?",
  "prompt": "What's a short summary of this question: 'How do LLMs differ from traditional ML?'",
  "response": "LLMs use massive text data and transformer architecture to generate language-based outputs, unlike traditional ML models which are typically trained for specific structured tasks.",
  "tokens": 78,
  "latency_seconds": 1.87,
  "status": "success"
}
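Because each line is a self-contained JSON object, the file can be queried with a few lines of Python and no extra tooling. A sketch, assuming the log file name from the example above:

```python
import json

def summarize_log(path="llm_interactions.log"):
    """Compute simple aggregate metrics from a JSON-lines LLM log."""
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]

    successes = [r for r in records if r.get("status") == "success"]
    failures = [r for r in records if r.get("status") == "failure"]
    avg_latency = (
        sum(r["latency_seconds"] for r in successes) / len(successes)
        if successes else 0.0
    )
    return {
        "calls": len(records),
        "failures": len(failures),
        "avg_latency_seconds": round(avg_latency, 2),
        "total_tokens": sum(r.get("tokens", 0) for r in successes),
    }
```

This is the payoff of structured logging: questions like "how slow were we yesterday?" become a list comprehension instead of a grep-and-squint exercise.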
Going Further
As you scale:
- Pipe logs to Logstash, Fluentd, or a cloud logging service.
- Add request IDs to correlate logs across systems.
- Set alerts for latency spikes or error patterns.
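The alerting idea can be sketched as a periodic scan over the same log file. The thresholds and file name below are illustrative assumptions, not recommendations:

```python
import json

LATENCY_THRESHOLD_SECONDS = 5.0  # illustrative: flag unusually slow calls
ERROR_RATE_THRESHOLD = 0.10      # illustrative: alert above 10% failures

def check_alerts(path="llm_interactions.log"):
    """Return a list of alert strings based on simple log heuristics."""
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]

    alerts = []
    slow = [r for r in records
            if r.get("latency_seconds", 0) > LATENCY_THRESHOLD_SECONDS]
    if slow:
        alerts.append(f"{len(slow)} call(s) exceeded {LATENCY_THRESHOLD_SECONDS}s")

    if records:
        error_rate = sum(r.get("status") == "failure" for r in records) / len(records)
        if error_rate > ERROR_RATE_THRESHOLD:
            alerts.append(f"error rate {error_rate:.0%} above threshold")

    return alerts
```

In a real deployment you would push these checks into your monitoring stack rather than polling a file, but the logic is the same: the structured fields you logged become the signals you alert on.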
Final Thought
Observability isn’t just for engineers. It’s for AI, too. Logging LLM interactions helps your team detect bugs, optimize prompts, control costs, and prevent surprises. It’s one of the lowest-effort, highest-impact improvements you can make today.
So don’t guess what your AI is doing. Log it.