Aggregate Logs and Set Alerts Early
LLM-powered automations often succeed in demos but fall apart in production. This series shares practical ways to make those systems reliable, scalable, and maintainable.
Today’s tip: centralize your logs and set up alerts before your system breaks.
Why This Matters
Logging to a local file or the console is useful, but it doesn’t scale.
When your automation runs across containers, cloud services, or machines, raw log files become a liability. You need a central system that can collect, filter, and alert on logs in real time.
Without log aggregation and alerts, your automation might fail silently, or worse, generate incorrect results while everything looks fine.
What to Do
Route your logs to a centralized system that supports the following (see the app-side sketch after this list):
- Structured log ingestion (ideally JSON)
- Filtering by field (e.g., log level, service, step)
- Real-time alerting
- Dashboards for observability
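None of this works unless your application emits structured records in the first place. Here's a minimal sketch of a JSON log formatter in Python; the log path, the service name, and the `step` field are assumptions, so match them to whatever your log shipper is configured to tail and parse:

```python
import json
import logging

# Minimal JSON formatter: every log call becomes one structured record per line.
# The service name, the optional "step" field, and the log path are assumptions;
# align them with whatever your log shipper tails and parses.
class JsonFormatter(logging.Formatter):
    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": "llm-automation",
            "step": getattr(record, "step", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)

logger = logging.getLogger("llm-automation")
logger.setLevel(logging.INFO)

handler = logging.FileHandler("/var/log/llm-automation.log")
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# Values passed via "extra" become top-level, filterable fields.
logger.info("prompt dispatched", extra={"step": "generate_contract"})
```

Each line in the file is now a single JSON object, which is exactly what a tail-and-parse pipeline expects. On the receiving end, you need a platform that can collect and act on those records.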
Here are some good options:
| Tool | Strengths |
| --- | --- |
| Graylog | Open-source, supports GELF, great for structured JSON |
| Splunk | Enterprise-grade, highly scalable and powerful |
| ELK Stack (Elasticsearch, Logstash, Kibana) | Popular and flexible |
| Grafana + Loki + Fluent Bit | Lightweight and cloud-native |
Production Tip
Don’t wait for your automation to fail in production; aggregate logs and set up alerts while things still work.
Code Example
Here's how to forward structured logs from your LLM app to Graylog using Fluent Bit. Note that Fluent Bit expects `[PARSER]` sections in a separate parsers file referenced from `[SERVICE]`, so the setup spans two small files.

fluent-bit.conf
```
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    # Parsers must live in their own file; Fluent Bit does not accept
    # [PARSER] sections in the main configuration.
    Parsers_File parsers.conf

[INPUT]
    Name    tail
    Path    /var/log/llm-automation.log
    Parser  json
    Tag     llm

[OUTPUT]
    Name   gelf
    Match  llm
    Host   graylog
    Port   12201
    # GELF defaults to UDP on 12201. Gelf_Short_Message_Key tells the plugin
    # which record field to use as GELF's mandatory short_message; "response"
    # is an assumption based on the example record below.
    Mode                   udp
    Gelf_Short_Message_Key response
```

parsers.conf
```
[PARSER]
    Name        json
    Format      json
    Time_Key    timestamp
    Time_Format %Y-%m-%dT%H:%M:%S
```
Mount both files into a Fluent Bit container that runs alongside your automation container in your Docker Compose setup, with the log file on a shared volume so Fluent Bit can tail it. Fluent Bit will stream log entries to Graylog in real time.
Example Output
Here’s what one of your logs might look like in Graylog:
```json
{
  "timestamp": "2025-08-03T14:36:42",
  "level": "ERROR",
  "prompt": "Generate a contract for a freelancer",
  "response": "I cannot comply with that request.",
  "source": "openwebui",
  "session_id": "abc123"
}
```
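A record like that can be written at the exact point where your code calls the model. The helper below is an illustrative sketch, not Open WebUI's own logging: `log_llm_call`, the naive refusal check, and the log path are assumptions, but it shows how the prompt, response, and session ID end up on one structured line.

```python
import json
from datetime import datetime, timezone

LOG_PATH = "/var/log/llm-automation.log"  # the same file Fluent Bit tails

def log_llm_call(prompt: str, response: str, session_id: str,
                 source: str = "openwebui") -> None:
    # Naive refusal check, for illustration only; replace it with whatever
    # signals a failed or blocked generation in your stack.
    level = "ERROR" if "cannot comply" in response.lower() else "INFO"
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
        "level": level,
        "prompt": prompt,
        "response": response,
        "source": source,
        "session_id": session_id,
    }
    # One JSON object per line, matching the record shown above.
    with open(LOG_PATH, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

log_llm_call("Generate a contract for a freelancer",
             "I cannot comply with that request.",
             session_id="abc123")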
Going Further
Once your logs land in a system like Graylog, you can:
- Filter logs to isolate failed prompts, slow responses, or long token counts (in Graylog, a query like `level:ERROR AND source:openwebui` narrows things down quickly).
- Set alerts (the sketch after this list shows the shape of an error-rate rule):
  - When error rates rise
  - If a specific LLM prompt starts failing
  - When response latency spikes
- Build dashboards that track error rates, response latency, and prompt volume over time.
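In Graylog, rules like these are typically configured in the UI rather than in code, but it helps to see the logic spelled out. The standalone sketch below reads the same structured log and fires a webhook when the recent error rate crosses a threshold; the webhook URL, window size, and threshold are placeholders.

```python
import json
import urllib.request
from collections import deque

LOG_PATH = "/var/log/llm-automation.log"    # the structured log written above
WEBHOOK_URL = "https://example.com/alerts"  # placeholder notification endpoint
WINDOW = 200                                # inspect the most recent N records
ERROR_RATE_THRESHOLD = 0.10                 # alert above 10% errors

def check_error_rate() -> None:
    # Read the last WINDOW JSON records from the log file.
    with open(LOG_PATH, encoding="utf-8") as fh:
        records = [json.loads(line) for line in deque(fh, maxlen=WINDOW)]
    if not records:
        return
    errors = sum(1 for r in records if r.get("level") == "ERROR")
    rate = errors / len(records)
    if rate >= ERROR_RATE_THRESHOLD:
        payload = json.dumps(
            {"text": f"LLM automation error rate at {rate:.0%} over last {len(records)} calls"}
        ).encode()
        req = urllib.request.Request(
            WEBHOOK_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        urllib.request.urlopen(req)

if __name__ == "__main__":
    check_error_rate()
```

The point isn't to run this script in production; a platform like Graylog or Grafana evaluates rules like this for you once the fields exist in your logs.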
Final Thought
Log aggregation doesn’t just store errors—it gives you the power to spot and respond to them. Alerts are how your automation asks for help when something goes wrong.
Log aggregation and alerting, along with the filtering and reporting that come with them, are a big step toward making your AI solution production-ready.