Picking What to Log and Monitor: Letting AI Wire Up the Plumbing While You Decide Which Failures Hurt
Your app is live and working. Say it’s a tool that lets small landscaping crews send invoices from their phones. At some point it occurs to you that you have no idea what it’s doing when you’re not looking, so you ask your AI assistant to add logging and monitoring. A minute later you’ve got logs everywhere: every request logged, every database call timed, every error caught and recorded, and a dashboard with response times, error rates, and request counts climbing in neat little graphs. It looks professional. It looks like the kind of thing a real operation has. You feel covered.
And that feeling is the trap. What you’ve got is a firehose of everything, which sounds like the opposite of a problem until something actually breaks. A customer’s invoice silently fails to send, and they don’t tell you; they just stop using the app. You go looking, and the failure is in there somewhere, buried under ten thousand routine log lines about requests that went perfectly fine. The dashboard told you the error rate was low, which was true and completely useless, because the one error that mattered was a single blip lost in the noise. The careless version of this job isn’t asking AI to add logging. It’s logging everything equally, which feels thorough and leaves you just as blind as logging nothing, because you can’t see the failure that hurts through the fog of failures that don’t.
What this job actually is
Picking what to log and monitor is two jobs that wear one coat. The first is the wiring: actually instrumenting your app so events get recorded, errors get caught, metrics get collected, and the whole thing flows into a dashboard you can read. That’s plumbing, and it’s real work, and AI is genuinely good at it. It knows the libraries, it knows the patterns, it can catch errors in places you’d forget to look and format the output so it’s actually readable. Handing AI the instrumentation is a fine idea.
The second job is deciding what’s worth watching. Out of everything your app does, which events would tell you something you’d act on? Which failure, if it happened quietly at 2am, would actually cost you a user or money or trust? And which of the thousand things you could log are just noise that will bury the signal when you need it? That’s not a plumbing problem. It’s a judgment call about what matters in your specific product, and it depends on what your users can’t afford to have break.
Here’s the distinction that matters: AI can generate the instrumentation, but deciding what’s worth watching is yours. More logging is not better monitoring. The value of monitoring comes entirely from being able to find the thing that matters fast, and that means knowing in advance which failures are the ones that hurt. AI can’t know that, because it can’t feel the difference between an error that’s an inconvenience and an error that quietly kills your business. It will catch and log both at the same volume, with the same urgency, and leave you to sort it out exactly when you have the least time to.
How to delegate the wiring
So lean on AI for the part it does well, which is the instrumentation itself. Once you know what you care about, AI is the right tool to make the recording happen cleanly. Ask it to wire up logging for the specific events you’ve decided matter, to catch errors with enough context that you can actually diagnose them later (what happened, to whom, with what inputs), and to surface the handful of things you want to watch into something readable.
The move that makes this delegation good is feeding AI your priorities instead of asking it to invent them. Tell it which user actions are critical and which are routine, and ask it to log the critical ones richly and the routine ones lightly or not at all. Ask it to make sure that when a critical action fails, the log line carries everything you’d need to understand the failure without reproducing it. Ask it to set up alerts on the specific failures you name, and to leave the rest as quiet background logging you can dig into only if you go looking.
What you don’t do is ask “what should I log?” or “set up monitoring for my app.” That open phrasing is what produces the firehose, because with no priorities to work from, AI does the only sensible thing it can: it logs everything, equally, just in case. That’s not monitoring; it’s hoarding. Keep the ask on execution. Here are the events I care about, here’s the failure I want to know about immediately, wire those up richly and keep the rest out of my way. You’re handing AI a filtered list and asking it to build the plumbing around it. The filtering is the part you bring.
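To make the shape of that plumbing concrete, here is a minimal sketch of tiered logging using Python’s standard logging module. It is only an illustration under stated assumptions: the event names, the logger name "crewbill", and the record_event helper are all hypothetical, and what an AI assistant actually generates for your stack may look quite different.

```python
import logging

# Hypothetical event names for a CrewBill-style app; substitute your own
# filtered list of the actions you decided are critical.
CRITICAL_EVENTS = {"invoice.create", "invoice.send"}

logger = logging.getLogger("crewbill")  # assumed logger name


def record_event(event, ok, **context):
    """Log critical events with full context and keep routine ones quiet.

    Returns True when the event is a failure that should raise an alert.
    """
    critical = event in CRITICAL_EVENTS
    if critical and not ok:
        # The failure that hurts: error level, with who/what/inputs attached.
        logger.error("%s FAILED %s", event, context)
        return True
    if critical:
        # Critical action succeeded: still logged with context, but no alert.
        logger.info("%s ok %s", event, context)
    else:
        # Routine action: debug level, no context payload, never alerts.
        logger.debug("%s %s", event, "ok" if ok else "failed")
    return False
```

The point of the design is that the priority list lives in one place you control. A failed "invoice.send" returns True and carries its context into the log, while a failed settings load stays at debug level and out of your way.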
The judgment you keep
The call about what’s worth watching is yours, and it’s yours because it turns on something AI can’t see: which failures would actually cost you something.
This is hard because almost everything looks worth logging in the abstract. Every error is technically a problem; every event is technically information. Cutting things from your monitoring feels like choosing to be blind on purpose. But monitoring everything is the same as monitoring nothing, because a signal you can’t find isn’t a signal. The judgment is in looking at everything your app does and deciding that this specific failure, the invoice that doesn’t send, is the one that has to page you the moment it happens, while these other hundred events can sit quietly in a log you’ll probably never read. For the landscaping app, a failed invoice means a crew doesn’t get paid and quietly churns; a slow-loading settings page means someone waits an extra second. Those are not the same, and your monitoring should not treat them as the same.
AI can’t make this call because it doesn’t know what’s load-bearing in your business. It can’t feel the difference between the failure that loses you a customer and the failure that loses you nothing, because that difference lives in what your product promises and what your users depend on, and that’s context AI doesn’t have. Get this wrong and you’ll either drown in alerts until you start ignoring all of them (including the real one), or you’ll have a tidy dashboard of green numbers that stays green while the thing that actually matters fails silently underneath it. Knowing which failure hurts is the whole point of watching at all.
Before you ship this job
Here’s what good delegation looks like, and the line it can’t cross.
The sample prompt. Something real you might send:
I’m building CrewBill, an app that lets small landscaping crews create and send invoices from their phones right after a job. My main user is someone like Marcus, who runs a three-person crew, does six or seven jobs a day, and needs to bill clients before he forgets. The thing that absolutely cannot fail silently is sending an invoice: if an invoice doesn’t reach a client, Marcus loses money and won’t necessarily notice. Other things matter less: a slow settings page or a failed profile-photo upload is annoying but not urgent. Wire up logging and monitoring for CrewBill with that priority in mind. For invoice creation and sending, log richly: capture who, what, the invoice amount, and the full error context if anything fails, and set up an alert that fires immediately on a failed send. For routine actions, keep logging light and don’t alert on them. Make the critical events easy to find and keep the noise out of the way.
Adapt this to your own product and you get monitoring built around what you actually care about. Copy it as-is and you’re watching Marcus’s business instead of yours, alerting on a failure that may not even be your product’s weak point. CrewBill’s critical failure is the silent invoice; yours is somewhere else, and the whole setup only works if it’s pointed at the thing that would actually hurt you.
The part you can’t hand off is naming which failures are the ones that hurt: the specific event that has to reach you the instant it happens, sorted out from the hundred events that can stay quiet. That ranking is the decision, and it’s the thing the prompt above is built around but could never have produced on its own.
How to check AI did its part: make the critical thing fail on purpose. In the CrewBill case, deliberately break invoice sending in a test and confirm two things happen: the alert actually fires, and the log line it leaves behind tells you enough to understand the failure without having to reproduce it. If the alert stays silent, or it fires but the log just says something generic like “request failed” with no context, the monitoring isn’t doing its job, no matter how good the dashboard looks. The test isn’t whether logging exists; it’s whether the one failure you most need to catch actually reaches you, loudly and legibly, when it happens.
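A check like that can be sketched in a few lines. This is a rough example, not a prescribed implementation: send_invoice, its deliver and alert parameters, the invoice fields, and the logger name are all assumptions invented for illustration. A real app would plug in its own mail provider and paging service.

```python
import logging

logger = logging.getLogger("crewbill.invoices")  # assumed logger name


class CaptureHandler(logging.Handler):
    """Collects formatted log lines in memory so a check can inspect them."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


def send_invoice(invoice, deliver, alert):
    """Hypothetical send path: on failure, log full context and fire the alert."""
    try:
        deliver(invoice)
        return True
    except Exception as exc:
        logger.error(
            "invoice send failed: invoice_id=%s user_id=%s amount=%s error=%r",
            invoice["id"], invoice["user_id"], invoice["amount"], exc,
        )
        alert(f"Invoice {invoice['id']} failed to send")
        return False


def check_failed_send_is_loud_and_legible():
    """Break sending on purpose and return (alerts, log_lines) for inspection."""
    handler = CaptureHandler()
    logger.addHandler(handler)
    alerts = []

    def broken_deliver(invoice):
        # Simulated outage standing in for a real delivery failure.
        raise ConnectionError("mail provider unreachable")

    try:
        send_invoice(
            {"id": "inv-1042", "user_id": "marcus", "amount": "240.00"},
            deliver=broken_deliver,
            alert=alerts.append,
        )
    finally:
        logger.removeHandler(handler)
    return alerts, handler.lines
```

Calling check_failed_send_is_loud_and_legible() should give you exactly one alert and one log line that names the invoice, the user, the amount, and the underlying error. If either list comes back empty, or the log line lacks that context, the wiring is not doing its job yet.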
What you get for doing it this way
Go back to that firehose of logs and the dashboard full of green numbers that felt like being covered. The difference between logging everything equally and monitoring what matters is the difference between drowning in information when something breaks and getting tapped on the shoulder the moment the thing that counts goes wrong. When you bring the priorities and let AI build the plumbing around them, you get a quiet system that stays quiet until it has something real to tell you, and then tells you clearly.
AI can instrument every corner of your app. Which failures are worth waking up for was always going to be your call, because only you know what your product can’t afford to lose. That’s the job: let AI wire up the watching, but decide for yourself what’s worth watching for.
