Does Your Vibe Coded App Suck?

You built it. It works. You shipped it, or you are about to. And there is this nagging feeling you cannot quite shake.

You cannot actually tell if what you built is good.

It runs. It does the thing you wanted it to do. The login screen looks right. The form submits. Your friends say it is cool. But there is this voice in the back of your head asking whether a real engineer could look at it for five minutes and find a dozen things wrong. Whether something is sitting in there that you would be embarrassed by if you could see it. Whether the whole thing is one weird user behavior away from falling apart.

You do not know. And not knowing is its own kind of awful, because you cannot fix what you cannot see, and you cannot even tell if there is anything to fix.

Almost every honest vibe coder I talk to is living somewhere in this feeling. They have a working app and no real way to evaluate it. The app runs, which is supposed to be the hard part, and yet they feel less confident now than they did before they started.

There is a reason for that, and it is not the one most people assume.

The Question You Cannot Answer From Where You Are Standing

When you ask “does my app suck,” what you are really asking is whether the thing you built matches what a good version of it would look like. That is a comparison question. You have one side of the comparison (the app you built). You do not have the other side (the version that would be good). And without both sides, the question is unanswerable. You can stare at your app forever and never find out, because the answer is not in there.

Most vibe coders I have watched reach for one of three tools when they try to answer this question, and all three of them fail in the same quiet way.

The first is asking the AI if the code is good. The AI almost always says yes, or says yes with some friendly suggestions for improvements. This feels like reassurance until you realize the AI is the one that wrote it. You are asking the author to grade their own work, and the author is famously generous.

The second is testing it yourself. You click around. You submit the form a few times. You log in with your own email. Everything works. You ship. The problem with this approach is that you are testing what you thought to test, which is roughly the same set of things you thought to ask for in the first place. The stuff that breaks later is, by definition, the stuff you did not think to test, because if you had thought to test it, you probably would have thought to ask for it. Testing the happy path tells you the happy path works. It does not tell you whether anything else exists.

The third is showing it to other people. They say it looks great. This is genuinely nice and tells you almost nothing about quality, because they are looking at the same surface you are, with the same lack of context about what should be underneath.

None of these three tools is doing what you actually need. None of them is providing the other side of the comparison. They are all just looking at the app from different angles and reporting back that it looks fine from over there, too.

The reason your app might suck and you cannot tell is that you have nothing to measure it against. You have the result. You do not have the standard. And you cannot grade an answer without an answer key.

What Experienced Engineers Are Actually Doing

There is a thing experienced developers do that looks like magic from the outside, and it is worth pulling apart, because the trick to it is also the trick to your problem.

When an experienced engineer reviews a piece of code, they can usually tell within a few minutes whether it is good. People assume this is some kind of pattern recognition built up from staring at thousands of bugs over the years, and that is partly true, but it is not really the main thing. The main thing is more boring than that.

Experienced engineers are carrying the answer key in their head.

When they look at a login system, they are not reading the code looking for problems. They are comparing the code to a complete internal model of what a login system is supposed to do. They know, without having to think about it, that a login system is supposed to verify email addresses, handle password resets, throttle login attempts, expire sessions, hash passwords with something modern, protect against enumeration attacks, and probably ten other things that they would notice were missing the instant they did not see them. None of that came from reading the code. All of it came from years of building login systems and reading them and watching them break.
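To make a couple of those items concrete, here is a minimal sketch, in Python, of what "hash passwords with something modern" and "throttle login attempts" can look like. The Account class, the in-memory state, and the exact thresholds are illustrative assumptions, not how any particular framework does it.

```python
# A minimal sketch of two answer-key items: modern password hashing and
# login throttling. Illustrative only; a real app keeps this state in a
# database, not on an object in memory. Requires: pip install bcrypt
from datetime import datetime, timedelta

import bcrypt

MAX_FAILED = 5                   # lock after five failed attempts...
LOCKOUT = timedelta(minutes=15)  # ...for fifteen minutes

class Account:
    def __init__(self, password: str):
        # bcrypt salts and hashes in one step; the plain text is never stored
        self.password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt())
        self.failed = 0
        self.locked_until: datetime | None = None

    def try_login(self, attempt: str) -> bool:
        now = datetime.utcnow()
        if self.locked_until and now < self.locked_until:
            return False  # still inside the lockout window
        if bcrypt.checkpw(attempt.encode(), self.password_hash):
            self.failed = 0  # a success resets the counter
            return True
        self.failed += 1
        if self.failed >= MAX_FAILED:
            self.locked_until = now + LOCKOUT  # throttle brute-force guessing
        return False
```

The point is not this particular code. The point is that every line of it answers to one item on that internal list, and an engineer reviewing it is checking items off, not reading for vibes.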

That answer key is what experience actually is. It is not magic. It is not even pattern recognition in the way people usually mean. It is just an extraordinarily detailed mental model of what good software is supposed to do, loaded and ready, applied automatically.

When I (someone with forty years of this) sit down and vibe code an app without writing anything down first, the result usually comes out fine. Not because I am skipping the spec. Because I am the spec. The answer key is in my head, running silently the whole time, shaping every prompt I write and every correction I make. The AI gives me code, and I am comparing it against a standard I never had to articulate, because the standard has been sitting in my head for decades.

You do not have that answer key. That is the whole problem. And you are not going to develop it in any reasonable timeframe, because the only way to get it is to spend years building software and watching what breaks. There is no shortcut to forty years of experience.

But here is the part nobody tells you. You do not actually need forty years of experience. You need the artifact forty years would give you for free. And that artifact, it turns out, is something you can build in an afternoon.

Every Generation Has a Spec. The Question Is Whose

Here is the thing that should change how you think about all of this.

Every single time the AI generates code for you, it is working from a specification. It has to be. Code cannot get written without one. Something has to define what the code is supposed to do, what cases it is supposed to handle, what trade-offs it is supposed to make. There is always a spec. There is no such thing as code generated without one.

The question is not whether there is a spec. The question is who wrote it.

When you do not write the spec, the AI writes it for you. It has to. It writes it silently, in real time, pulling from your prompt, from its inferences about what you probably meant by the things you did not say, from defaults baked into its training data, from patterns it picked up from similar code it has seen. By the time the code is generated, a complete spec has been assembled and used as the blueprint. You just never saw it. It existed for the duration of the generation and then it was gone.

So when you sit down to evaluate the code afterward, you are trying to compare it against a spec that no longer exists, which was never written down, which you never approved, and which you cannot reconstruct. That is what is making the question of whether your app sucks unanswerable. Not the code. The missing spec.

This is the move that matters. The choice in front of you was never spec or no spec. There has never been a no-spec option. The choice is whether the spec your app gets built from is the one you wrote, or the one the AI improvised on your behalf.

When the AI’s improvised spec happens to match the one you would have written, vibe coding works. The app comes out good. You feel like a wizard. When it does not match, you get an app that is a faithful implementation of a spec you never saw, never approved, and cannot audit. The code is doing exactly what it was told to do. It just was not told to do what you actually wanted.

Why It Sometimes Works Anyway

Now, I can already hear the objection. You have probably vibe coded apps that came out fine. So have I. So have most of the readers of this article. If the spec problem is as fundamental as I am making it sound, why does any of this work, ever?

Because sometimes the AI’s improvised spec lines up closely with the one you would have written, and when that happens, the gap closes on its own. There are roughly four ways this can happen, and they are worth understanding because they explain when the approach is safe and when it is not.

The first is when you are building something common. The AI has seen thousands of todo apps. Its mental model of what a todo app should include probably matches yours. The two specs converge because they both came from the same well of common patterns.

The second is when your project is small enough that you can hold the whole thing in your head. With one screen and a handful of features, your prompts are basically the spec. There is not much room for the AI to fill in unsaid things, because most of what matters is being said.

The third is when the stakes are low enough that the gaps do not matter. The AI might have skipped a hundred small things, but if your app is a personal portfolio site or a weekend toy, none of those skipped things have any real consequence. The spec is incomplete, but the parts that are missing are not load-bearing.

The fourth, and the one I want to dwell on, is when the person at the keyboard already has the answer key in their head. When an experienced developer prompts, the prompts are precise enough and the corrections fast enough that the AI’s improvised spec gets steered toward the developer’s internal one within a few exchanges. The two specs converge because the developer is silently dragging the AI’s version toward their own. From the outside, it looks like the developer is just typing prompts and shipping code. From the inside, they are doing constant real-time comparison against a standard you cannot see.

This is the most important one to understand, because it is the case that confuses everyone. When you watch an experienced developer vibe code without writing anything down and the result is great, you are not watching someone get away with skipping the spec. You are watching someone whose spec is so complete and so accessible that they can keep it entirely in their head and steer the AI to match it without ever writing it down. The spec is doing all the work. It is just invisible.

You can probably feel where this is going. None of these four conditions is something you can rely on as your project grows. Common patterns stop applying when your app gets specific. Small projects get bigger. Stakes go up. And if you are not the person carrying a forty-year answer key in your head, you never had that condition working in your favor in the first place.

Spec-less vibe coding works exactly until it does not. And the line between those two states is invisible from where you are standing.

What an Answer Key Actually Looks Like

Let us make this concrete, because it has been abstract for a while.

Suppose you are building a login system for your app. Without writing anything down, you prompt the AI to build user login. It builds something. The form appears. You can sign up. You can sign in. You ship.

Three weeks later, someone tells you they got locked out and there is no way to reset their password. Then someone tells you they created an account with someone else’s email and there was no verification. Then someone tells you a bot created fifty thousand accounts overnight. None of these are bugs. They are absences. The AI built a login system that matched its improvised spec for “login system,” and its improvised spec did not include password reset or email verification or rate limiting, because you did not say those words and the AI did not assume you wanted them.

Now imagine you spent thirty minutes before the build, writing down what your login system was supposed to do. Not in some formal document. Just a list. Something like: people sign in with email and password. New users get a confirmation email and cannot log in until they click the link. Forgotten passwords can be reset with a link sent to email. After five failed login attempts, the account is locked for fifteen minutes. Sessions expire after thirty days of inactivity. Passwords are stored hashed, never in plain text.

That is your answer key. None of it required engineering knowledge. All of it required thirty minutes of actually thinking about what a login system is for and what could go wrong. You probably could have written most of it without help. The parts you were unsure about, you could have asked the AI to help you think through, before any code got written.

Now when the AI generates the code, the review changes completely. You are not staring at code looking for vibes. You are going line by line down your list. Did it send a confirmation email? Did it build the password reset flow? Did it lock accounts after failed attempts? Each one is a question with a yes or no answer. When something is missing, you can see it, because you have something that says it should be there. The absence has a shape now.
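If you want to go one step further, each line of the list can become a check the machine runs for you. Here is a minimal sketch in pytest style, reusing the Account class from the earlier sketch. The module name login_sketch is a placeholder for wherever that code lives, and these two checks only cover the items that sketch can actually answer; the email confirmation and reset flows would need checks against your real app.

```python
# Two answer-key items turned into yes/no checks. Run with: pytest
from login_sketch import Account  # placeholder module name

def test_password_is_stored_hashed():
    acct = Account("hunter22")
    # spec: passwords are stored hashed, never in plain text
    assert b"hunter22" not in acct.password_hash

def test_account_locks_after_five_failed_attempts():
    acct = Account("hunter22")
    for _ in range(5):
        assert acct.try_login("wrong") is False
    # spec: during the lockout window, even the correct password is refused
    assert acct.try_login("hunter22") is False
```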

Same code. Same vibe coder. Same level of engineering knowledge. Completely different ability to tell whether the app sucks. The thirty minutes you spent on the list at the start is what made the question answerable at the end.

The Part Where You Find Out This Already Has a Name

Here is the part where I tell you that the engineering world has been doing this for decades and has elaborate names for it.

What I have been calling an “answer key,” people in software call a specification. There are different kinds for different layers of the problem. Business requirements describe what the product is supposed to do for whom and why. Product requirements describe specific features and what is in or out of scope. Design decisions describe how it should look and feel. Technical design documents describe the architecture choices. Architecture decision records capture the reasoning behind specific choices so you remember why you made them later.

I have written about all of these in other places, and I will be honest with you about something. Those articles do not get nearly the readership of articles about how to fix problems with AI-generated code. People want the troubleshooting content. They do not want the foundational-decisions content. I understand why. The decisions feel boring and the problems feel urgent.

But the decisions are the thing that prevents the problems. And once you see the answer-key idea, every one of those document types stops looking like corporate paperwork and starts looking like what it actually is: the artifact you need so that you can tell whether your app sucks. Different documents capture the answer key at different layers, but they are all doing the same job. They are all giving you something to compare your app against.

You do not need to write all of them. For most vibe-coded projects, a short product requirements document and a few notes on the technical decisions you made will do more for you than any amount of code review skill you could develop in the same time. The point is just that the answer key has to exist as an artifact outside your head, before the code does. That is the whole move.
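If you have never seen a product requirements document, do not picture anything heavy. For a vibe-coded project it can be a page of plain text. Something shaped like this, where the product and every detail are made-up examples:

```
Product: invoice reminder app (example)

Who it is for: freelancers who forget to chase unpaid invoices.

In scope:
- Import invoices from a CSV upload
- Email a reminder 7 and 14 days after the due date
- Dashboard listing overdue invoices, oldest first

Out of scope (for now):
- Payment processing
- Multi-user accounts

Technical decisions:
- One Postgres database; reminders run from a daily cron job, no queue
- Email goes through a transactional email service, not self-hosted SMTP
```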

The Honest Answer to the Question

A few weeks ago, someone left a comment on one of my videos asking how I spot hidden edge cases in AI-generated code so quickly. I told them honestly: forty years of experience. And I added: if you do not have that experience, iterate through reviews from different perspectives, learn as you go, and assume there are problems.

That answer was true but incomplete, and it has been bothering me ever since. So here is the longer version.

Edge cases are not hidden in the code. They are hidden in the spec the AI wrote on your behalf, the one you never saw and cannot audit. You cannot find them in the code, because they are not in the code; an absence leaves nothing to find. The only way to make them visible is to write the spec yourself, before the AI writes one for you. Whatever you put in the spec, the AI builds and you can check. Whatever you leave out, the AI fills in silently from its own assumptions, and you have no way to know what it filled in.

Forty years of experience did not give me a way to find edge cases in code. It gave me the answer key, fully loaded, before I ever opened the AI tool. You cannot replicate the experience. But you can absolutely replicate the answer key, by writing it down. That is the part of this I can actually hand over.

So if you are sitting there with a working app and that nagging feeling that something might be wrong with it, here is what to do. Stop trying to evaluate the app. The app cannot tell you anything you do not already know. Go back to before the code existed. Write down what a good version of this thing was supposed to do. Be specific. Cover the obvious cases and then think about what should happen when things go wrong. Do not worry about whether you are missing things. You will be. That is fine. Whatever you write down is more than you had.

Then go back to your app, with the list in your hand, and check. You will be surprised what becomes visible.

That is the difference between an app that sucks and one that does not. Not engineering experience. Not better prompts. Just the answer key, written down, where you and the AI can both see it. Once you have it, the question of whether your app sucks finally becomes a question you can answer.

And being able to answer it, finally, is the whole point.
