The frustrating truth about AI code generation is that the quality of the output is almost entirely determined by the quality of the prompt. A weak prompt produces a generic, broken prototype. A strong prompt produces something close to production-ready code. The model is the same. The difference is entirely in how you ask.
Most developers discover this after their first few disappointing interactions — the generated code technically runs but is missing error handling, uses hardcoded values that should be environment variables, has no validation, and makes assumptions about the tech stack that conflict with the actual project. Then they conclude that AI code generation "is not ready yet," when the real problem is that they prompted for a sketch and were surprised to get a sketch.
The anatomy of a strong code generation prompt
A prompt that reliably produces good code has five layers:
1. Context. What is this code for? What does the surrounding system look like? "Write a rate limiter" could mean a Redis-backed middleware in a Node.js API, a token bucket algorithm in Python, or a client-side request queue in JavaScript. Without context, the generator guesses — and often guesses wrong.
2. Stack specification. Name every relevant technology explicitly: language, framework, libraries, and versions where they matter. "Node.js 20, Express 4, Supabase, TypeScript strict mode" is far more useful than "JavaScript backend."
3. Functional requirements. What should the code do? List the concrete behaviours, not just the abstract purpose. Not "handle authentication" but "validate a JWT Bearer token in the Authorization header, extract the user ID, attach it to req.user, return 401 if missing or invalid, return 403 if the user does not exist in the database."
4. Non-functional requirements. This is where most prompts fail. Production code has requirements beyond "it works": error handling, logging, validation, security considerations, performance constraints. State them explicitly: "Include input validation with Zod, structured error responses in the format { error: string, code: string }, and console.error logging for server errors. Do not expose internal error messages to the client."
5. Examples or constraints. If there is a pattern you want the code to follow, or a pattern you specifically want to avoid, say so. "Follow the existing pattern in /lib/auth/middleware.ts" or "Do not use any external libraries beyond what is already installed."
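One way to keep the five layers from being forgotten is to treat the prompt as structured data and render it to text. The sketch below is illustrative only: the `CodePromptSpec` interface, its field names, and `buildPrompt` are hypothetical and not part of any tool.

```typescript
// Hypothetical shape for the five layers; field names are illustrative.
interface CodePromptSpec {
  context: string;
  stack: string[];
  functional: string[];
  nonFunctional: string[];
  constraints: string[];
}

// Render the spec as plain text, one labelled section per layer.
function buildPrompt(spec: CodePromptSpec): string {
  const numbered = (items: string[]): string =>
    items.map((item, i) => `(${i + 1}) ${item}`).join("\n");

  return [
    `Context: ${spec.context}`,
    `Stack: ${spec.stack.join(", ")}`,
    `Functional requirements:\n${numbered(spec.functional)}`,
    `Non-functional requirements:\n${numbered(spec.nonFunctional)}`,
    `Constraints:\n${numbered(spec.constraints)}`,
  ].join("\n\n");
}

const authPrompt = buildPrompt({
  context: "Authentication middleware for an existing REST API.",
  stack: ["Node.js 20", "Express 4", "Supabase", "TypeScript strict mode"],
  functional: [
    "Validate a JWT Bearer token in the Authorization header",
    "Return 401 if the token is missing or invalid",
    "Attach the user ID to req.user",
  ],
  nonFunctional: [
    "Structured error responses in the format { error: string, code: string }",
    "Do not expose internal error messages to the client",
  ],
  constraints: ["Do not use any external libraries beyond what is already installed"],
});
```

An empty array in any field is a visible signal that a layer was skipped before the prompt is sent.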
Before and after: weak prompt vs strong prompt
Weak prompt: "Write an API endpoint to create a user."
This produces a ten-line Express handler that inserts data into a database without validation, without error handling, with no consideration for duplicate emails, and with the database connection hardcoded or assumed.
Strong prompt: "Write a POST /api/users endpoint in Node.js with Express and TypeScript. Stack: Express 4, Supabase (server client), Zod for validation. The endpoint should: (1) validate the request body has email (valid email format), password (min 8 chars), and optional displayName (max 50 chars); (2) check for duplicate email and return 409 if found; (3) hash the password with bcrypt (12 rounds); (4) insert into the public.users table with id, email, password_hash, display_name, created_at; (5) return 201 with { id, email, displayName } on success. Error format: { error: string }. Do not return the password hash. Include a try/catch wrapping the entire handler and log errors with console.error. Do not use any libraries beyond Express, Supabase, Zod, and bcrypt."
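The strong prompt produces a handler that is far closer to production-ready; the weak prompt produces a placeholder. The stronger output still needs review. For example, a duplicate-email check followed by an insert can race under concurrent requests unless the email column also has a unique constraint.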
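To make the five numbered requirements concrete, here is a dependency-free sketch of the control flow the strong prompt asks for. It is not what a generator would emit: Express, Supabase, Zod and bcrypt are replaced by hypothetical stand-ins (`UserStore`, `hashPassword`, `newId`) so the block runs on its own, and the email regex is a deliberately simple approximation.

```typescript
// Hypothetical stand-ins for the Supabase client and bcrypt.
interface UserRow {
  id: string;
  email: string;
  password_hash: string;
  display_name: string | null;
  created_at: string;
}
interface UserStore {
  findByEmail(email: string): Promise<UserRow | null>;
  insert(row: UserRow): Promise<void>;
}
interface Deps {
  store: UserStore;
  hashPassword(plain: string): Promise<string>;
  newId(): string;
}
interface HandlerResult {
  status: number;
  body: Record<string, unknown>;
}

async function createUser(input: unknown, deps: Deps): Promise<HandlerResult> {
  try {
    const body = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;
    const { email, password, displayName } = body;

    // (1) Validate the request body.
    if (typeof email !== "string" || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
      return { status: 400, body: { error: "Invalid email" } };
    }
    if (typeof password !== "string" || password.length < 8) {
      return { status: 400, body: { error: "Password must be at least 8 characters" } };
    }
    if (displayName !== undefined && (typeof displayName !== "string" || displayName.length > 50)) {
      return { status: 400, body: { error: "displayName must be a string of at most 50 characters" } };
    }

    // (2) Duplicate check. A unique constraint on email is still needed,
    // because two concurrent requests can both pass this check.
    if (await deps.store.findByEmail(email)) {
      return { status: 409, body: { error: "Email already registered" } };
    }

    // (3) Hash, (4) insert.
    const row: UserRow = {
      id: deps.newId(),
      email,
      password_hash: await deps.hashPassword(password),
      display_name: typeof displayName === "string" ? displayName : null,
      created_at: new Date().toISOString(),
    };
    await deps.store.insert(row);

    // (5) Respond without the password hash.
    return { status: 201, body: { id: row.id, email: row.email, displayName: row.display_name } };
  } catch (err) {
    console.error("createUser failed", err);
    return { status: 500, body: { error: "Internal server error" } };
  }
}
```

Each numbered comment maps to one clause of the prompt, which is also a convenient checklist when reviewing the real generated handler.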
Prompting for components vs systems
Single-function prompts ("write a function that does X") produce better results than system-level prompts ("write an authentication system") because the scope is contained enough for the generator to reason about all the edge cases.
For larger systems, break the prompt into steps. Generate the database schema first. Then the validation layer. Then the route handlers. Then the client-side hooks. Each step has a narrower scope and produces better output than trying to generate the whole system in one shot.
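The step-by-step approach can be sketched as a small helper that builds one prompt per step. Feeding the already accepted output of earlier steps into the next prompt is an assumption of this sketch rather than something the steps above require, and `GenerationStep` and `promptForStep` are hypothetical names.

```typescript
interface GenerationStep {
  name: string;
  instruction: string;
}

// The four steps named above, in order.
const steps: GenerationStep[] = [
  { name: "schema", instruction: "Write the database schema." },
  { name: "validation", instruction: "Write the validation layer." },
  { name: "routes", instruction: "Write the route handlers." },
  { name: "client", instruction: "Write the client-side hooks." },
];

// Build the prompt for one step, prepending outputs accepted from earlier steps
// so the generator sees the code it has to stay consistent with.
function promptForStep(index: number, acceptedOutputs: string[]): string {
  const step = steps[index];
  if (!step) throw new RangeError(`No step at index ${index}`);

  const prior = steps
    .slice(0, index)
    .map((earlier, i) => `--- ${earlier.name} (already written) ---\n${acceptedOutputs[i] ?? "(not generated yet)"}`);

  return prior.concat([`Task: ${step.instruction}`]).join("\n\n");
}
```

Calling `promptForStep(1, [schemaSql])` would yield a prompt that quotes the reviewed schema and then asks only for the validation layer.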
The iteration pattern that actually works
The cycle is: first draft, review, then targeted refinement prompts. It is faster than trying to write the perfect prompt on the first attempt.
Good refinement prompts are specific: "The validateEmail function does not handle email addresses with plus signs. Fix it." Not: "Make the validation better." The more specific the refinement, the more accurate the fix.
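As an illustration of why the specific version works, here is what the plus-sign fix might look like. Both regular expressions are hypothetical examples of generated code, and neither attempts full RFC 5322 validation; the point is that a precise prompt maps to a one-character-class change that is easy to review.

```typescript
// Hypothetical "before": the local part allows letters, digits, dots,
// underscores and hyphens, so "dev+test@example.com" is rejected.
const emailBefore = /^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

// "After" the targeted refinement: "+" and "%" are allowed in the local part.
const emailAfter = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;

function validateEmail(email: string): boolean {
  return emailAfter.test(email);
}
```

A vague prompt such as "Make the validation better" gives the generator no reason to touch the character class at all.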
Common refinement patterns:
- "Add input validation for [specific field] with [specific rule]"
- "Wrap the database call in a try/catch and return [specific error format] on failure"
- "Extract the [repeated logic] into a separate function called [name]"
- "Replace the hardcoded [value] with an environment variable called [NAME]"
- "Add TypeScript types for the [function/component] — currently it uses any"
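As an example of where one of these patterns tends to land, the sketch below shows a possible result of the environment-variable refinement. The variable name `RATE_LIMIT_MAX`, the helper `readPositiveInt`, and the default of 100 are all hypothetical; the environment is passed in as a plain object so the block does not depend on Node typings.

```typescript
// Before (hypothetical generated code):
//   const maxRequests = 100;

type Env = Record<string, string | undefined>;

// Read a positive integer setting, falling back to a default when it is unset
// and failing loudly when it is set to something unusable.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;

  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// After: in a Node.js service the caller would pass process.env here.
const maxRequests = readPositiveInt({ RATE_LIMIT_MAX: "250" }, "RATE_LIMIT_MAX", 100);
```

Throwing on a malformed value is a design choice: a silent fallback would hide a misconfigured deployment.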
What to always review in AI-generated code
Even with a strong prompt, there are things that AI code generators routinely get wrong or skip:
Security boundaries. Is user input ever passed directly to a database query, file path, or shell command without parameterisation, validation, or escaping? AI generators sometimes produce SQL injection vectors in non-ORM code, typically by concatenating input into the query string.
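The pattern to look for in review can be shown without a database. In this sketch the function names and the `users` table are hypothetical; the `{ text, values }` shape with `$1` placeholders follows the PostgreSQL convention, but any driver's bound-parameter mechanism serves the same purpose.

```typescript
interface ParamQuery {
  text: string;
  values: unknown[];
}

// Unsafe: user input becomes part of the SQL text itself.
function findUserUnsafe(email: string): string {
  return `SELECT id, email FROM users WHERE email = '${email}'`;
}

// Safer: the SQL text is constant and the input travels separately
// as a bound parameter for the driver to handle.
function findUserQuery(email: string): ParamQuery {
  return {
    text: "SELECT id, email FROM users WHERE email = $1",
    values: [email],
  };
}

const hostileInput = "x' OR '1'='1";
const unsafeSql = findUserUnsafe(hostileInput);
const safeQuery = findUserQuery(hostileInput);
```

In `unsafeSql` the hostile input has changed the structure of the WHERE clause; in `safeQuery` the query text is identical for every input.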
Secret handling. Are any API keys, tokens, or credentials hardcoded in the output? This happens less than it used to but still occurs.
Error surface. Does every error path return an appropriate response without leaking internal details? Generators sometimes let raw error messages reach the client.
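One way to check this in review is to look for a single place where errors are turned into responses. The sketch below assumes a hypothetical `PublicError` marker for messages that are safe to show; everything else is logged and replaced with a generic body. The `{ error, code }` shape and the names `toErrorResponse` and `INTERNAL_ERROR` are illustrative.

```typescript
interface ErrorResponse {
  status: number;
  body: { error: string; code: string };
}

// Hypothetical marker for errors whose message is safe to show to clients.
interface PublicError {
  isPublic: true;
  status: number;
  code: string;
  message: string;
}

function isPublicError(err: unknown): err is PublicError {
  if (typeof err !== "object" || err === null) return false;
  const candidate = err as Partial<PublicError>;
  return (
    candidate.isPublic === true &&
    typeof candidate.status === "number" &&
    typeof candidate.code === "string" &&
    typeof candidate.message === "string"
  );
}

// Convert any thrown value into a client-safe response.
// Unknown errors are logged server-side and never echoed to the client.
function toErrorResponse(
  err: unknown,
  log: (message: string, cause: unknown) => void,
): ErrorResponse {
  if (isPublicError(err)) {
    return { status: err.status, body: { error: err.message, code: err.code } };
  }
  log("Unhandled server error", err);
  return { status: 500, body: { error: "Internal server error", code: "INTERNAL_ERROR" } };
}
```

In a handler, the `catch` block would call `toErrorResponse(err, console.error)` and send the result, so a raw driver message such as a connection string never reaches the response.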
Race conditions. In concurrent code, is shared state accessed safely? This is one of the areas where AI generation is weakest, because concurrency bugs are subtle and context-dependent.
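A typical shape of this bug is check-then-act across an `await`. The sketch below uses a hypothetical in-memory `Account` as a stand-in for shared state such as a database row, and a promise-chain serializer (`createSerializer`, also hypothetical) as one possible in-process fix. It does not protect state shared across several server instances; that needs a transaction or a database constraint.

```typescript
// Hypothetical shared state with async accessors, like a row behind a DB client.
class Account {
  private balance: number;
  constructor(initial: number) {
    this.balance = initial;
  }
  async read(): Promise<number> {
    return this.balance;
  }
  async write(value: number): Promise<void> {
    this.balance = value;
  }
  peek(): number {
    return this.balance;
  }
}

// Check-then-act with an await in between: two concurrent callers can both
// read the old balance before either one writes the new balance.
async function withdrawUnsafe(account: Account, amount: number): Promise<boolean> {
  const current = await account.read();
  if (current < amount) return false;
  await account.write(current - amount);
  return true;
}

// Run tasks strictly one after another by chaining them on a shared promise.
function createSerializer(): <T>(task: () => Promise<T>) => Promise<T> {
  let tail: Promise<unknown> = Promise.resolve();
  return function serialize<T>(task: () => Promise<T>): Promise<T> {
    const run = tail.then(() => task());
    tail = run.catch(() => undefined); // keep the chain usable after a failure
    return run;
  };
}
```

Two concurrent `withdrawUnsafe(account, 80)` calls against a balance of 100 can both succeed; wrapping each call in the same `serialize` makes the second one see the updated balance and refuse.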
Missing edge cases. What happens with empty arrays, null values, very long strings, or unexpected input types? Test the edge cases explicitly, especially for validation functions.
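A table of inputs and expected results is a cheap way to run these checks. The sketch below uses a hypothetical `isValidDisplayName` validator (non-blank string of at most 50 characters) as the function under test; the same table style works for any generated validation function.

```typescript
// Hypothetical validator under review.
function isValidDisplayName(value: unknown): boolean {
  return typeof value === "string" && value.trim().length > 0 && value.length <= 50;
}

// Each entry pairs an input with the result we expect for it.
const edgeCases: Array<[unknown, boolean]> = [
  ["Ada", true],
  ["", false],              // empty string
  ["   ", false],           // whitespace only
  ["x".repeat(50), true],   // exactly at the limit
  ["x".repeat(51), false],  // one past the limit
  [null, false],
  [undefined, false],
  [42, false],              // unexpected type
  [[], false],              // empty array
  [{}, false],
];

// Collect every case where the validator disagrees with the expectation.
const failures = edgeCases.filter(
  ([input, expected]) => isValidDisplayName(input) !== expected,
);
```

An empty `failures` array means the validator agrees with every row; a non-empty one gives exactly the inputs to quote in the next refinement prompt.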
The right mental model for AI code generation
Think of AI code generation as a very fast, very knowledgeable junior developer who has read every piece of documentation and every Stack Overflow thread but has never shipped a production system. The output is technically informed but needs senior review before it goes anywhere near production users. The value is speed and coverage — the AI writes the boilerplate, the standard patterns, the infrastructure code that is important but not differentiated. You focus your attention on the logic that is specific to your product and the review that catches the places where the AI's pattern-matching fell short.