Debugging with Claude Code: a steady method for hard bugs

Workflow 14 min read

TL;DR

Debugging with Claude Code is the same craft you already know with one more pair of eyes. Reproduce the bug, narrow the surface, read the real error, and write a test that fails before you write the fix. This guide walks through the full method - from the first reproduction to the postmortem - covering the common traps, heisenbugs, dependency bugs, environment mismatches, and the moves that turn a bug that has eaten your afternoon into one you ship a fix for before dinner.

Start with a reproduction

Every good debug starts with a reproduction. A way to make the bug happen on demand. Without that, you are guessing. With it, you have a binary signal that tells you whether each change you make moved you closer or further away. Spend the first ten minutes nailing down the minimum steps to make the bug appear. A failing test is the strongest version of this. A curl command that returns the wrong response is the next strongest. A click path in the UI is the weakest but still useful.

Once you have a reproduction, share it with Claude Code at the top of the session. Not just the symptom, the steps. Then ask it to help reproduce locally before either of you touches the code. A lot of debug time is lost when someone tries to fix a bug they cannot actually trigger, and the fix turns out to be for a different bug entirely.

Read the real error message

Read the error before you theorize. The full error, the stack trace, the line number, the surrounding lines if they are relevant. Paste it all into Claude Code rather than describing it in your own words. Descriptions lose information. The raw error has the noun, the verb, the file, and the line. Most of the time, that is enough to point at the fix without further investigation.

Copy the entire stack trace, not just the top line, since the cause is often three frames down
Include the request or input that triggered the error so the context is complete
Note the version of the dependency that threw the error, since the same error can have different causes across versions
Capture any preceding warnings, since they often point at the underlying state issue
If there are multiple errors, share them all and let Claude Code spot which one is the root and which are the cascade

Narrow the surface

Most bugs feel huge and turn out to live in a small surface area. Narrow before you dig. Cut the input down to the smallest case that still triggers the bug. Cut the code path down to the smallest sequence that still fails. Delete the imports that are not involved, the middleware that is not in the path, the conditions that always evaluate the same way. The smaller the surface, the cheaper each new theory is to test.
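Cutting the input down can be partly mechanical. Here is a minimal sketch of a greedy minimizer: it removes one element at a time and keeps the removal whenever the bug still reproduces. The triggersBug check is a hypothetical stand-in for your real reproduction, and the sketch assumes the starting input already triggers the bug.

```typescript
// Greedy input minimizer: drop one element at a time and keep the
// removal whenever the bug still reproduces.
type Reproduces<T> = (input: T[]) => boolean;

function minimize<T>(input: T[], reproduces: Reproduces<T>): T[] {
  let current = input.slice();
  let shrunk = true;
  while (shrunk) {
    shrunk = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.slice(0, i).concat(current.slice(i + 1));
      if (reproduces(candidate)) {
        current = candidate;
        shrunk = true;
        break; // restart the scan on the smaller input
      }
    }
  }
  return current;
}

// Hypothetical bug: the failure needs both a negative number and a zero.
const triggersBug: Reproduces<number> = (xs) =>
  xs.some((x) => x < 0) && xs.indexOf(0) !== -1;

const minimal = minimize([5, -3, 8, 0, 12, 7], triggersBug);
console.log(minimal); // [-3, 0]
```

Six elements shrink to the two that matter, and every theory after that only has to explain two values.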

Bisect when nothing else fits

If the bug appeared between two known states, bisect. git bisect is the underused move that finds the offending commit in about log2(n) steps, where n is the number of commits between the two states. Tell git the last commit you know was good and a commit you know is bad, and let it walk you through. At each step, run your reproduction and tell git whether the bug is present. With a fast reproduction, a range of a hundred commits takes about seven checks, and you end up with the exact commit that introduced the regression. From there, Claude Code can read the diff and often propose the fix in one round.
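To see why the step count stays small, here is a sketch of the binary search that bisecting performs. It is an illustration of the idea, not git's implementation. The commit names and the reproductionFails check are hypothetical, and the sketch assumes the bug stays present in every commit after the one that introduced it.

```typescript
// Binary search for the first bad commit, the idea that git bisect automates.
// Assumes commits[0] is known good and the last commit is known bad.
function findFirstBad(
  commits: string[],
  isBad: (commit: string) => boolean
): { firstBad: string; checks: number } {
  let good = 0;
  let bad = commits.length - 1;
  let checks = 0;
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    checks++;
    if (isBad(commits[mid])) {
      bad = mid; // bug present: the culprit is at mid or earlier
    } else {
      good = mid; // bug absent: the culprit is after mid
    }
  }
  return { firstBad: commits[bad], checks };
}

// Hypothetical history of 64 commits where the regression landed at c41.
const commitLog: string[] = [];
for (let i = 0; i < 64; i++) commitLog.push("c" + i);
const reproductionFails = (commit: string) => Number(commit.slice(1)) >= 41;

const bisectResult = findFirstBad(commitLog, reproductionFails);
console.log(bisectResult.firstBad, "found in", bisectResult.checks, "checks");
```

Sixty-four commits need at most six runs of the reproduction, which is why a fast, scriptable reproduction matters so much here.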

Write the failing test first

When you have the reproduction tight enough to express in code, write the test first. The test should fail because of the bug. Run it, see it fail, then write the fix. The fix passes when the test passes. This pattern saves you twice. Once because the test forces you to be precise about what wrong means. And once again later, because the test stays in the suite and prevents the same bug from coming back when the next refactor touches the same code.

1. Write a test that fails because of the bug, in the smallest possible scope
2. Run the test and confirm it fails for the right reason, not for a setup mistake
3. Ask Claude Code to propose a fix that makes the test pass without breaking neighbors
4. Run the full test suite to confirm the fix does not break anything else
5. Commit the test and the fix together so the history shows the bug and its proof of fix
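The steps above can be sketched without any test framework. The bill-splitting functions below are hypothetical, and in a real project the check would live in whatever test runner you already use. The point is that the check fails against the buggy version and passes against the fix.

```typescript
// Hypothetical function under test: splits a bill evenly, in cents.
// The buggy version drops the remainder, so cents go missing.
function splitBillBuggy(totalCents: number, people: number): number[] {
  const share = Math.floor(totalCents / people);
  const shares: number[] = [];
  for (let i = 0; i < people; i++) shares.push(share);
  return shares;
}

// The fix hands the leftover cents to the first few people.
function splitBillFixed(totalCents: number, people: number): number[] {
  const share = Math.floor(totalCents / people);
  const remainder = totalCents % people;
  const shares: number[] = [];
  for (let i = 0; i < people; i++) {
    shares.push(i < remainder ? share + 1 : share);
  }
  return shares;
}

// The regression check states precisely what "wrong" means:
// the shares must add up to the original total.
function sharesAddUp(
  split: (totalCents: number, people: number) => number[]
): boolean {
  const sum = split(1000, 3).reduce((a, b) => a + b, 0);
  return sum === 1000;
}

console.log("before fix:", sharesAddUp(splitBillBuggy)); // false: 333 * 3 = 999
console.log("after fix:", sharesAddUp(splitBillFixed)); // true: 334 + 333 + 333
```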

Logs are stories, not noise

Treat logs as a story your system tells itself. When you tail logs during a reproduction, you are reading what the system thought was happening. The gap between what the system thought and what you wanted is where the bug lives. Add a temporary log line if the existing logs do not tell you enough. Remove it after. Claude Code is good at suggesting the right log lines because it has the full code in context and can spot which states have not been observed.

Structure matters for logs too. A log line that includes the request id, the user id where relevant, and a short event name reads ten times better than a raw printf. When you debug across a few hours, structured logs let you grep for the request that failed and see only its story instead of every concurrent story. The investment pays off the second time you debug something in the same system.
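A minimal sketch of what that structure could look like, assuming one JSON object per log line. The field names (ts, requestId, userId, event) and the two helpers are illustrative choices, not a standard.

```typescript
interface LogFields {
  requestId: string;
  event: string;
  userId?: string;
  [extra: string]: unknown;
}

// One JSON object per line, so the log stays easy to filter.
function formatLogLine(fields: LogFields, now: Date = new Date()): string {
  return JSON.stringify({ ts: now.toISOString(), ...fields });
}

// Keep only the lines that belong to one request: that request's story.
function storyOf(allLines: string[], requestId: string): string[] {
  return allLines.filter((line) => JSON.parse(line).requestId === requestId);
}

const at = new Date(0); // fixed timestamp so the example is repeatable
const logLines = [
  formatLogLine({ requestId: "req-1", userId: "u-42", event: "checkout.started" }, at),
  formatLogLine({ requestId: "req-2", event: "cart.viewed" }, at),
  formatLogLine({ requestId: "req-1", userId: "u-42", event: "payment.failed", code: 402 }, at),
];

console.log(storyOf(logLines, "req-1").join("\n"));
```

Filtering on the request id drops the concurrent req-2 line and leaves the two events that tell the failing request's story in order.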

When the bug is in a dependency

Sometimes the bug is not in your code. It is in a library you use. The signs are familiar. The stack trace ends in node_modules, the behavior changed after an unrelated dependency bump, the same code fails on one machine and works on another. When this happens, pin the dependency version that works, open an issue upstream if it is not already filed, and add a comment in your code that records why the pin exists. Pinning is not surrender. It is preserving a known good state while the upstream fix lands.

The bug that is really three bugs

Some bugs are actually a stack of small issues that look like one big one. The race condition that only fires when the cache misses while the user is logged out on Safari. Untangle these one at a time. Fix the cache miss case, confirm the bug still partially repros, fix the logged out case, confirm again, then chase the Safari specific piece. Trying to solve all three at once produces a fix that works for none of them. Solving them one at a time produces three small commits and a clean PR.

Feature flags and bisecting behavior

If your app uses feature flags, treat flag state as part of the reproduction. The same code can behave differently depending on which flags are on for a given user. When you debug, capture the flag state from the failing user and apply it locally. Without that step you are debugging a different version of the app than the one that broke, and the fix you ship may not address the real failure path.

Environment differences and the local trap

Works on my machine is the oldest debug answer in the book and it is right more often than people like to admit. When a bug only repros in production, line up the environments. Same Node version, same dependency versions, same env vars in spirit if not in value. Often the difference is one small thing. A timezone, a locale, a feature flag that is on locally and off in production. Make the environments diff small and the surface of the bug shrinks. Claude Code can help compare local versus production configs line by line, which sounds tedious and turns out to find the difference almost every time.
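The comparison itself is simple enough to script. Below is a sketch that diffs two flat config objects key by key. The keys and values are made up for illustration, and real secrets should be redacted or compared by presence only before you paste them anywhere.

```typescript
type Config = { [key: string]: string | undefined };

interface ConfigDifference {
  key: string;
  local: string | undefined;
  production: string | undefined;
}

// List every key whose value differs, including keys set on one side only.
function diffConfig(local: Config, production: Config): ConfigDifference[] {
  const keys = Object.keys(local);
  for (const key of Object.keys(production)) {
    if (keys.indexOf(key) === -1) keys.push(key);
  }
  keys.sort();

  const differences: ConfigDifference[] = [];
  for (const key of keys) {
    if (local[key] !== production[key]) {
      differences.push({ key, local: local[key], production: production[key] });
    }
  }
  return differences;
}

// Hypothetical environments.
const localEnv: Config = {
  NODE_VERSION: "20.11.0",
  TZ: "Europe/Berlin",
  FEATURE_NEW_CHECKOUT: "on",
};
const productionEnv: Config = {
  NODE_VERSION: "20.11.0",
  TZ: "UTC",
  FEATURE_NEW_CHECKOUT: "off",
  LOCALE: "en_US",
};

for (const d of diffConfig(localEnv, productionEnv)) {
  console.log(d.key + ": local=" + d.local + " production=" + d.production);
}
```

Here the Node version matches, and the three lines that print (the flag, the locale, the timezone) are the short list of suspects.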

Reading other people's code under pressure

Debugging often means reading code you did not write. A library function, a teammate's PR from last quarter, a vendor SDK. Claude Code is excellent at the "read this and summarize what it actually does" task. Drop the file in, ask for a plain language summary, then dig in. Two minutes of summary saves twenty minutes of guesswork. The summary also surfaces the assumptions baked into the code, which is often where the bug lives. Reading other people's code is a skill that compounds, and using Claude Code as a reading partner accelerates the compounding.

Postmortems for the bugs that mattered

Not every bug deserves a postmortem. The ones that reached production and affected users do. Write a short note that covers what happened, when it started, how you found it, what the fix was, and what change would have prevented it. Keep it under a page. The discipline of writing the note is the value. Often the prevention idea you land on is a small lint rule or a single test you should have written months ago, and adding it is fifteen minutes of work that pays for itself the next time something similar would have shipped.

Asking Claude Code the right debugging questions

The quality of a debug session with Claude Code tracks the quality of the questions you ask. What changed since the last working state? Which assumption in this function is no longer true? Which input would make this code path execute? These questions point at causes. Generic requests like "fix this bug" point at symptoms and produce guesses. Train yourself to ask for the underlying state rather than for a patch. The patch follows once the state is clear, and the fix you get is better because the reasoning is exposed.

Heisenbugs and the timing class

Some bugs disappear when you look at them. Add a log line and the race condition does not happen anymore. Step through with a debugger and the order changes. These bugs are timing dependent and they punish slow methods. The way to catch them is to keep the production conditions and add tracing that disturbs timing as little as possible. Sampling, counters, structured logs to a sink rather than to stdout. Claude Code can help you instrument cleanly, but the discipline is yours. Do not chase a heisenbug with instrumentation that warps the timing you are trying to measure.
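As one sketch of low-overhead instrumentation, the counters below record which branch ran by incrementing a number in memory, and the totals are read once after the run. The Counters class and the handleLookup hot path are hypothetical. Incrementing a number is much cheaper than writing a log line, though no instrumentation is entirely free.

```typescript
// In-memory counters: bump a number on the hot path, read totals afterwards.
class Counters {
  private counts: { [name: string]: number } = {};

  hit(name: string): void {
    this.counts[name] = (this.counts[name] || 0) + 1;
  }

  snapshot(): { [name: string]: number } {
    const copy: { [name: string]: number } = {};
    for (const name of Object.keys(this.counts)) copy[name] = this.counts[name];
    return copy;
  }
}

const counters = new Counters();

// Hypothetical hot path: count which branch ran instead of logging it.
function handleLookup(store: { [key: string]: string }, key: string): string {
  if (key in store) {
    counters.hit("cache.hit");
    return store[key];
  }
  counters.hit("cache.miss");
  store[key] = "loaded:" + key;
  return store[key];
}

const lookupStore: { [key: string]: string } = {};
["a", "b", "a", "c", "a"].forEach((key) => handleLookup(lookupStore, key));

console.log(counters.snapshot()); // { "cache.miss": 3, "cache.hit": 2 }
```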

Concurrency bugs in particular reward writing a stress test rather than a single reproduction. Loop the trigger a thousand times, let the race fire under load, and you get a real signal instead of an occasional symptom. The stress test stays in your repo as a guard against regression. Future changes that reintroduce the race fail the stress test before they reach production.
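Here is a sketch of the stress test idea under one big assumption: instead of real concurrency, it simulates interleavings with a seeded scheduler, so any failing run can be replayed exactly. The racy counter (read, then write back value + 1 as a separate step) is a hypothetical stand-in for your own trigger.

```typescript
// Deterministic pseudo-random generator so a failing seed can be replayed.
function makeRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Hypothetical racy code: each task reads the counter, then writes back
// value + 1 in a separate step, so another task can run in between.
function runTrial(seed: number, taskCount: number): number {
  const random = makeRandom(seed);
  let counter = 0;
  // read is undefined until the task has performed its read step.
  const pending: { read?: number }[] = [];
  for (let i = 0; i < taskCount; i++) pending.push({});

  while (pending.length > 0) {
    const index = Math.floor(random() * pending.length);
    const task = pending[index];
    if (task.read === undefined) {
      task.read = counter; // step 1: read
    } else {
      counter = task.read + 1; // step 2: write, possibly clobbering others
      pending.splice(index, 1); // task finished
    }
  }
  return counter;
}

// Loop the trigger many times and collect every seed that lost an update.
function stressTest(trials: number, taskCount: number): number[] {
  const failingSeeds: number[] = [];
  for (let seed = 0; seed < trials; seed++) {
    if (runTrial(seed, taskCount) !== taskCount) failingSeeds.push(seed);
  }
  return failingSeeds;
}

const failingSeeds = stressTest(1000, 4);
console.log(failingSeeds.length + " of 1000 interleavings lost an update");
```

Recording the failing seeds turns an occasional symptom into a list of exact replays. Once the read and write happen atomically, the same loop should report zero failures, which is what makes it useful as a regression guard.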

Knowing when to walk away

Some bugs do not yield to a focused two hour session. The fix is to walk away. Close the laptop, do something else, come back fresh. The subconscious is good at problems that are precisely defined but badly stuck. The half hour walk often produces the insight that an hour of staring did not. When you come back, restate the problem to Claude Code in one tight sentence, paste the smallest reproduction, and try again. The fresh framing alone usually shakes something loose.

Type errors and TypeScript specific debugging

TypeScript type errors are a specific class of bug that responds well to Claude Code. Paste the full tsc output or the IDE error, not just the top line. Type errors cascade - a wrong type on one function propagates through every function that calls it, and patching the downstream errors one by one treats symptoms while the broken contract stays in place. Fix the root cause first. Claude Code is good at tracing a type error back to its origin because it can read the full call chain and spot where the contract broke.

When a type error feels confusing, ask Claude Code to explain the expected type and the actual type in plain language before proposing a fix. The explanation often reveals a design problem rather than a type problem - the types were right and a function was being used in a way that was never its intended purpose. Fixing the caller rather than loosening the type is almost always the better outcome. Casting to any to silence the error is technical debt in the shape of a bug waiting to happen.
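A small sketch of fixing the caller instead of loosening the type. The User shape and the three functions are hypothetical, and the comments assume strictNullChecks is enabled.

```typescript
interface User {
  id: string;
  email: string;
}

// The return type admits that the user may be missing.
function findUser(users: User[], id: string): User | undefined {
  for (const user of users) {
    if (user.id === id) return user;
  }
  return undefined;
}

function emailDomain(user: User): string {
  return user.email.split("@")[1];
}

// Tempting: emailDomain(findUser(users, id) as any) silences the compiler,
// then throws at runtime when the user does not exist.

// Fixing the caller: handle the missing case where the contract is decided.
function domainFor(users: User[], id: string): string | undefined {
  const user = findUser(users, id);
  if (user === undefined) return undefined;
  return emailDomain(user); // narrowed to User on this line
}

const users: User[] = [{ id: "u1", email: "ada@example.com" }];
console.log(domainFor(users, "u1")); // "example.com"
console.log(domainFor(users, "missing")); // undefined
```

The type was right all along. The caller simply had not decided what to do when the lookup comes back empty.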

Network and API debugging

Network bugs between your code and an external API have a specific debugging approach. Always start with the raw request and raw response. Use curl to make the same call your code makes, with the same headers and body. If curl works and your code does not, the bug is in how you are constructing the request. If curl fails too, the bug is in the API call parameters, authentication, or the API itself. Establishing which side the problem is on before you dig is the move that saves an hour of looking in the wrong place.

Replicate the failing API call with curl before modifying any code
Log the full request URL, headers, and body before it is sent so you can compare with what actually went out
Compare status codes carefully - a 400 usually means the request itself was malformed, while a 422 usually means it parsed fine but failed validation
Read the error body from the API response, not just the status code, since the body usually names the parameter that failed
Check rate limits and retry logic before concluding the API is broken - throttled requests often look like errors
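One way to make the curl comparison painless is to generate the command from the same request data your code logs. The sketch below assumes a POSIX shell for quoting. The RequestDescription shape, the api.example.com URL, and the token placeholder are all hypothetical.

```typescript
interface RequestDescription {
  method: string;
  url: string;
  headers: { [name: string]: string };
  body?: string;
}

// Wrap a value in single quotes for a POSIX shell, escaping embedded quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build a curl command that replays the request outside your code.
function toCurl(request: RequestDescription): string {
  const parts = ["curl", "-X", request.method];
  for (const name of Object.keys(request.headers).sort()) {
    parts.push("-H", shellQuote(name + ": " + request.headers[name]));
  }
  if (request.body !== undefined) {
    // --data-raw sends the body as-is, without treating a leading @ as a file.
    parts.push("--data-raw", shellQuote(request.body));
  }
  parts.push(shellQuote(request.url));
  return parts.join(" ");
}

const curlCommand = toCurl({
  method: "POST",
  url: "https://api.example.com/v1/orders",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer <token>",
  },
  body: '{"sku":"A1","qty":2}',
});

console.log(curlCommand);
```

If the generated command succeeds where your code fails, diff the two requests. If it fails the same way, read the response body and look at the parameters, the credentials, or the API itself.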

Memory leaks and long-running process bugs

Memory leaks in Node processes or browser applications are a class of bug that does not show up in a single reproduction - they accumulate over time. The debugging approach is to add memory profiling to a long run rather than trying to find the leak in a short test. Node's --inspect flag and the Chrome DevTools memory profiler give you heap snapshots you can compare over time. An object class whose count keeps growing from one snapshot to the next is the likely leak. Claude Code can help write a simple heap dump script and interpret the snapshot output once you have it.
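The comparison step can be sketched as follows. The HeapSummary shape is an assumption made for illustration (object counts per class name, for example copied from the profiler's summary view), not the actual .heapsnapshot file format, and the numbers are invented.

```typescript
// Hypothetical summary: object count per class name for one snapshot.
type HeapSummary = { [className: string]: number };

interface Growth {
  className: string;
  first: number;
  last: number;
}

// Report classes whose count rose in every consecutive pair of snapshots.
function steadilyGrowing(snapshots: HeapSummary[]): Growth[] {
  if (snapshots.length < 2) return [];
  const latest = snapshots[snapshots.length - 1];
  const growing: Growth[] = [];

  for (const className of Object.keys(latest).sort()) {
    let rising = true;
    for (let i = 1; i < snapshots.length; i++) {
      const before = snapshots[i - 1][className] || 0;
      const after = snapshots[i][className] || 0;
      if (after <= before) {
        rising = false;
        break;
      }
    }
    if (rising) {
      growing.push({
        className,
        first: snapshots[0][className] || 0,
        last: latest[className] || 0,
      });
    }
  }
  return growing;
}

// Invented counts from three snapshots taken during a long run.
const heapSnapshots: HeapSummary[] = [
  { Array: 1200, Session: 40, Listener: 10 },
  { Array: 1180, Session: 80, Listener: 10 },
  { Array: 1250, Session: 120, Listener: 10 },
];

console.log(steadilyGrowing(heapSnapshots));
```

Array counts bounce around, which is normal churn. Session only ever goes up, so that is the class to investigate first.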

Event listener leaks are among the most common memory leaks in browser code, including Claude Code-generated React or vanilla JS. When a component adds an event listener in a useEffect or a setup function and does not remove it on cleanup, the listener and its closure stay alive after the component unmounts. Ask Claude Code to audit your useEffect hooks for missing cleanup returns. The fix is usually a short returned cleanup function, and the audit is quick on a well-structured component file.
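The pattern is the same with or without React: the setup function returns its own cleanup. The sketch below uses a tiny hand-rolled emitter as a stand-in for window so it runs anywhere. TinyEmitter and watchResize are hypothetical names, not part of any library.

```typescript
type Listener = () => void;

// Minimal stand-in for window or an event emitter.
class TinyEmitter {
  private listeners: Listener[] = [];

  add(listener: Listener): void {
    this.listeners.push(listener);
  }

  remove(listener: Listener): void {
    const index = this.listeners.indexOf(listener);
    if (index !== -1) this.listeners.splice(index, 1);
  }

  count(): number {
    return this.listeners.length;
  }
}

// Setup returns its cleanup, the same shape a useEffect callback uses.
function watchResize(target: TinyEmitter, onResize: Listener): () => void {
  target.add(onResize);
  return () => target.remove(onResize);
}

const fakeWindow = new TinyEmitter();
const cleanups: Array<() => void> = [];

// Three hypothetical components mount and register listeners.
for (let i = 0; i < 3; i++) {
  cleanups.push(watchResize(fakeWindow, () => {}));
}
console.log("mounted:", fakeWindow.count()); // 3

// Unmount: running each cleanup releases the listener and its closure.
cleanups.forEach((cleanup) => cleanup());
console.log("unmounted:", fakeWindow.count()); // 0
```

Skip the cleanup calls and the count stays at three after every unmount, which is exactly the slow growth a heap snapshot comparison would show.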

Asking Claude Code for a debug plan first

For truly hard bugs, start the session by asking Claude Code for a debug plan before either of you touches the code. Describe the symptom, the environment, and what you have already tried. Ask it to list five possible causes in order of likelihood. Then work through the list from the top. This framing is better than an open-ended "where is this bug" because it produces a ranked hypothesis set rather than an immediate guess. The first hypothesis is often not the answer, but working through the list systematically is faster than chasing intuitions.

Closing the loop

When the test passes and the fix is in, take five minutes to write what you learned. Not a long doc. A short note in the PR description or in a personal log. Future you, three months from now, will hit a similar bug and the note is the bridge. Members at claudecodeclub.ai trade these notes and the shared bug folklore is one of the unspoken benefits of the $9 a month, because most production bugs have been hit by someone else first. The /guides/git-workflow-with-claude-code guide pairs well with this method because it makes the test plus fix commit clean and easy to ship.

Common questions

Why does every debug start with a reproduction?
A reproduction gives you a binary signal that tells you whether each change moved you closer or further away. Without one, you are guessing. The first ten minutes nailing down the steps to make the bug appear pays for itself many times over.
Should I describe the error to Claude Code or paste it?
Paste it. Descriptions lose information. The raw error has the noun, the verb, the file, and the line, and that is usually enough to point at the fix without further investigation.
What does narrowing the surface mean in practice?
Cut the input down to the smallest case that still triggers the bug, cut the code path down to the smallest sequence that still fails, and delete imports, middleware, and conditions that are not involved. The smaller the surface, the cheaper each theory is to test.
When is git bisect the right move?
When the bug appeared between two known states. Tell git the last commit you know was good and a commit you know is bad, run your reproduction at each step, and after about log2(n) checks you have the exact commit that introduced the regression.
Why write a failing test before the fix?
The test forces you to be precise about what wrong means, and it stays in the suite to prevent the same bug from coming back when the next refactor touches the same code.
How do I handle a bug that turns out to be in a dependency?
Pin the dependency version that works, open an issue upstream if it is not already filed, and add a comment in your code that records why the pin exists. Pinning preserves a known good state while the upstream fix lands.
