Using AI to Refactor Large Codebases

I just finished refactoring a 50K-line TypeScript monolith into a cleaner architecture using Claude Code. It took three weeks. Without AI, my estimate was three months. Here's exactly how I did it, including the mistakes that cost me time.

The Starting Point

The codebase was a Next.js application that had grown organically over two years. Business logic was scattered across API routes, components had database queries in them, and the same utility functions were duplicated in six different files. Classic "move fast and break things" debt.

The goal: extract business logic into a proper service layer, consolidate duplicate utilities, replace each use of the any type with a proper TypeScript type, and get test coverage above 60%.

The Strategy: Layers, Not Files

My first instinct was to go file by file. That was wrong. When you refactor file by file with an AI, you lose the big picture. Each file change looks reasonable in isolation, but the overall architecture drifts because the AI doesn't have a consistent target to aim at.

Instead, I worked in layers:

Week 1: Extract types. Go through every any type and every inline interface, and create proper type definitions in a types/ directory. This gave Claude (and me) a vocabulary for the rest of the refactoring.
Week 2: Extract services. Pull business logic out of API routes and components into service modules. Keep the same behavior, just move where the code lives.
Week 3: Consolidate and test. Remove duplicates, clean up imports, write tests for the new service layer.

The Prompts That Worked

For type extraction, this prompt was my workhorse:

Read [file]. Find all inline type definitions, `any` types,
and untyped function parameters. Create proper TypeScript
interfaces in types/[module].ts. Update the source file
to import and use these types. Don't change any logic.

The "don't change any logic" constraint is critical. Without it, Claude would "helpfully" refactor the business logic while extracting types, making the diff impossible to review.
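To make the shape of a good type-extraction diff concrete, here is a minimal before/after sketch. The names (Order, OrderLine, orderTotal) are hypothetical and not taken from the real codebase, and both versions sit in one block only so the example is self-contained. In practice the interfaces would move to types/[module].ts and be imported.

```typescript
// Hypothetical example: Order, OrderLine and orderTotal are illustrative names,
// not code from the original project.

// Before: `any` hides the shape the function actually depends on.
function orderTotalBefore(order: any): number {
  return order.lines.reduce(
    (sum: number, line: any) => sum + line.unitPrice * line.quantity,
    0,
  );
}

// After: the same shapes, now named. Assumed to live in types/[module].ts.
interface OrderLine {
  unitPrice: number;
  quantity: number;
}

interface Order {
  id: string;
  lines: OrderLine[];
}

// Identical logic; only the annotations changed, so the diff stays reviewable.
function orderTotal(order: Order): number {
  return order.lines.reduce(
    (sum, line) => sum + line.unitPrice * line.quantity,
    0,
  );
}

const sampleOrder: Order = {
  id: "o-1",
  lines: [
    { unitPrice: 5, quantity: 2 },
    { unitPrice: 3, quantity: 1 },
  ],
};
```

If the function body differs between the two versions in a real diff, that is the signal that the "don't change any logic" constraint was ignored.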

For service extraction:

Read [api-route-file]. Extract all business logic into
a new service at services/[name].ts. The API route should
only handle request parsing, calling the service, and
formatting the response. Keep the exact same behavior.
Write a test for the service that covers the main path.
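As a rough illustration of the split that prompt asks for, the sketch below separates a service from a thin route handler. Everything in it is an assumption made for the example: the signup scenario, the UserStore interface, and the plain HttpResponse object standing in for a Next.js response. It is also synchronous to keep it short, where a real service would almost certainly be async.

```typescript
// Hypothetical sketch: registerUser, handleSignup, UserStore and HttpResponse
// are illustrative names, not the project's real modules or the Next.js API.

interface User {
  id: number;
  email: string;
}

interface UserStore {
  findByEmail(email: string): User | undefined;
  insert(email: string): User;
}

// services/[name].ts: business rules only, no HTTP concepts.
function registerUser(store: UserStore, input: { email: string }): User {
  const email = input.email.trim().toLowerCase();
  if (!email.includes("@")) throw new Error("invalid email");
  if (store.findByEmail(email)) throw new Error("email already registered");
  return store.insert(email);
}

interface HttpResponse {
  status: number;
  body: unknown;
}

// The route handler: parse the request, call the service, format the response.
function handleSignup(store: UserStore, rawBody: unknown): HttpResponse {
  const email = (rawBody as { email?: unknown } | null)?.email;
  if (typeof email !== "string") {
    return { status: 400, body: { error: "email is required" } };
  }
  try {
    return { status: 201, body: registerUser(store, { email }) };
  } catch (err) {
    return { status: 422, body: { error: (err as Error).message } };
  }
}

// In-memory store so the service can be tested without a database.
function createMemoryStore(): UserStore {
  const users: User[] = [];
  return {
    findByEmail: (email) => users.find((u) => u.email === email),
    insert: (email) => {
      const user: User = { id: users.length + 1, email };
      users.push(user);
      return user;
    },
  };
}
```

Passing the store in as a parameter is one design choice among several. It makes the "test for the main path" cheap to write because the service never touches a real database.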

The Pitfalls

Pitfall 1: Too many files at once. I tried giving Claude a batch of 10 related files to refactor simultaneously. It produced changes that were internally consistent but introduced subtle bugs at the boundaries between files. I scaled back to 2-3 related files per session. More than that and the quality drops.

Pitfall 2: Skipping the review. By week two, I was so confident in the pattern that I started approving changes without reading every line. A silent behavior change slipped through: a default parameter value was different in the extracted service. It took two hours to track down when the bug showed up in staging. After that, I reviewed every diff, no exceptions.
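This class of bug is easy to miss because every call that passes the argument explicitly still behaves the same. The sketch below is a made-up reconstruction, not the actual bug: the function names and the values 20 and 10 are assumptions. It pairs the drifted default with a small comparison helper that runs the old and new versions over the same inputs.

```typescript
// Hypothetical reconstruction: names and default values are illustrative only.

// Original logic as it lived in the API route: page size defaults to 20.
function listItemsInRoute(items: string[], pageSize = 20): string[] {
  return items.slice(0, pageSize);
}

// Extracted service: same body, but the default silently became 10.
function listItemsInService(items: string[], pageSize = 10): string[] {
  return items.slice(0, pageSize);
}

type ListArgs = [items: string[], pageSize?: number];
type Lister = (...args: ListArgs) => string[];

// Returns the first argument list for which the two versions disagree,
// or undefined if they agree on every case.
function firstDivergence(
  before: Lister,
  after: Lister,
  cases: ListArgs[],
): ListArgs | undefined {
  return cases.find(
    (args) =>
      JSON.stringify(before(...args)) !== JSON.stringify(after(...args)),
  );
}

const thirtyItems = Array.from({ length: 30 }, (_, i) => `item-${i}`);

// Only the case that omits pageSize exposes the difference.
const explicitOnly = firstDivergence(listItemsInRoute, listItemsInService, [
  [thirtyItems, 5],
]);
const withOmittedArg = firstDivergence(listItemsInRoute, listItemsInService, [
  [thirtyItems, 5],
  [thirtyItems],
]);
```

A check like this only helps if the cases include calls that rely on defaults, which is exactly the kind of case that is tempting to leave out.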

Pitfall 3: Forgetting about imports. Claude is good at updating imports within the files it's touching, but sometimes misses files it hasn't read. After every refactoring session, I ran tsc --noEmit to catch broken imports immediately.

Pitfall 4: Over-abstracting. Claude loves to create abstractions. "This could be a generic utility," it would suggest, when the function was only used in one place. I had to add "Do not create abstractions unless the pattern appears in at least three places" to my CLAUDE.md.

The Numbers

50,247 lines before, 41,832 lines after (8,415 lines of dead code and duplication removed)
147 any types eliminated
23 service modules extracted from 67 API route files
Test coverage went from 12% to 68%
Three weeks of my time, estimated at roughly 40 hours of active work

Would I Do It Again?

Absolutely. On this project, the layer-by-layer approach with AI was roughly 3-4x faster than my estimate for doing it manually. But it requires discipline. You need a clear plan before you start, you need to review every change, and you need to run your type checker and test suite after every session.

The AI doesn't have taste for architecture. It has speed for mechanical transformation. Your job is to provide the taste and use the AI for the transformation. That division of labor is what makes it work at scale.