Beyond the Prompt: How to Manage Multi-Week LLM Projects Like a Senior Engineer

Last week, I continued work on migrating a single-tenant system to a multi-tenant architecture, using Claude Code as my primary development partner. The project has spanned multiple weeks, involves fundamental architectural changes across the entire stack, and would traditionally have required a team of 4-5 engineers. Instead, it's just me and an AI agent—but not in the way you might think.
The most common question I get from engineers experimenting with AI-assisted development is: "How do you maintain coherence on longer projects? My LLM conversations turn into spaghetti after a few hours."
The answer isn't just better prompts—though they're essential. It's better project management on top of good prompting.
The Junior Engineer Mental Model
In my previous posts, I've advocated for treating LLMs as highly capable junior engineers. This isn't just a cute analogy—it's an operational framework that fundamentally changes how you approach AI-assisted development.
Think about onboarding a human junior engineer. You wouldn't dump a complex multi-week project on their desk with a single set of instructions and disappear. You'd provide regular check-ins, design reviews, course corrections, and most importantly, you'd structure the work to set them up for success.
Yet this is exactly what most developers do with LLMs—rely solely on increasingly refined prompts as the project grows more complex, wondering why even well-crafted prompts can't maintain consistency over time. The issue isn't the prompt quality or the LLM's capability; it's the lack of engineering management.
The Secret Sauce: Design Reviews for Robots
Here's what most people miss: ambiguity compounds exponentially in long-running LLM projects. A slightly vague requirement in hour 1 becomes a fundamental misunderstanding by hour 20. By hour 40, you're so far off track that starting over seems easier than correcting course.
The solution? Treat your LLM like you'd treat any junior engineer: institute regular design reviews.
Let me walk you through a concrete example from last week's multi-tenant migration project.
Case Study: The Multi-Tenant Migration
The project seemed straightforward enough: convert a single-tenant web application to support multiple tenants. But as any engineer who's done this knows, it touches everything—data model, authentication, infrastructure, deployment, even seemingly unrelated business logic.
Instead of diving straight into code, I started with what I call a "design review artifact." Here's exactly how it played out:
Hour 1-2: Current State Documentation
I asked Claude Code to document the existing PostgreSQL schema in a simple dbschema-postgresql.md file. Not fancy, not structured—just plain, human-editable markdown. This forced both of us (yes, I think of it as "us") to understand the current state thoroughly.
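To make "not fancy, not structured" concrete, an excerpt of that current-state artifact might look like this (the table and column names are illustrative, not the actual project's schema):

```markdown
## users

| Column     | Type        | Notes                      |
|------------|-------------|----------------------------|
| id         | uuid        | Primary key                |
| email      | text        | Unique                     |
| role       | text        | admin / member / read-only |
| created_at | timestamptz | Defaults to now()          |

Referenced by: orders.user_id, api_keys.user_id
```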
Hour 3-4: Future State Design
With the current state documented, I provided the business context for multi-tenancy and asked for a proposed DynamoDB schema in dbschema-dynamodb.md. Note the technology shift—we weren't just adding tenant IDs; we were rethinking the entire data layer.
Hour 5-6: The Real Design Review
This is where the magic happened. For the next 90 minutes, we iterated on that markdown file. Not code. Not Terraform scripts. Just a simple document describing tables, partition keys, sort keys, and GSIs. Every design decision was debated, documented, and refined in this single artifact.
By the end of those 90 minutes, I had absolute clarity on the data model. More importantly, so did Claude Code. Every subsequent conversation about data access patterns could reference this canonical document.
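For a sense of what that document contained, an entry in dbschema-dynamodb.md might read roughly like this (a sketch with invented key names, not the schema we actually shipped):

```markdown
## Table: app_main

Partition key: PK (string), e.g. "TENANT#<tenant_id>"
Sort key: SK (string), e.g. "USER#<user_id>" or "ORDER#<order_id>"

### GSI1: look up a user by email within a tenant
Partition key: GSI1PK = "TENANT#<tenant_id>#EMAIL"
Sort key: GSI1SK = "<email>"

Access patterns:
1. All users for a tenant: Query PK = "TENANT#<tenant_id>", SK begins_with "USER#"
2. A user by email within a tenant: Query GSI1 with both keys
```

The point isn't this particular design; it's that every decision (keys, GSIs, access patterns) lives in one small, reviewable file.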
The Minimal-Complete Principle
This experience crystallized something I now call the "minimal-complete" principle. When working with LLMs on complex projects, your artifacts should be:
- Minimal: Contain only the essential information for the current focus area
- Complete: Fully capture all necessary context for that specific aspect
This isn't new wisdom—it's why we've always separated concerns in software design. But it becomes critical when your "junior engineer" can't maintain context across sprawling conversations.
Consider the alternatives I could have chosen:
- Jump straight into modifying application code (too much context, too many concerns)
- Write Terraform scripts for DynamoDB tables (implementation details obscuring design decisions)
- Create a 50-page design document (too much information, hard to iterate)
Instead, a simple markdown file with table schemas hit the sweet spot. It was minimal enough to iterate quickly but complete enough to capture all crucial decisions.
Scaling Beyond Single Sessions
The real test came when I returned to the project days later. Traditional LLM conversations would require extensive context rebuilding—"Remember when we discussed..." or "As we decided earlier..."
But with our design review artifacts, reestablishing context was trivial. I could start a fresh conversation with: "Here's our agreed DynamoDB schema (paste dbschema-dynamodb.md). Now let's implement the data access layer."
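Spelled out a little more, such a session restart might look like this (paraphrased; the exact wording is illustrative):

```markdown
Here is the agreed DynamoDB schema for the multi-tenant migration:

<contents of dbschema-dynamodb.md>

Today we're implementing the data access layer for the user entity.
Follow the access patterns in the schema document and don't change the keys
or GSIs; if you think a change is needed, propose it in the document first.
```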
The LLM immediately understood not just what to build, but why we made specific design choices. This is the difference between managing a project and hoping for coherent output.
The Toolkit That Works
Through this project and others, I've found certain artifacts consistently enable productive long-term LLM collaboration:
- Design documents: For architectural decisions
- Schema definitions: For data models
- API contracts: For service boundaries
- Test specifications: For behavior definition
- Flow diagrams: For complex processes (described in text)
Notice what these have in common? They're the same artifacts experienced engineers use to collaborate with humans. The formats that have evolved over decades for human communication turn out to be exactly what we need for human-AI collaboration.
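To pick one more from that list, a test specification artifact in the same minimal-markdown style might be nothing more than this (the behaviors are invented for illustration):

```markdown
## Tenant isolation

- A request authenticated for tenant A never returns data belonging to tenant B,
  even when record IDs are guessed correctly.
- Creating a tenant provisions default roles: admin, member, read-only.
- Deleting a tenant soft-deletes its data; hard deletion runs after 30 days.
```

Small enough to review in one sitting, complete enough to turn directly into tests.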
Claude Code: A Brief Love Letter
I mentioned using Claude Code almost exclusively for this project, and it deserves special recognition. During the week, I even rode out a platform outage (which sent me briefly exploring OpenAI's Codex before Claude came back online).
What makes Claude Code special isn't just its coding ability—it's how naturally it fits into this design review workflow. It understands when you're iterating on design versus implementation. It maintains context across artifacts. Most importantly, it responds to engineering management the way a thoughtful junior engineer would.
The Path Forward
As we push the boundaries of what's possible with AI-assisted development, the bottleneck isn't the AI's capability—it's our ability to manage AI effectively. The engineers who will thrive in this new world aren't the ones who merely write excellent prompts; that's table stakes. They're the ones who combine great prompting with sound engineering management for their AI collaborators.
The next time you embark on a multi-week project with an LLM, resist the urge to dive straight into code. Instead, ask yourself: "How would I structure this work for a talented but inexperienced human engineer?"
Create design review artifacts. Separate concerns ruthlessly. Iterate on minimal-complete documents. Treat your AI like the junior engineer it is—one who never gets tired, never forgets what's written down, and always shows up ready to work.
The future of software development isn't just about AI that can code. It's about engineers who can lead—whether their team members are human or artificial. And in my experience, the principles remain remarkably constant across both.