
Spec-Driven Development on a Team: Scaling AI-Assisted Workflows Beyond Solo

ai · claude-code · software-architecture · developer-productivity · teams

The methodology works for one developer. The question is whether it works for five. Or twenty. Or a hundred.

I recently wrote about building a full-stack SaaS app in 30 days using a spec-driven, AI-assisted workflow. Three documents (a specifications file, a rules file, and a roadmap) turned a solo developer into a full-stack team. 615 commits. 47 specs. A production application with real-time collaboration, AI analysis, role-based auth, and test coverage.

The response was a variation on a single question: does this scale to a team?

I haven’t done it yet. But I’ve spent enough time inside this methodology, and enough years on software teams, to have opinions about what would work, what would break, and what would need to change. This post is prescriptive, not retrospective. It’s what I’d do if I were adopting this approach with a team tomorrow.

The solo workflow, briefly

The solo methodology distills to a loop:

  1. Think through the feature, often in conversation with the AI
  2. Capture the design in a detailed spec
  3. Point the AI at the spec
  4. Review the output
  5. Commit

Three documents drive the loop. SPECIFICATIONS.md contains feature specs with user stories, acceptance criteria, and technical details down to the file path level. CLAUDE.md encodes architectural decisions, naming conventions, and testing philosophy. ROADMAP.md tracks progress.
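To make the level of detail concrete, here's a sketch of what a single entry in SPECIFICATIONS.md might look like. The feature, endpoints, file paths, and names below are invented for illustration, not taken from the actual project:

```markdown
## Spec 12: Create a Board

**User story:** As a workspace member, I can create a board so my team
has a shared place to organize work.

**Acceptance criteria:**
- POST /api/boards with a valid name returns 201 and the new board's ID
- Board names are unique within a workspace (409 on conflict)
- The creator is assigned the Owner role on the new board

**Technical details:**
- Command: CreateBoardCommand, handler in src/Boards/Commands/CreateBoard.cs
- Aggregate: Board enforces the name-uniqueness invariant
- Read model: BoardListItem projection updated on BoardCreated
- Tests: tests/Boards/CreateBoardTests.cs, standard handler test shape
```

The point is the granularity: user story, observable behavior, and technical detail down to the file path, so the AI has little room to improvise.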

The spec is the development interface. The rules file is the team culture document. The roadmap closes the feedback loop.

Everything that follows is about what happens to these three artifacts, and the workflow around them, when more than one person is involved.

The spec becomes a design review surface

On a solo project, the spec is a conversation between the developer and the AI. On a team, it becomes the design review artifact.

Most teams review design after implementation. A developer builds a feature, opens a pull request, and the team sees the design for the first time embedded in the code. By that point, changing the design means rewriting the code. The review becomes a negotiation between “this should have been done differently” and “it’s already done.”

Specs invert this. The team reviews the design before any code exists. The spec describes the domain model, the API surface, the persistence layer, the test expectations. Disagreements surface when they’re cheap to resolve. Before anyone has written a line of code.

This isn’t a new idea. Design documents and RFCs have existed forever. What’s different is that the spec isn’t just a communication artifact. It’s an execution artifact. The same document that the team reviews is the document the AI reads to generate code. The design and the implementation are coupled through the spec, not through a developer’s interpretation of a design doc.

Pull request reviews don’t disappear. They shift in purpose. Instead of “should we have done this differently?” the question becomes “did the implementation match the spec?” That’s a faster, less contentious review. The philosophical debates happened during spec review. The PR review is mechanical verification.

The rules file becomes a team constitution

Solo, the rules file encodes one developer’s preferences. On a team, it encodes team consensus. This changes its nature entirely.

The rules file forces teams to make decisions they typically defer. What’s the testing philosophy? What’s the naming convention for read models? Do we use MediatR or direct handler injection? What does a standard handler test look like? How do we structure our persistence layer?

In most teams, these decisions happen implicitly. Whoever writes the first implementation sets the pattern. Others copy it, sometimes inconsistently. Over time, the codebase accumulates multiple patterns for the same concern: three ways to structure a test, two naming conventions for DTOs, an ongoing debate about whether repositories return domain objects or data objects.

The rules file makes these decisions explicit and durable. More importantly, it makes them enforceable. Not by a linter or a code reviewer, but by every AI agent that touches the codebase. When five developers are each working with AI assistants, and all five AI agents read the same rules file, the consistency effect compounds. The rules file becomes the single strongest force for codebase coherence.
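A few entries from a hypothetical CLAUDE.md give a feel for the level of specificity. These conventions are illustrative, not recommendations:

```markdown
## Conventions

- Handlers are resolved by assembly scanning; do not introduce MediatR
- Read models are suffixed `ListItem` or `Detail`, never `Dto`
- Repositories return domain objects; mapping to data objects happens
  in the persistence layer only
- Every handler has exactly one test class; tests are named
  `MethodName_Condition_Outcome`
- No `// Arrange, Act, Assert` comments in tests
```

Each line settles a debate once, in writing, for every developer and every AI agent at the same time.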

This connects to a principle I explored in my post on ubiquitous language: the words you use in conversation should be the words you use in code. The rules file extends that principle. The conventions you agree on in team discussions should be the conventions encoded in the rules file, and from there, enforced in every AI-generated line of code.

But this also introduces a new challenge: governance. Solo, I add a rule when the AI does something I don’t like. On a team, adding a rule is a team decision. “No // Arrange, Act, Assert comments” is my preference. Another developer might disagree. The rules file needs the same review process as code. Changes to it are architectural decisions: discussed, agreed upon, and versioned through pull requests.

Spec authorship as a design skill

This is where the methodology reshapes how a team works.

Writing a good spec requires understanding the domain, the existing codebase, the established patterns, and the boundaries of the feature. It requires knowing what to specify and what to leave to convention. It’s closer to technical architecture than to requirements writing.

On a team, spec authorship and spec implementation emphasize different strengths. Writing a spec exercises design thinking: domain modeling, API contract design, understanding cross-cutting concerns. Implementing a spec exercises execution and review: reading the AI’s output critically, verifying correctness, catching deviations from the intent.

In practice, developers who are deeper in the domain or the architecture will gravitate toward authoring specs, while developers who are newer to the codebase or the domain will start by implementing them. But this isn’t a fixed division. It’s a growth path. A developer who implements ten specs develops the context and judgment to author the eleventh. Spec authorship is a skill the team cultivates, not a role it assigns.

The risk is real: if only one or two people can write good specs, every feature waits for their design capacity. The team’s throughput is gated by design, not implementation. I’d argue this is the right bottleneck. Most teams are gated by implementation and ship poorly designed features fast. Being gated by design means the code that does ship is well-thought-through. But it’s still a constraint, which is why growing spec authorship across the team matters more than concentrating it.

What changes, concretely

Specs split by boundary

A single SPECIFICATIONS.md doesn’t scale beyond one developer. On a team, specs should be organized by bounded context or feature area. One spec file per context, owned by the team that owns that context.

This mirrors good domain design. If the ordering context and the fulfillment context have separate models, separate language, and separate ownership, they should have separate specs. A developer shouldn't have to read the ordering spec to understand a fulfillment spec. The boundaries in the specs should match the boundaries in the code.
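A repository layout along these lines keeps spec boundaries aligned with code boundaries. The context names are illustrative:

```
docs/
  specs/
    ordering/SPECIFICATIONS.md      # owned by the ordering team
    fulfillment/SPECIFICATIONS.md   # owned by the fulfillment team
  CLAUDE.md                         # shared rules file, one per repo
  ROADMAP.md
src/
  Ordering/
  Fulfillment/
```

One rules file stays global, because conventions should hold across contexts; the specs split along the same seams as the code.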

The roadmap becomes a team board

Solo, ROADMAP.md is a personal progress tracker. On a team, it absorbs much of what teams currently use project management tools for. The specs define what’s designed. The roadmap tracks what’s in progress, what’s done, and what’s next. If each spec is well-scoped (one vertical slice, one aggregate, one feature) the roadmap becomes a kanban board in markdown.

This won’t fully replace project management tooling, but it reduces the gap between “what we planned” and “what the code reflects.” The roadmap lives in the repo, next to the specs and the code. It’s versioned. It’s always current because updating it is part of the development loop.
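A team ROADMAP.md in this style might look like the following. The spec numbers, names, and handles are invented for illustration:

```markdown
## In progress
- Spec 12: Create a Board — @dev-a
- Spec 13: Organization invites — @dev-b

## Next
- Spec 14: Board sharing (blocked on Spec 13's invite contract)

## Done
- Spec 11: Workspace settings
- Spec 10: Authentication
```

Because moving a line between sections is part of finishing a spec, the board stays current without a separate tool to update.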

Weekly spec reviews

The team gathers to review upcoming specs the way they currently review pull requests. Spec authors present the design. The team challenges assumptions, catches conflicts with other in-flight specs, and approves the design before implementation begins.

This is where the collaborative discovery techniques from event storming feed directly into the workflow. A discovery session produces domain events, commands, actors, and policies. Those artifacts become the raw material for specs. The workshop is the divergent thinking. The spec is the convergent result.

Implementation as human-AI pairing

The implementation phase isn’t two humans pairing. It’s a human and an AI. The developer reads the spec, instructs the AI, reviews the output. The review is the learning moment.

This is where the compound effect from the solo workflow carries over. By the tenth spec, the AI has enough codebase context that new features land with fewer corrections. A new team member pointing an AI at Spec 15 will produce code that matches the patterns established in Specs 1 through 14, because the AI reads the existing code, not just the spec and rules file. The codebase teaches the AI, and the AI helps new developers produce code that matches the team’s standards from day one.

What happens to pair programming

Traditional pair programming puts two developers at one keyboard. One drives, one navigates. The value comes from real-time design discussion, catching mistakes early, and knowledge transfer.

In a spec-driven AI workflow, the shape of pairing changes, but it doesn’t disappear. It splits into two distinct modes.

Spec-level pairing is two developers collaborating on a spec before implementation begins. This is the higher-value form. Two people discussing the domain model, debating API contract design, catching edge cases in acceptance criteria. The output is a better spec, which produces better AI-generated code downstream. This is where the design conversations happen. The ones that used to happen at the keyboard during traditional pairing. Moving them upstream into spec authorship means the design gets more attention, not less.

Implementation-level pairing is two developers working together during AI-assisted implementation. This looks different from traditional pairing because the “driving” is mostly prompting and reviewing. But the navigator role becomes more important, not less. The navigator reads the AI’s output with fresh eyes. They catch the pattern deviations that the driver, who’s been staring at the spec for an hour, might miss. They ask “why did the AI structure the test this way?” which forces both developers to engage with the output critically.

There’s also a third mode that didn’t exist before: pairing on AI output review. Two developers reviewing what the AI produced, comparing it against the spec and the rules file, discussing whether the implementation captures the intent. This is useful for complex specs where the AI made judgment calls that the spec didn’t prescribe.

The common thread is that pairing shifts from “two people writing code” to “two people thinking about design and reviewing output.” The mechanical act of typing was never the valuable part of pair programming. The conversation was. That conversation still happens. It just happens at different points in the workflow.

Benefits that don’t exist solo

Onboarding acceleration

A new developer joining a team with this methodology reads three things: the rules file, the recent specs, and the roadmap. They understand the architecture, the conventions, the testing philosophy, and what’s been built. Then they pick up a spec and start implementing with AI assistance.

The traditional onboarding path (read the wiki, shadow a senior developer, make mistakes on your first three PRs) is replaced by a structured ramp. The new developer’s first PR will be consistent with the rest of the codebase because the AI enforced consistency. The review is focused on whether the developer understood the spec, not whether they guessed the right patterns.

Parallel implementation

If specs are well-scoped (one aggregate, one vertical slice) multiple developers can implement different specs simultaneously with minimal merge conflicts. Spec 12 touches the Board aggregate. Spec 13 touches the Organization aggregate. They live in different bounded contexts, different directories, different test projects. Two developers implement them in parallel without coordination.

This is where good domain design pays off twice. Well-defined aggregate boundaries and bounded contexts make parallelism possible at the code level. Well-scoped specs make parallelism possible at the workflow level. The two reinforce each other.

Knowledge durability

When a developer leaves a team, their knowledge leaves with them. The codebase remains, but the reasoning behind the code is lost.

Specs preserve reasoning. “Why does the analysis engine route between models based on complexity?” Because Spec 34 says so, and the spec explains the design rationale. “Why do we use assembly-scanned handlers instead of MediatR?” Because the rules file says so, and the PR that added that rule has the discussion.

The spec file becomes institutional memory. Not as a static archive, but as a living document that explains not just what the system does but why it does it that way.

Tensions that don’t exist solo

Spec maintenance coordination

Solo, updating a spec is trivial. You change it and move on. On a team, a spec change during implementation has ripple effects. If developer A discovers during implementation that Spec 12 needs a modified API contract, that change might affect developer B working on Spec 14, which calls that endpoint.

Spec changes during implementation need a lightweight review process. Not a full spec review. More like a design amendment. “The spec said the endpoint returns 201 with an ID. During implementation, I discovered we also need to return the created timestamp. Here’s the updated spec.” Quick approval, move on.

The alternative, treating specs as immutable once approved, is worse. It creates the same rigidity that makes waterfall fail. The spec is a living document. The team needs a process for updating it that’s fast enough to not block implementation but visible enough to catch conflicts.
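An amendment can be as lightweight as a short PR description against the spec file. A hypothetical example:

```markdown
## Amendment to Spec 12 (discovered during implementation)
- Was: POST /api/boards returns 201 with the new board's ID
- Now: the response also includes `createdAt`
- Why: the board list needs the timestamp without a second fetch
- Affects: Spec 14 calls this endpoint — @dev-b reviewed, no change needed
```

The "Affects" line is the part that matters on a team: it forces the author to name the in-flight specs the change might ripple into.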

Prompt variation and management

Each developer phrases instructions differently. One developer says “implement Spec 12.” Another says “implement Spec 12, and make sure the error handling follows the pattern from Spec 5.” A third pastes the spec content directly into the prompt.

The rules file constrains the AI’s output regardless of the prompt, but it can’t eliminate all variation. Teams adopting this methodology benefit from lightweight prompt conventions: how to reference specs, what context to provide, when to let the AI read the codebase versus providing explicit guidance.

This is an area where tooling is starting to emerge. Platforms like Zatomic focus on prompt management for teams: versioning prompts, sharing effective patterns, and standardizing how teams interact with AI tools. As spec-driven workflows become more common, the prompt layer becomes its own concern that benefits from the same discipline we apply to code: version control, review, and iteration.

Even without dedicated tooling, teams can start simple. A PROMPTS.md file in the repo with examples of effective spec implementation prompts. A convention for how to reference spec numbers. A shared understanding of when to give the AI broad latitude versus tight instruction. The spec and rules file handle 90% of consistency. Prompt discipline handles the remaining 10%.
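A starter PROMPTS.md could be as small as this. The phrasing is a suggestion, not a prescription:

```markdown
## Implementing a spec
> Implement Spec 14 from docs/specs/ordering/SPECIFICATIONS.md.
> Follow CLAUDE.md. Read the existing handlers in src/Ordering/Commands
> before writing new ones.

## Referencing prior work
> Match the error-handling pattern established in Spec 5's handler.

## Conventions
- Reference specs by number; don't paste spec text into the prompt
- Let the AI read the codebase for patterns; give explicit guidance
  only for decisions the spec leaves open
```

Like the rules file, it's versioned and reviewed, so effective prompts spread instead of living in one developer's shell history.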

Shallow understanding

This is the tension I worry about most.

In the solo workflow, I wrote: “Steps 1, 2, and 5 are where the developer adds value. The design conversation and the review are the thinking.” On a team, the implementor might shortcut the thinking. They run the spec through the AI, skim the review, commit. The code is correct (the spec was good) but the developer didn’t understand why the code looks the way it does.

This creates knowledge fragility. The team has a working codebase and specs, but the developers’ understanding is shallow. When something breaks in a way the spec didn’t anticipate, the developer who implemented it can’t reason about it from first principles because they never engaged with the design.

The mitigation is cultural, not technical. Spec implementation isn’t “hand the spec to the AI and commit the output.” It’s “read the spec, understand the design decisions, instruct the AI, and review the output critically.” The review is the learning moment. A developer who can’t explain why the AI produced a particular test structure hasn’t done their job, even if the test is correct.

This is the same discipline that separates good code review from rubber-stamping. The tool changed. The discipline didn’t.

The compound effect, amplified

The compound effect I described in the solo post (each spec building on previous specs, the investment in rules and conventions paying off over time) is amplified on a team.

Solo, the compound effect is linear. One developer produces consistent code faster over time.

On a team, it’s multiplicative. Five developers, each benefiting from the same compound effect, each producing consistent code that reinforces the patterns for everyone else. A new convention added to the rules file after Spec 10 improves the output of every developer’s AI on every subsequent spec. A well-designed spec becomes a template for future specs in the same bounded context. The flywheel that worked for one developer works for five, with the added benefit that five developers generate more codebase context for the AI to learn from.

The rules file compounds. The specs compound. The codebase compounds. And unlike tribal knowledge, which degrades as the team changes, these artifacts are durable. They’re in the repo, versioned, and always current.

Who this is for

This methodology isn’t for every team. It’s most effective when:

  • The domain is complex enough to benefit from explicit modeling
  • The team values design discipline over speed of first implementation
  • The codebase will live long enough for the investment in specs and rules to pay off
  • The team is willing to shift design review upstream, from PRs to specs

It’s least effective when:

  • The work is primarily CRUD with minimal business logic
  • The team is experimenting rapidly and the design is expected to change weekly
  • There’s no one on the team with the architectural judgment to author good specs
  • The team culture resists explicit conventions (some teams prefer emergent patterns)

Start small

If this resonates, don’t reorganize your entire workflow on Monday. Start with one experiment.

Pick a feature that’s well-understood but hasn’t been built yet. Write a spec for it: user story, acceptance criteria, technical details. Write or extend a rules file with your team’s conventions. Have one developer implement the spec with AI assistance. Review the output as a team.

Then ask: was the design review of the spec more valuable than a typical PR review? Was the implementation faster? Was the code more consistent? Did the developer who implemented it understand what they built?

If the answer is yes, write the next spec. Add the rule that the AI got wrong. Update the roadmap. The methodology builds on itself. That’s the whole point.


The AI is the multiplier. The spec is the interface. On a team, the rules file is the culture. The question isn’t whether AI-assisted development works. It’s whether your team has the design discipline to direct it. The spec is how you prove that you do.
