AI-Assisted, Spec-Driven Development: How I Built a SaaS App in 30 Days
615 commits. 47 specs delivered. 350+ C# source files. 159 test files. 152 TypeScript files. A full-stack SaaS app with real-time collaboration, AI-powered analysis, role-based auth, and a six-tier design system. Built by one developer in roughly 30 days.
I built StormBoard, a collaborative event storming tool with an AI analysis engine, using Claude Code as my primary development tool. This isn’t a story about generating a TODO app with a chatbot. It’s about a methodology that turned a solo developer into a full-stack team, and it’s entirely repeatable.
The app is deployed to Azure. It’s Dockerized. It has BDD tests, integration tests, unit tests, and architecture tests. It has cookie-based auth with hierarchical roles, SignalR real-time collaboration, an AI pipeline that routes between Claude models based on task complexity, and a canvas editor with undo/redo, multi-select, and snap-to-grid.
I didn’t write all of this code by hand. I also didn’t just tell an AI to “build me an app.” The methodology sits between those two extremes, and it’s the methodology, not the AI, that made this work.
The three documents
The entire approach distills to three files in the repository root.
SPECIFICATIONS.md is a ~1,900-line document containing 47 feature specs. Every spec follows the same structure: a user story, acceptance criteria, and detailed technical breakdown specifying domain models, database schemas, API endpoints, client components, and test expectations. This is the development interface. It’s what Claude Code reads before generating a single line of code.
This is a living document. I didn’t write all 1,900 lines upfront and hand them off. The spec evolved throughout the project, and Claude helped write it. I’d say “what if we added a collaboration mode where multiple users can edit the same board?” and Claude would draft the spec: the domain events, the SignalR hub contract, the conflict resolution strategy, the test expectations. I’d refine it, argue with parts of it, say “no, let’s use optimistic concurrency instead,” and Claude would update the spec to reflect the decision. The spec was a conversation artifact as much as a planning artifact.
CLAUDE.md is a ~245-line rules file. It encodes my architectural decisions, naming conventions, testing philosophy, and code style. It’s the difference between “reasonable code” and “my code.”
ROADMAP.md is a living progress tracker. Which specs are done, which are next, what order to play them in.
That’s the entire methodology. Think through the feature, often with the AI, and capture it in a spec. Point the AI at the spec. Review the output. Commit.
What a spec actually looks like
Here’s a condensed excerpt from Spec 1, the walking skeleton that established the architecture:
```markdown
## Spec 1: Walking skeleton

**As a** developer, **I want** the foundational solution structure,
build pipeline, and a single end-to-end slice **so that** we have a
proven architecture to build features on.

### Acceptance Criteria

- Solution builds and all tests pass (green)
- Docker Compose starts SQL Server, API, and Angular with one command
- A board can be created via API with a name
- A board can be retrieved by ID via API
- Health check endpoint returns healthy

### Technical Details

- **Domain**: `Board` (aggregate root), `BoardId` (typed ID),
  `Create(string name)` / `Hydrate()` factory methods
- **SqlServer**: `BoardData`, `StormBoardDbContext` (stormboard schema),
  `BoardRepository`, mapping extensions (`.ToDomain()`, `.ToInfo()`)
- **WebApi**: `POST /api/boards` (201 + id),
  `GET /api/boards/{id}` (200 + BoardInfo)
- **Tests**: 3 handler unit tests, 1 repository integration test,
  1 endpoint integration test, 1 architecture test, 1 BDD scenario
```
Notice the level of detail. Every layer is specified. File paths, method names, HTTP status codes, even the exact test count. When Claude Code reads this, it doesn’t have to guess what I want. It knows the domain model, the persistence layer, the API surface, and the test expectations.
Without this spec, the AI produces generic code. With it, the AI produces my code. In my architecture, following my patterns, testing at the layers I care about.
The rules file
CLAUDE.md is a team culture document for an AI teammate. It encodes the decisions that are hard to infer from context. The things a new developer on your team would get wrong on their first PR.
Here’s the handler testing section:
```markdown
### Handler unit tests (Domain.Tests)

- Mock repositories and `IUnitOfWork` — no real database
- Three standard tests per handler:
  1. **Success path** — loads entity, performs domain operation,
     persists via repository, saves via UnitOfWork
  2. **Not found** — returns null from repository,
     throws `EntityNotFoundException`
  3. **CancellationToken propagation** — verifies token is
     passed to all async calls
- Test naming: `HandleAsync_[scenario]_[expected_result]`
```
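To make the pattern concrete, here is a sketch of the kind of handler those rules describe, transliterated into plain TypeScript for illustration. The real project uses C# with mocked repositories and `IUnitOfWork`; every name here (`RenameBoardHandler`, `BoardRepository`) is hypothetical, not the actual API.

```typescript
class EntityNotFoundException extends Error {}

interface Board {
  id: string;
  name: string;
}

interface BoardRepository {
  getById(id: string): Board | null;
  save(board: Board): void;
}

class RenameBoardHandler {
  constructor(
    private readonly repo: BoardRepository,
    private readonly unitOfWorkSave: () => void, // stands in for IUnitOfWork
  ) {}

  handle(id: string, newName: string): void {
    const board = this.repo.getById(id);
    if (board === null) {
      // "Not found" test: repository returns null, handler throws
      throw new EntityNotFoundException(`Board ${id} not found`);
    }
    board.name = newName;   // "Success path": performs the domain operation...
    this.repo.save(board);  // ...persists via the repository...
    this.unitOfWorkSave();  // ...and saves via the unit of work
  }
}
```

A fake repository backed by a `Map` is enough to exercise the success and not-found paths; the third test, `CancellationToken` propagation, is specific to C#'s async model and has no direct TypeScript analogue.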
And the development philosophy:
```markdown
### TDD is mandatory

- **Always write the failing test first**, then make it pass,
  then refactor
- Never write production code without a corresponding test
- When asked to implement a feature, start by writing the test —
  not the implementation
```
And one of my favorite micro-rules:
```markdown
- **NO `// Arrange`, `// Act`, `// Assert` comments** — they are clutter.
  Well-structured tests with clear variable names and whitespace
  separation are sufficient
```
That last one might seem trivial. But without it, Claude Code adds those comments to every single test method. It’s the kind of reasonable-but-wrong choice that an AI will make a hundred times unless you tell it not to. The rules file is where you tell it not to.
Other rules that matter: “No MediatR, use assembly-scanned handlers.” “Info records are sealed records with required properties, and they live in Domain.” “Factory methods on aggregates: Create() generates a new ID, Hydrate() reconstitutes from persistence.” “Use Conventional Commits format.”
Every one of these prevents a class of PR corrections. The rules file pays for itself within the first three specs.
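The `Create()`/`Hydrate()` rule, for instance, is the kind of convention an AI only follows if it's written down. Here is a sketch of the pattern in TypeScript (the real aggregates are C#; names and signatures here are illustrative, not the project's actual code): `create` mints a fresh identity and enforces invariants, while `hydrate` reconstitutes stored state without validation side effects.

```typescript
import { randomUUID } from "node:crypto";

class BoardId {
  private constructor(readonly value: string) {}
  static next(): BoardId { return new BoardId(randomUUID()); }
  static from(value: string): BoardId { return new BoardId(value); }
}

class Board {
  private constructor(readonly id: BoardId, public name: string) {}

  // New aggregate: fresh ID, invariants enforced
  static create(name: string): Board {
    if (name.trim().length === 0) throw new Error("Board name is required");
    return new Board(BoardId.next(), name);
  }

  // Existing aggregate: trust the database, keep the stored ID
  static hydrate(id: string, name: string): Board {
    return new Board(BoardId.from(id), name);
  }
}
```

The private constructor forces every caller through one of the two factories, which is what makes the convention checkable in review.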
Why specs first
The specification is not documentation. It’s the development interface.
When you write a spec with acceptance criteria and technical details down to the file path level, you’re doing two things at once. First, you’re forcing yourself to think through the feature before any code exists: which layers are involved, what the API contract looks like, what edge cases matter. Second, you’re creating a machine-readable description of the feature that an AI can execute against.
This is the core insight: the spec is the prompt. The better the spec, the better the output. A vague spec produces vague code. A spec with acceptance criteria, domain models, endpoint definitions, and test expectations produces a working vertical slice through every layer of the application.
I learned this the hard way. Early on, I tried giving Claude Code loose instructions like “add sticky notes to boards” and spent more time correcting the output than I would have spent writing the code myself. Once I started writing full specs with technical details, the corrections dropped to near-zero.
But here’s what I didn’t expect: Claude is also good at writing the specs themselves. Once the project had established patterns, I could describe a feature at a high level (“I want users to be able to duplicate a board”) and Claude would draft a complete spec following the established structure, referencing the right domain patterns, proposing the right test expectations. I’d read it, adjust the parts where my vision diverged from its proposal, and the spec was done in minutes instead of an hour. The AI became a collaborator in the design phase, not just the implementation phase.
The development loop
A typical day looks like this:
1. Think through the next feature, often in conversation with Claude: “I’m thinking about adding board templates. What would the domain model look like?” It proposes, I push back, we iterate
2. Capture the result in SPECIFICATIONS.md. Claude drafts the spec, I edit it until it’s right
3. Tell Claude Code: “Implement Spec 12”
4. Watch it read the spec, read the rules file, read the existing codebase, then generate domain models, persistence layer, API endpoints, client components, and tests
5. Review everything it produced. Check the domain logic, verify the test coverage, make sure it followed the patterns
6. Commit frequently with conventional commit messages
7. Update ROADMAP.md
8. Pick the next spec
Steps 1, 2, and 5 are where the developer adds value. The design conversation and spec refinement are the thinking. The review is the quality gate. Everything in between is execution, and the AI handles execution at a pace that would be impossible manually.
The key constraint is that I commit frequently. Small, conventional commits mean I can revert any individual change without losing a day of work. It also means the git history reads like a development journal.
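A journal whose entries, in the Conventional Commits style the rules file mandates, might look something like this (these messages are invented for illustration, not taken from the actual history):

```text
feat(board): add duplicate-board command and endpoint
test(board): cover duplicate-board not-found path
fix(signalr): rejoin board group after reconnect
refactor(domain): extract sticky color palette to value object
chore(ci): cache npm packages in deployment workflow
```

The `type(scope): description` shape is what makes the history scannable, and it gives each commit a single revertible purpose.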
What the AI is good at
Vertical slices. Tell it to implement a feature and it generates code across every layer: domain entity, persistence model with EF mapping, repository, command handler, API endpoint, Angular service, component, and tests. This is where the time savings are largest. A feature that would take a day of boilerplate wiring takes minutes.
Consistency at scale. By spec 30, the codebase has 350+ source files. A human developer would start making inconsistent choices. Slightly different naming in one handler, a different test structure in another. The AI reads the rules file every time. It reads the existing codebase every time. It produces consistent code across hundreds of files because it never forgets the conventions.
TDD when given clear rules. The rules file says “write the failing test first.” So it writes the failing test first. Every handler gets three tests. Every domain entity gets guard clause tests. Every BDD scenario exercises the full HTTP round-trip. It follows the testing strategy because the strategy is written down.
Infrastructure boilerplate. Docker configuration, EF migrations, CI/CD pipelines, middleware setup. The AI produces correct infrastructure code because these patterns are well-established in its training data.
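For a sense of what that boilerplate covers, Spec 1's acceptance criterion ("Docker Compose starts SQL Server, API, and Angular with one command") implies a compose file shaped roughly like this sketch. The service names, build paths, and ports here are assumptions for illustration, not the project's actual configuration:

```yaml
services:
  sql:
    image: mcr.microsoft.com/mssql/server:2022-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "${SA_PASSWORD}"   # supplied via .env
    ports: ["1433:1433"]

  api:
    build: ./src/WebApi                     # hypothetical path
    depends_on: [sql]
    ports: ["8080:8080"]

  client:
    build: ./client                         # hypothetical path
    depends_on: [api]
    ports: ["4200:80"]
```

This is exactly the category of well-trodden configuration the AI gets right on the first pass.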
What the AI is bad at
Novel architectural decisions. When the spec doesn’t prescribe the approach, Claude Code proposes the safe, common choice. It would have suggested MediatR instead of assembly-scanned handlers. It would have used controllers instead of Minimal APIs. It would have put read models in the WebApi project instead of Domain. The architectural vision has to come from you.
Visual design and UX polish. It can generate Angular components and apply CSS tokens, but it doesn’t have an eye for spacing, hierarchy, or interaction feel. The six tiers of UI polish in StormBoard required manual iteration.
Knowing when to stop. Left unconstrained, it over-engineers. It adds interfaces you don’t need. It creates abstractions for one-time operations. It adds configuration options for things that should be hardcoded. The rules file and the spec’s scope boundaries are the guardrails.
Unstated preferences. If you care about something but haven’t written it down, the AI will not infer it. That’s why the rules file exists. Every time I corrected the AI on something (a naming convention, a testing pattern, a structural preference) I added the rule. The rules file grew organically from corrections.
The compound effect
This is the part that surprised me.
Each spec builds on the previous specs. By Spec 5, the codebase has established patterns for every layer. By Spec 10, the AI has enough context from reading the existing code that new features land with fewer corrections. By Spec 20, adding a new feature with full test coverage is nearly automatic. The AI reads the existing handlers, repositories, and tests, and produces new ones that are structurally identical.
The rules file and specs compound. The investment in writing detailed specs and precise rules pays increasing dividends as the project grows. Early specs take longer to write because you’re establishing patterns. Late specs are shorter because they can reference established conventions: “follow the same pattern as Spec 2.”
The spec itself compounds too. Because it’s a living document, every completed spec becomes context for the next one. When I’d say “I have an idea, what if the AI analysis could suggest missing domain events?” Claude would read the existing specs, understand the established patterns, and draft a new spec that fit into the existing architecture. The spec grew from a plan into a shared understanding of the system.
ROADMAP.md closes the loop. When you can see that 40 of 47 specs are complete, the momentum is self-reinforcing. The methodology creates its own flywheel.
Results
The numbers tell the story:
- 615 commits over ~30 days
- 47 specs designed, implemented, and tested
- 350+ C# source files across 5 backend projects
- 159 test files across 4 test projects
- 152 TypeScript files in the Angular client
- 4 test projects: unit, integration, BDD, and architecture tests
- Full CI/CD: GitHub Actions deploying to Azure App Service
- Fully Dockerized: `docker compose up` runs everything
- Real-time collaboration via SignalR
- AI analysis engine with model routing, prompt caching, and structured I/O
- Role-based auth with cookie sessions and hierarchical policies
- Six tiers of UI polish including dark mode and canvas themes
One developer. One AI. Three documents.
The methodology is the product
This isn’t about replacing developers. The AI didn’t make architectural decisions. It didn’t design the domain model. It didn’t decide that event storming stickies should be structured data feeding an AI analysis pipeline. It drafted specs, but every decision captured in them was mine.
What the AI did was eliminate the gap between design and implementation. Once a feature was thought through, once the spec existed, turning that spec into working, tested, deployed code became nearly mechanical. The developer’s job shifted from “write code” to “design features and review code.”
The spec-driven approach works because it forces you to think before you build. The rules file works because it captures your decisions in a format the AI can follow. The combination gives the AI enough context to build what you actually meant, not what it guesses you meant.
And don’t think of the spec as a document you write once and hand off. Think of it as a living conversation with your AI collaborator. You have ideas, you talk them through, the spec gets updated. You discover edge cases during implementation, the spec gets updated. You change your mind about an approach, the spec gets updated. The document is always the source of truth, and keeping it current is part of the workflow, not an afterthought.
If you’re considering using AI tools for serious development (not toy projects, not prototypes, but production software) start with the specs. Write them in detail. Codify your architectural decisions in a rules file. Commit frequently. Review everything.
The methodology is repeatable. The AI is the multiplier.
Found this useful?
If this post helped you, consider buying me a coffee.