
06 — The architecture doc
When to read this: Before you write feature code, if your project has state, integrations, or cross-cutting concerns. If it's a static site, simple CLI, or library, you can probably skip this chapter (and skip writing ARCHITECTURE.md).
When you need ARCHITECTURE.md
The PRD captures what you're building and why. ARCHITECTURE captures how it's built at a level specific enough that someone reading it could make compatible decisions on a feature you haven't shipped yet.
You need ARCHITECTURE.md if your project has any of:
- A database (any flavor)
- Auth (any kind, including "magic links" or OAuth)
- Payments
- Multi-tenancy or organization-scoped data
- Background jobs or queues
- Caching
- Real-time / websockets / live multiplayer
- Multiple cross-cutting concerns that have to behave consistently across features
You can probably skip it if your project is:
- A static site or content site (no backend state).
- A simple CLI tool with no persistence.
- A library or SDK.
- A throwaway prototype.
When in doubt, write a short one. A 5-page ARCHITECTURE.md beats no ARCHITECTURE.md once your project has any state.
What ARCHITECTURE.md is for
ARCHITECTURE.md exists for two reasons:
- To prevent re-litigation of decisions. Without a written-down stack and schema, every feature spec re-asks "what database again? what auth library? what session model?" The agent hallucinates whatever feels right and you get five inconsistent implementations.
- To anchor the reviewers. `architecture-reviewer`, `migration-reviewer`, and `api-reviewer` all require ARCHITECTURE.md to do their job. Without it, they fall back to code-only sanity checks. With it, they catch drift specifically: "this migration violates §4's expand-contract policy" instead of "this migration looks risky."
The doc has six sections:
| § | Section | What it pins down |
|---|---|---|
| 1 | Stack | Specific named choices for language, framework, hosting, database, ORM, auth, payments, storage, email, queues, search, observability |
| 2 | Data model | Entities, relationships, identity strategy, multi-tenancy, soft-delete policy, audit, time fields |
| 3 | Service shape | Topology, module boundaries, API style, conventions |
| 4 | Cross-cutting concerns | Auth enforcement layer, error taxonomy, logging, caching, queues, secrets, testing strategy, migration & deploy strategy |
| 5 | Evolution / bets | Reversibility map, falsifiable bets ("we're betting users won't exceed 10k rows per tenant"), deferred decisions, known wrong choices |
| 6 | Trade-off log | Append-only log of every architectural decision: date, decision, alternatives, reason chosen |
The single highest-leverage section is §6, the Trade-off log. Every architectural decision the project makes lands there. Future-you (or future agents) reads it to understand why the architecture is shaped the way it is.
Specificity, again
Like the PRD, ARCHITECTURE.md only constrains if it's specific. Examples of vague vs specific:
❌ "Postgres."
Which version? Which host (managed Neon, Supabase, RDS, self-hosted)? Connection pooler? Branching strategy?
✅ "Postgres 16 on Neon. PgBouncer connection pooling via Neon's transaction-mode pooler. Branch-per-PR for preview deploys."
❌ "Stripe for payments."
Which integration? Checkout (hosted), Elements (embedded), or Connect (multi-party)? Where do webhooks land? What's the customer model?
✅ "Stripe Checkout (hosted) for the customer-facing payment flow. Webhooks at /api/webhooks/stripe, signature-verified before parsing. Customer model: one Stripe customer per workspace, attached to workspaces.stripe_customer_id."
❌ "REST API."
Resource naming convention? Auth scheme? Error envelope? Pagination style? Versioning?
✅ "REST under /api/v1/. Auth via session cookie validated in middleware. Error envelope: { error: { code, message, details? } }. Cursor-based pagination: { items, nextCursor }. No versioning beyond /v1 until we have an external API."
The pattern: name the choice, name the host or library or version, name the convention. "REST" alone doesn't constrain anything. "REST under /api/v1/, cursor pagination, this error envelope" does.
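To make the REST example concrete, here is a minimal sketch of what the quoted error envelope and cursor pagination conventions could look like as shared types. The names `ErrorEnvelope`, `Page`, `errorEnvelope`, and `paginate` are hypothetical, and using the last item's `id` as the cursor is an assumption; the chapter only fixes the `{ error: { code, message, details? } }` and `{ items, nextCursor }` shapes.

```typescript
// Hypothetical shapes mirroring the conventions quoted above; none of these
// names come from a real codebase.
interface ErrorEnvelope {
  error: { code: string; message: string; details?: unknown };
}

interface Page<T> {
  items: T[];
  nextCursor: string | null; // null means there are no more pages
}

// Build the error envelope in one place so every route returns the same shape.
function errorEnvelope(code: string, message: string, details?: unknown): ErrorEnvelope {
  return details === undefined
    ? { error: { code, message } }
    : { error: { code, message, details } };
}

// Cursor pagination over rows ordered by id. Assumption for this sketch: the
// cursor is the id of the last item on the previous page.
function paginate<T extends { id: string }>(rows: T[], limit: number, cursor?: string): Page<T> {
  const sorted = [...rows].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  const start = cursor === undefined ? 0 : sorted.findIndex((r) => r.id > cursor);
  if (start === -1) return { items: [], nextCursor: null };
  const items = sorted.slice(start, start + limit);
  const hasMore = start + limit < sorted.length;
  const last = items[items.length - 1];
  return { items, nextCursor: hasMore && last !== undefined ? last.id : null };
}
```

Writing the convention down as a type is one way to let a reviewer check a new endpoint against the doc instead of against taste.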
How architecture-md-builder works
architecture-md-builder is a skill, not a slash command:
`architecture-md-builder`
Same shape as prd-grill: one question at a time, recommended answer with each, push past vague responses.
Phase 0: diagnose
The skill first checks:
- Does `docs/PRD.md` exist and is it filled in? If not, the skill stops. Architecture without product context produces wrong choices.
- Does `docs/ARCHITECTURE.md` already exist? If yes, is it specific or vague? The skill only re-asks the vague sections.
- Is there existing code? `package.json`, lockfiles, and top-level folders are constraints, not blank-slate decisions. They get captured verbatim.
Phases 1 through 6: stack, data model, service shape, cross-cutting, evolution, trade-off log
Each phase corresponds to a section of the doc. The skill asks for named, specific answers. "TBD" is acceptable. It gets logged in the Trade-off log with a deadline by which it must be decided.
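As a rough illustration of a "TBD with a deadline", the sketch below models a deferred answer as a record and lists the ones that are past due. The `DeferredDecision` type, its field names, and the `overdue` helper are all hypothetical; the skill's actual internal representation is not described in this chapter.

```typescript
// Hypothetical record for a "TBD" answer; field names are illustrative only.
interface DeferredDecision {
  topic: string;    // e.g. "Search backend"
  loggedOn: string; // ISO date (YYYY-MM-DD) the TBD was added to the Trade-off log
  decideBy: string; // ISO date (YYYY-MM-DD) by which it must be decided
}

// Return the deferred decisions whose deadline has passed as of `today`.
// Dates in YYYY-MM-DD form compare correctly as plain strings.
function overdue(decisions: DeferredDecision[], today: string): DeferredDecision[] {
  return decisions.filter((d) => d.decideBy < today);
}
```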
Phase 1: stack
The skill walks through this matrix, one row at a time:
| Decision | Push past vague answers like... |
|---|---|
| Language / runtime | "TypeScript" → which Node version, ESM/CJS |
| Framework | "Next.js" → which router (App / Pages), version, rendering mode default |
| Hosting / deploy target | "Vercel" → fluid compute / edge / sandbox; preview-deploy strategy |
| Database | "Postgres" → host, version, connection pooler, branching strategy |
| ORM / query layer | "Drizzle" → migrations tool, schema location, transaction patterns |
| Auth | "Clerk" → session model, organizations, JWT vs cookie |
| Payments / billing | "Stripe" → integration shape, webhook handler location, customer model |
| Storage / files | "Vercel Blob" → public/private split, signed URL strategy |
| Email / notifications | "Resend" → transactional vs marketing split, templates location |
| Background jobs / queues | "Vercel Queues" → at-least-once acceptance, idempotency keys |
| Search | "Postgres FTS" → if scaled, what's the migration path |
| Analytics / observability | "PostHog + Sentry" → what's tracked vs sampled |
Stack decisions get logged to §6 (Trade-off log) as you make them: name the decision, name the alternative considered, name the reason.
Phase 2: data model
The most expensive section to get wrong. The skill pushes hard:
- Entities. Every persisted entity, with a one-sentence purpose and 3–7 fields that matter most.
- Relationships. For each FK: cardinality, ON DELETE behavior, soft-delete applicability.
- Identity strategy. UUIDs (v4? v7?), nanoids, sequential ints, or composite keys. Pick once.
- Multi-tenancy. None, row-level (`tenant_id` column), schema-per-tenant, or DB-per-tenant.
- Soft-delete vs hard-delete. Per-entity if mixed.
- Audit / history. Which entities need change history, where it lives.
- Time fields. `created_at` / `updated_at` / `deleted_at`? Timezone storage? `timestamptz` or `timestamp`?
Phase 2 ends with a Mermaid entity diagram inside ARCHITECTURE.md. Even rough, the visual catches relationship mistakes prose hides.
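One way to keep that diagram in step with the relationship list is to generate it. The sketch below renders Mermaid `erDiagram` text from a list of relationships; the `Relationship` type, the `toMermaidEr` helper, and the entity names in the usage note are hypothetical, and the skill itself may well write the diagram by hand.

```typescript
// Hypothetical relationship description; not the skill's actual format.
type Cardinality = "one" | "zero-or-one" | "many" | "one-or-more";

interface Relationship {
  from: string;
  to: string;
  fromCard: Cardinality; // how many `from` rows relate to one `to` row
  toCard: Cardinality;   // how many `to` rows relate to one `from` row
  label: string;
}

// Mermaid's crow's-foot markers differ on the left and right of the line.
const LEFT: Record<Cardinality, string> = {
  one: "||",
  "zero-or-one": "|o",
  many: "}o",
  "one-or-more": "}|",
};
const RIGHT: Record<Cardinality, string> = {
  one: "||",
  "zero-or-one": "o|",
  many: "o{",
  "one-or-more": "|{",
};

// Render one erDiagram line per relationship.
function toMermaidEr(rels: Relationship[]): string {
  const lines = rels.map(
    (r) => `  ${r.from} ${LEFT[r.fromCard]}--${RIGHT[r.toCard]} ${r.to} : ${r.label}`,
  );
  return ["erDiagram", ...lines].join("\n");
}
```

For example, a relationship from `WORKSPACE` (one) to `PROJECT` (many) labeled `owns` renders as `WORKSPACE ||--o{ PROJECT : owns`, which reads as "one workspace owns zero or more projects".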
Phase 3: service shape and boundaries
For most projects this is a one-page section. Skipping it gives you tangled code in month 3.
- Topology (monolith / modular monolith / services). Default modular monolith.
- Module boundaries — 3–6 top-level modules, what each owns.
- API style (REST / RPC / GraphQL / server actions / mix).
- API conventions — resource naming, error envelope, pagination, idempotency.
- Internal vs external API split. Auth boundary at each.
Phase 4: cross-cutting concerns
These don't belong to any one feature spec, so without ARCHITECTURE.md they get reinvented per-spec inconsistently.
- Auth & authorization. Role model. Where checks live (middleware / route handler / query layer).
- Error handling. Error class taxonomy. How errors surface to clients.
- Logging. What's logged at each level. PII handling. Where logs go.
- Observability. Tracing, metrics, alerts.
- Caching. What, where, TTL/tag/invalidation.
- Rate limiting. Per-route, per-user, global. Backend.
- Secrets. Where they live. Rotation. Injection.
- Background work. Where it runs. Failure handling. Idempotency.
- Testing strategy. Unit / integration / e2e split. What's mocked vs real (especially the database).
- Migration & deploy strategy. Forward-only / expand-contract / dual-write. Rollback story.
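To illustrate why the background-work entry asks about idempotency, here is a minimal sketch of a handler wrapper that tolerates at-least-once delivery. The `Job` shape and `makeIdempotent` helper are hypothetical, and the in-memory `Set` is an assumption standing in for a durable store (for instance a table with a unique key); a real queue integration would look different.

```typescript
// Hypothetical job shape; `idempotencyKey` is supplied by whoever enqueues the job.
interface Job<P> {
  idempotencyKey: string;
  payload: P;
}

// Wrap a handler so that a redelivered job runs its side effect only once.
// Returns true if the handler ran, false if the delivery was a duplicate.
function makeIdempotent<P>(handler: (payload: P) => void): (job: Job<P>) => boolean {
  // Assumption: an in-memory Set stands in for durable storage in this sketch.
  const processed = new Set<string>();
  return (job) => {
    if (processed.has(job.idempotencyKey)) return false; // duplicate delivery, skipped
    handler(job.payload); // if this throws, the key is not recorded and a retry can run
    processed.add(job.idempotencyKey);
    return true;
  };
}
```

Recording the key only after the handler succeeds means a failed attempt can still be retried, which is the trade-off ARCHITECTURE.md §4 would need to state explicitly.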
Phase 5: evolution and bets
A good ARCHITECTURE.md ages well because it admits what's provisional.
- Reversibility map. For each major decision, mark `easy` / `medium` / `hard` to reverse. The hard ones are where you should have spent the most thought.
- Bets. Falsifiable claims, e.g. "users won't exceed 10k rows per tenant." Include the trigger that would force a redesign.
- Deferred decisions. Things you intentionally pushed off, with the deadline by which they must be revisited (usually a phase boundary in `docs/ROADMAP.md`).
- Known wrong choices shipping anyway. Documenting them prevents the "why did we do this" archaeology in 6 months.
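A bet is only falsifiable if something can check it. The sketch below records each bet with a measurable limit and reports the ones that observed numbers have broken. The `Bet` type, the metric name, and `falsifiedBets` are hypothetical; the chapter prescribes the idea of a trigger, not this representation.

```typescript
// Hypothetical record for a falsifiable bet from §5.
interface Bet {
  claim: string;   // e.g. "users won't exceed 10k rows per tenant"
  metric: string;  // name of the observed number that tests the claim
  limit: number;   // the bet holds while the observed value is at or below this
  trigger: string; // what gets redesigned if the bet fails
}

// Return the bets whose observed metric has exceeded its limit.
// Metrics with no observation yet are treated as "still holding".
function falsifiedBets(bets: Bet[], observed: Map<string, number>): Bet[] {
  return bets.filter((b) => {
    const value = observed.get(b.metric);
    return value !== undefined && value > b.limit;
  });
}
```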
Phase 6: trade-off log
Append-only section at the bottom. Every entry: date, decision, alternatives considered, reason chosen, links to relevant specs/PRs if available.
Example entry:
### 2025-03-12 — ORM choice
- **Chose:** Drizzle
- **Considered:** Prisma, Kysely, raw SQL with Postgres.js
- **Reason:** Schema-first authoring, low runtime overhead (no codegen process running),
Postgres-typed query builder. Prisma's runtime engine adds ~30ms per query in our
early benchmarks; Drizzle's prepared queries are faster. Kysely is also fast but its
ecosystem is thinner and we want batteries-included migrations.
- **Reversibility:** medium. Migration would require rewriting query layer but schema
is portable.
- **Related:** spec P0-#3 (database setup)
Short, specific, structured the same way every time so future-you can scan the log quickly.
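One way to keep every entry in the same shape is to render it from a typed record. The sketch below mirrors the fields of the example entry above; the `TradeoffEntry` type and `renderEntry` function are hypothetical, not part of the skill.

```typescript
// Hypothetical typed form of a Trade-off log entry, mirroring the example above.
interface TradeoffEntry {
  date: string;          // YYYY-MM-DD
  title: string;         // e.g. "ORM choice"
  chose: string;
  considered: string[];
  reason: string;
  reversibility: string; // e.g. "medium. Query layer rewrite, schema is portable."
  related?: string;      // optional links to specs or PRs
}

// Render an entry as markdown in the same field order every time.
function renderEntry(e: TradeoffEntry): string {
  const lines = [
    // \u2014 is the dash used in the example entry's heading.
    `### ${e.date} \u2014 ${e.title}`,
    `- **Chose:** ${e.chose}`,
    `- **Considered:** ${e.considered.join(", ")}`,
    `- **Reason:** ${e.reason}`,
    `- **Reversibility:** ${e.reversibility}`,
  ];
  if (e.related !== undefined) lines.push(`- **Related:** ${e.related}`);
  return lines.join("\n");
}
```

Because the log is append-only, a helper like this would only ever add text at the end of §6 and never rewrite earlier entries.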
Phase 7: write the file
After all six phases, the skill writes `docs/ARCHITECTURE.md` from its internal template. Sections you decided are filled in. Sections you deferred read `**Deferred** — see Trade-off log entry [date]`, so triage and review can flag specs that touch unresolved decisions.
After writing, the skill suggests:
- Re-running `/backlog-triage` on any pending Inbox items now that ARCHITECTURE.md exists. Triage is sharper with architecture context.
- Updating the PRD's Revision log if any ARCHITECTURE decision invalidated a PRD assumption.
Per-decision review with /architecture-review
architecture-md-builder is the full interrogation, run once at project setup (or after a major pivot).
For individual architectural decisions later (a new service, a schema migration strategy, a caching choice for a specific query), use the architecture-review skill:
`architecture-review`
Or invoke it as a slash command in some configurations:
`/architecture-review`
This is a targeted skill. You describe one decision; it reads PRD + ARCHITECTURE.md, frames the decision honestly, surfaces trade-offs, and recommends a path. It appends the resolved decision to ARCHITECTURE.md §6 Trade-off log.
When to use /architecture-review
Run it when the decision touches one of:
- Schema or data model changes (new entity, FK, index, partition strategy)
- Service or module boundaries (extract into its own service, new module)
- Auth, authorization, or session model changes
- Caching strategy (what to cache, where, invalidation model)
- Background work / queueing (new job, retry semantics, idempotency)
- Tech stack picks (new library for a cross-cutting concern)
- Migration / rollout strategy (forward-only vs expand-contract, dual-write, backfill)
- Public API shape (a new external surface, breaking change)
When NOT to use it
Skip it for:
- Choosing between two ways to write a function: that's the executor's call.
- UI patterns: that's `/craft-ui` and DESIGN.md.
- Bug fixes that don't change the shape of anything.
- Decisions already resolved in ARCHITECTURE.md — just follow the doc.
- Renames.
The test: if this decision turns out wrong, how expensive is reversing it? If "an afternoon," skip the review. If "weeks of migration work or a breaking change for users," run it.
What /architecture-review produces
A typical run produces:
- A reframed decision statement (the skill restates the question to confirm understanding — often the user's phrasing hides the real question).
- An anchor check against ARCHITECTURE.md (does the existing doc already commit to one of the options? does it rule out one?).
- Trade-off table for surviving options (concrete pros and cons, reversibility, operational cost, coupling).
- A recommendation with one paragraph of reasoning.
- A Trade-off log entry the user approves before it's written.
If existing commitments resolve the question, the skill says so and stops. The right outcome of a 30-second review is "ARCHITECTURE.md §1 already commits to X — go with X." That's a win, not a punt.
The Trade-off log earns its keep
The Trade-off log is the highest-leverage piece of ARCHITECTURE.md. Six months in, when someone asks "wait, why did we choose X over Y?", the log has the answer: date, alternatives, reason. Without it, every architectural choice gets re-litigated, often by an agent with no memory of the original constraints.
If you do nothing else from this chapter, commit to the Trade-off log. A thin ARCHITECTURE.md with a thick Trade-off log beats a thick ARCHITECTURE.md with no log.
Common stumbles
| Symptom | Fix |
|---|---|
| Specs keep needing rework because schema/auth/etc. choices were wrong | Run architecture-md-builder once before backlog-triage; run /architecture-review per architecturally loaded decision |
| Triage refuses to write a spec citing "architectural load" | Take the hint — run /architecture-review and append to §6, then resume triage |
| ARCHITECTURE.md says "Postgres" and not much else | Run architecture-md-builder again; the skill will only re-interrogate vague sections |
| The Trade-off log is empty | Every architectural decision should land there. If the log is empty, the architecture isn't documented. It's implied. |
| You're tempted to make a big architectural decision in chat without logging it | Stop. Run /architecture-review. Five minutes of friction prevents a year of "why" archaeology. |
Continue
If your project has UI: Chapter 07: The design doc. Otherwise: Chapter 08: Roadmap and backlog.