Chapter 02

Fig. 02: Context as an adjustable apparatus. (Image: technical engraving of nested document frames and a calibration rail.)

02 — The mental model

When to read this: Before you install the kit. The decisions in later chapters make more sense once you have this scaffolding in place. About a 10-minute read.

The kit looks like a pile of templates and shell scripts. The model behind it is simpler than the file count suggests. Five ideas, in this order:

1. Three living docs, each owning a different question.
2. Living, not write-once. The docs get updated at phase boundaries.
3. Verification gates. Different work needs different review. Gates aren't interchangeable.
4. Taste-first. Aesthetic and architectural commitments come before code.
5. Continual memory. The project gets smarter over time without retraining.

Hold these five in your head and the rest of the kit reads as obvious.

1. Three living docs

Three documents, three jobs. Don't conflate them:

Doc	Lives at	Job
`docs/PRD.md`	`docs/`	What you're building and why. Users, problem, V1 capabilities, non-goals, success metrics, constraints.
`docs/ARCHITECTURE.md`	`docs/`	How it's built. Stack, data model, service shape, cross-cutting concerns, trade-off log.
`DESIGN.md`	repo root	How it looks and feels. Brand, type, color, motion, forbidden defaults.

A fourth file — CLAUDE.md at the repo root — is session memory. Not a doc. A one-paragraph distillation of the above plus a list of rules and file locations. Agents read it every turn. Think of it as the boot sector for any session.

A fifth file — docs/ROADMAP.md — is short. It's the order things ship in. 2–4 phases, each shippable on its own.

The hierarchy:

        ┌────────────────────────┐
        │    docs/PRD.md         │   what & why
        └───────────┬────────────┘
                    │
          ┌─────────┴─────────┐
          ▼                   ▼
  docs/ARCHITECTURE.md     DESIGN.md       how it's built / how it looks
          │                   │
          └─────────┬─────────┘
                    │
                    ▼
            docs/ROADMAP.md                what ships when
                    │
                    ▼
            docs/backlog.md                actual work items
                    │
                    ▼
        docs/backlog/phase-*.md            triaged specs

Read top to bottom. PRD constrains everything below it. ARCHITECTURE and DESIGN constrain ROADMAP and the backlog. Specs land at the bottom.
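To make the "constrains everything below it" rule concrete, here is a minimal sketch that encodes the diagram as a lookup table. The `CONSTRAINS` table and the `downstream` helper are illustrative names, not part of the kit.

```python
# Which docs constrain which, following the hierarchy diagram above.
# Keys are upstream docs; values are the docs directly below them.
# CONSTRAINS and downstream() are hypothetical names for illustration.
CONSTRAINS = {
    "docs/PRD.md": ["docs/ARCHITECTURE.md", "DESIGN.md"],
    "docs/ARCHITECTURE.md": ["docs/ROADMAP.md"],
    "DESIGN.md": ["docs/ROADMAP.md"],
    "docs/ROADMAP.md": ["docs/backlog.md"],
    "docs/backlog.md": ["docs/backlog/phase-*.md"],
    "docs/backlog/phase-*.md": [],
}


def downstream(doc: str) -> list[str]:
    """Return every doc below `doc`, nearest first, without duplicates."""
    seen: list[str] = []
    queue = list(CONSTRAINS[doc])
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(CONSTRAINS[current])
    return seen
```

Read it as a checklist: when a doc changes, everything `downstream` returns for it is worth a second look.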

When you start a new project, fill these in roughly top to bottom: PRD first, then ARCHITECTURE (if needed) and DESIGN (if there's UI), then ROADMAP, then seed the backlog. Chapter 04 walks through this order.

2. Living, not write-once

The most common failure with project documentation: it gets written in week 1 and abandoned in week 4. The kit pushes back with two patterns.

Append-only logs. Every doc that captures decisions has a log section that grows over time:

PRD.md has a Revision log at the bottom. Every PRD change gets an entry: date, what shifted, why, what got carried forward.
ARCHITECTURE.md has a Trade-off log. Every architectural decision (Drizzle vs Prisma, caching strategy, multi-tenancy model) gets logged with alternatives considered and reason chosen.
DESIGN.md has a Forbidden defaults section that grows as you spot patterns the agent reaches for that don't fit the brand.
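The append-only idea can be sketched in a few lines. This assumes a Revision log entry is a Markdown bullet carrying the four fields named above (date, what shifted, why, what got carried forward). The exact entry layout in the kit's templates may differ, and both function names are hypothetical.

```python
from datetime import date


def format_revision_entry(when: date, shifted: str, why: str, carried: str) -> str:
    """Render one log entry as a Markdown bullet block (layout is an assumption)."""
    return (
        f"- {when.isoformat()}\n"
        f"  - What shifted: {shifted}\n"
        f"  - Why: {why}\n"
        f"  - Carried forward: {carried}\n"
    )


def append_entry(doc_text: str, entry: str) -> str:
    """Add the new entry at the end; earlier entries are left untouched."""
    return doc_text.rstrip("\n") + "\n" + entry
```

The design choice is that nothing rewrites history. A reversed decision gets a new entry that says so, rather than an edit to the old one.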

Refinement skills. Three skills exist to refresh the docs:

prd-revise runs at phase boundaries to detect drift between the PRD and what shipped.
architecture-review runs per-decision when you're about to make a hard-to-reverse architectural choice. It anchors against the existing ARCHITECTURE.md and appends to the Trade-off log.
/forbid is a slash command that appends one project-specific forbidden default to DESIGN.md. Use it whenever you spot a pattern the agent keeps reaching for that doesn't fit.

The pattern: decisions are written down once, then either confirmed or revised deliberately. They never silently rot.

If you skip the refinement skills, the docs decay. The kit can't force you to run them. But if your PRD says "V1 doesn't have public sharing" and three months later you've shipped public sharing without updating the PRD, the agent's behavior gets incoherent fast.

3. Verification gates

Tests pass ≠ work is good. Specifically:

A passing test suite verifies that backend logic does what its tests say. It doesn't verify the UI looks right, that a migration is safe to run on production data, or that an API endpoint correctly checks tenant ownership.
A clean visual rubric verifies that UI matches DESIGN.md. It doesn't verify the API behind the UI is sound.
An architecture review verifies the system shape stays coherent. It doesn't verify the per-line SQL in a migration won't lock a hot table.

Different work needs different gates. The kit ships four read-only review subagents, each owning one:

Subagent	What it checks
`design-reviewer`	UI against DESIGN.md, screenshots at 4 viewports, forbidden defaults, token discipline, state coverage, motion sanity
`architecture-reviewer`	Code changes against ARCHITECTURE.md — stack alignment, data model, service boundaries, cross-cutting concerns, trade-off log freshness
`migration-reviewer`	Database migration safety — locks, backfill cost, NOT NULL adds, FK validation, rollback story, multi-tenant exposure
`api-reviewer`	API endpoint completeness — authorization granularity, input validation, idempotency, rate limiting, webhook signature verification

Plus a fifth subagent, agents-memory-updater, that handles the continual-memory loop (more on that below).

The gates are non-substitutable. You can't skip the design review because the tests pass. You can't approve a migration because the architecture review was clean. Each gate covers a layer the others miss.
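One way to picture non-substitutability is as a union: every gate whose territory a change touches is required, and none can stand in for another. The sketch below assumes a mapping from path patterns to reviewers. The patterns are hypothetical and say nothing about how the kit itself routes work.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; adjust to your own repo layout.
GATE_PATTERNS = {
    "design-reviewer": ["src/components/*", "*.css"],
    "migration-reviewer": ["migrations/*"],
    "api-reviewer": ["src/api/*"],
    "architecture-reviewer": ["src/*"],
}


def required_gates(changed_paths: list[str]) -> set[str]:
    """Return every reviewer whose patterns match at least one changed path."""
    return {
        gate
        for gate, patterns in GATE_PATTERNS.items()
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    }
```

A change to `src/api/users.ts` would need both `api-reviewer` and `architecture-reviewer` under these patterns. A clean result from one does not remove the other from the set.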

Severity codes are consistent across the reviewers:

🔴 BLOCKING — direct conflict with a written-down rule. Cannot ship.
🟡 NEEDS DECISION — the change deviates from the rule, but the deviation might be intentional. You decide: revert, or update the rule + log the trade-off.
🟢 ADVISORY — the change is fine, but a related improvement is recommended.
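The three severities can be read as a small decision rule. In the sketch below, the `decided` flag and the rule that an unresolved NEEDS DECISION finding also holds a ship are assumptions. The text above only states outright that BLOCKING cannot ship.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKING = "blocking"
    NEEDS_DECISION = "needs decision"
    ADVISORY = "advisory"


@dataclass
class Finding:
    severity: Severity
    message: str
    decided: bool = False  # hypothetical flag: set once you revert or update the rule


def can_ship(findings: list[Finding]) -> bool:
    """Blocking findings always stop a ship; open decisions stop it until resolved."""
    for finding in findings:
        if finding.severity is Severity.BLOCKING:
            return False
        if finding.severity is Severity.NEEDS_DECISION and not finding.decided:
            return False
    return True
```

Advisory findings never change the outcome here. They are recommendations, not gates.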

Chapter 10 covers the gates in depth.

4. Taste-first

The kit's strongest opinion: commit to specifics before you write code, not after.

For UI, this means writing DESIGN.md first. The doc isn't a style guide. It's a constraint system. It names a specific aesthetic direction ("editorial brutalism," "warm terminal/IDE," "Japanese ma minimalism"), names three adjectives the brand IS and three it IS NOT, lists forbidden defaults, and commits to specific fonts and OKLCH colors. Then /craft-ui reads DESIGN.md every time it builds something, and the design-reviewer reads DESIGN.md every time it reviews.

Why it matters: AI agents converge toward generic outputs by default. Without specific constraints written down, the agent fills the silence with the average of its training data — Inter, purple gradients, three-column feature grids, Material default easing. The fight against generic isn't fought during code generation. It's fought during DESIGN.md authoring.
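Written-down constraints are checkable in a way unwritten taste is not. As a rough illustration only, and not a description of how design-reviewer works, here is a sketch that scans a stylesheet for two of the generic defaults named above. The `FORBIDDEN` table and its regular expressions are hypothetical.

```python
import re

# Hypothetical forbidden defaults, taken from the examples in the text above.
FORBIDDEN = {
    "Inter font": re.compile(r"font-family:[^;]*\bInter\b", re.IGNORECASE),
    "purple gradient": re.compile(r"gradient\([^)]*\bpurple\b", re.IGNORECASE),
}


def find_forbidden(css: str) -> list[str]:
    """Return the name of each forbidden default that appears in the stylesheet."""
    return [name for name, pattern in FORBIDDEN.items() if pattern.search(css)]
```

A pattern match like this only catches the crudest cases. The point is that a named rule can be checked at all, while "make it less generic" cannot.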

The same principle applies to architecture (commit before coding via architecture-md-builder) and to product (commit before coding via prd-grill). All three skills follow the same shape: one question at a time, recommended answer with each question, push past the first vague answer, write the file when you've actually decided.

Chapter 07 covers DESIGN.md in depth. Chapter 06 covers ARCHITECTURE.md. Chapter 05 covers PRD.md.

5. Continual memory

Two memory files, two different kinds of memory:

CLAUDE.md is authored memory. You wrote it. The agent reads it every turn. It changes when you change it.
AGENTS.md is learned memory. It captures recurring user preferences and durable workspace facts that emerge over time — "the user always wants commit messages in imperative mood," "this project's database is Postgres on Neon, accessed via Drizzle."

The kit ships a continual-learning system that updates AGENTS.md automatically:

A Stop hook runs after every Claude Code turn, tracking how many turns have happened and how much time has passed.
When the cadence threshold is hit (default: ≥10 turns + ≥120 minutes since last run + transcript actually advanced), the hook emits a block decision that injects a follow-up: "run the continual-learning skill."
The continual-learning skill delegates to the agents-memory-updater subagent.
The agents-memory-updater subagent mines new transcripts incrementally (only files added or modified since last run, tracked via an index file), pulls out durable patterns, deduplicates, caps each learned section at 12 bullets, and writes the result back to AGENTS.md.
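The cadence check in the first two steps can be sketched as follows. It assumes the three default conditions are combined with AND, and that the block decision is a JSON object with `decision` and `reason` fields. Both are assumptions to confirm against the kit's hook script and the Claude Code hooks documentation. The function names are hypothetical.

```python
import json


def should_run(turns: int, minutes_since_last: float, transcript_advanced: bool,
               min_turns: int = 10, min_minutes: float = 120) -> bool:
    """True only when all three cadence conditions hold (assumed to be ANDed)."""
    return (
        turns >= min_turns
        and minutes_since_last >= min_minutes
        and transcript_advanced
    )


def stop_hook_output(turns: int, minutes_since_last: float,
                     transcript_advanced: bool) -> str:
    """Return the JSON a hook might print; an empty string means do nothing."""
    if not should_run(turns, minutes_since_last, transcript_advanced):
        return ""
    # Assumed output shape for a block decision; verify before relying on it.
    return json.dumps({
        "decision": "block",
        "reason": "run the continual-learning skill",
    })
```

The transcript condition matters as much as the counters: without it, an idle session that crosses the time threshold would trigger a run with nothing new to mine.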

The point: the project gets smarter over time without retraining. When you correct the agent for the seventh time about your preferred PR description format, that correction surfaces as an AGENTS.md bullet the next time the cadence threshold is hit, and every later session reads it.
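The updater's deduplicate-and-cap step is worth seeing in miniature. This sketch assumes duplicates are matched case- and whitespace-insensitively, and that a full section drops its oldest bullets first. The source states only the 12-bullet cap, so both rules and the name `merge_bullets` are assumptions.

```python
def merge_bullets(existing: list[str], mined: list[str], cap: int = 12) -> list[str]:
    """Merge newly mined bullets into a section, dedupe, and keep at most `cap`."""
    merged: list[str] = []
    seen: set[str] = set()
    # Walk newest first, so a duplicate keeps its latest wording and a full
    # section drops its oldest bullets (eviction order is an assumption).
    for bullet in reversed(existing + mined):
        key = " ".join(bullet.lower().split())  # normalize case and whitespace
        if key not in seen:
            seen.add(key)
            merged.append(bullet)
    # Restore oldest-to-newest order for writing back to the file.
    return list(reversed(merged[:cap]))
```

The cap is what keeps learned memory useful: a section that grows without limit turns into noise the agent has to read on every session.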

A trial mode exists (CONTINUAL_LEARNING_TRIAL_MODE=1) with a looser cadence so you can verify the loop works without waiting two hours.

Chapter 12 covers the memory system in depth.

How the pieces fit together

All five ideas in one diagram:

              ┌──────── one-time setup ────────┐
              │                                │
              │  prd-grill        → PRD.md     │
              │  architecture-md-builder       │
              │                   → ARCH.md    │
              │  design-md-builder             │
              │                   → DESIGN.md  │
              │                                │
              └────────────────┬───────────────┘
                               │
                               ▼
              ┌─────────── per-spec loop ──────────┐
              │                                    │
              │  /pick-next-task                   │
              │       │                            │
              │       ▼                            │
              │  /kickoff-spec  or  /craft-ui      │
              │       │                            │
              │       ▼                            │
              │  verification gate                 │
              │  (tests / design-reviewer /        │
              │   architecture-reviewer / etc.)    │
              │       │                            │
              │       ▼                            │
              │  /ship-spec                        │
              │                                    │
              └────────────────┬───────────────────┘
                               │
                               ▼
              ┌────────── refinement (cadence) ────┐
              │                                    │
              │  /prd-revise  (every phase)        │
              │  /architecture-review  (per dec)   │
              │  /forbid  (when noticed)           │
              │  agents-memory-updater  (auto)     │
              │                                    │
              └────────────────────────────────────┘

Setup happens once. The per-spec loop runs as many times as you ship features. Refinement runs at boundaries: phases, architectural decisions, design observations, memory cadence triggers.

If you only remember three things from this chapter:

Three living docs. PRD is what + why. ARCHITECTURE is how it's built. DESIGN is how it looks. Not the same. Don't combine them.
Verification gates aren't substitutable. Tests don't catch generic UI. Visual review doesn't catch broken access control. Each layer needs its own check.
Commit before coding. PRD before architecture. Architecture before design. Design before UI. The fight against generic and the fight against drift are both won upstream.

Continue

Next: Chapter 03: Installation, which sets up the kit on your machine.