Chapter 01

Fig. 01: The kit as a working set. (Technical engraving of a folder tray, documents, guide rails, and connector pins arranged as a kit map.)

01 — What is this kit

When to read this: First chapter for new readers. If you already know what the kit does and you want to install it, skip to Chapter 03.

The problem

You've shipped something with an AI agent. Maybe a side project, maybe a feature at work. You've also seen at least one of these failure modes:

The output looks generic. Inter font. Three-column feature grid. Purple-on-white gradient. Everything the agent built could be the home page of any other SaaS.
The agent invented a feature that wasn't in the plan and skipped one that was.
A month later, a teammate or future-you asks the agent a question. The agent has no memory of the original constraints and proposes exactly what you decided last month not to do.
A schema change shipped that locked the production database for four minutes during peak traffic. The migration "worked on my machine."
An API endpoint shipped that authenticated the user but never checked whether that user was allowed to access the resource being queried.

These aren't bugs in the model. They're consequences of how most people use AI agents:

No persistent context. The agent reads the code and a vague README, then guesses everything else from training-data defaults.
No verification gates that match the work. Tests pass, the agent ships. But tests don't catch generic UI or broken access control.
No mechanism for taste. Design opinions live in your head. The agent can't read your head, so it ships the average of its training data.

The result is what people call "AI slop": output that's technically functional, generic, and sometimes broken in ways tests don't catch.
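The access-control failure is a clear case of breakage a test suite misses: a test that logs in and fetches its own record passes either way. Below is a minimal TypeScript sketch of the bug and the fix, assuming a hypothetical invoice endpoint backed by an in-memory map; none of these names come from the kit.

```typescript
// Hypothetical data model for illustration; not part of the kit.
interface User {
  id: string;
}

interface Invoice {
  id: string;
  ownerId: string;
  amountCents: number;
}

type Result<T> =
  | { status: 200; body: T }
  | { status: 401 | 404; body: { error: string } };

// In-memory stand-in for a database table.
const invoices = new Map<string, Invoice>([
  ["inv_1", { id: "inv_1", ownerId: "user_a", amountCents: 1200 }],
  ["inv_2", { id: "inv_2", ownerId: "user_b", amountCents: 9900 }],
]);

// The bug: the caller is authenticated, but ownership is never checked,
// so any logged-in user can read any invoice by guessing its id.
function getInvoiceUnsafe(user: User | null, invoiceId: string): Result<Invoice> {
  if (!user) return { status: 401, body: { error: "unauthenticated" } };
  const invoice = invoices.get(invoiceId);
  if (!invoice) return { status: 404, body: { error: "not found" } };
  return { status: 200, body: invoice };
}

// The fix: scope the lookup to the caller. A record owned by someone else
// is treated exactly like a record that doesn't exist.
function getInvoice(user: User | null, invoiceId: string): Result<Invoice> {
  if (!user) return { status: 401, body: { error: "unauthenticated" } };
  const invoice = invoices.get(invoiceId);
  if (!invoice || invoice.ownerId !== user.id) {
    return { status: 404, body: { error: "not found" } };
  }
  return { status: 200, body: invoice };
}
```

Returning 404 rather than 403 in the fixed handler is a common choice so the response doesn't confirm that someone else's record exists; either status closes the hole.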

The solution

The Agent Workflow Kit is a small set of files you drop into a fresh project to fix the failure modes above. It does five things:

Scaffolds session memory. Two files the agent reads every turn. CLAUDE.md for product intent, stack, and non-negotiables. AGENTS.md for learned preferences and durable workspace facts.
Scaffolds three living docs. PRD (what you're building and why), ARCHITECTURE (how it's built), DESIGN (how it looks and feels). Written deliberately at project start, refreshed at phase boundaries. Not write-once. Not ignored.
Installs verification gates. Specialized read-only review subagents: design-reviewer, architecture-reviewer, migration-reviewer, api-reviewer. They catch the failure modes your tests don't.
Installs a build loop. A backlog system (docs/backlog.md), spec kinds (ui / backend / infra), and slash commands that take a triaged spec, dispatch it to the right executor, and gate on the right verification.
Installs continual memory. A Stop hook periodically mines your transcripts and updates AGENTS.md with high-signal preferences. The project gets smarter over time without retraining.
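As a rough picture of what "mines your transcripts" can mean, here is a TypeScript sketch of an extract-and-merge step. The keyword heuristic, the function names, and the assumption that AGENTS.md keeps preferences as `- ` bullets are all hypothetical; the kit's actual hook under .claude/hooks/ may select and phrase preferences quite differently.

```typescript
// Hypothetical heuristic, not the kit's hook: a sentence that opens with
// one of these words is treated as a durable preference.
const PREFERENCE_PATTERN = /^(always|never|don't|do not|prefer)\b/i;

// Pull candidate preferences out of the user's messages in a transcript.
function extractPreferences(userMessages: string[]): string[] {
  const found: string[] = [];
  for (const message of userMessages) {
    // Rough sentence split: a run of non-terminators plus its punctuation.
    const sentences = message.match(/[^.!?]+[.!?]*/g) || [];
    for (const raw of sentences) {
      const sentence = raw.trim();
      if (PREFERENCE_PATTERN.test(sentence)) found.push(sentence);
    }
  }
  return found;
}

// Append only preferences AGENTS.md doesn't already list (compared
// case-insensitively), so repeated runs never duplicate a bullet.
function mergeIntoAgentsMd(agentsMd: string, preferences: string[]): string {
  const existing = new Set(
    agentsMd
      .split("\n")
      .filter((line) => line.startsWith("- "))
      .map((line) => line.slice(2).trim().toLowerCase()),
  );
  const fresh: string[] = [];
  for (const pref of preferences) {
    const key = pref.toLowerCase();
    if (!existing.has(key)) {
      existing.add(key);
      fresh.push(`- ${pref}`);
    }
  }
  if (fresh.length === 0) return agentsMd;
  const base = agentsMd.endsWith("\n") ? agentsMd : agentsMd + "\n";
  return base + fresh.join("\n") + "\n";
}
```

The merge is idempotent on purpose: a Stop hook fires often, so running it twice over the same transcript must not keep appending the same line.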

Run one command to install it globally. Run one more command in any new project to drop everything in.

What you get after running `agent-workflow`

The bootstrap step writes these files into your project:

CLAUDE.md                                   ← session memory: product, stack, rules
AGENTS.md                                   ← learned prefs and workspace facts
GETTING-STARTED.md                          ← read-once onboarding
prompts/
  ├── prompt-roadmap.md                     ← prompt to draft ROADMAP.md from PRD
  └── prompt-seed-backlog.md                ← prompt to seed Inbox from PRD + ROADMAP
docs/
  ├── backlog.md                            ← Inbox + phase index
  └── agent-workflow-skills.md              ← in-project skill reference
.claude/
  ├── commands/                             ← /craft-ui, /scaffold-component, /forbid
  ├── agents/                               ← five review/memory subagents
  ├── skills/                               ← six builder/refiner skills
  ├── hooks/                                ← Stop hook for continual learning
  └── settings.json                         ← project Claude Code settings
.cursor/
  ├── rules/project-memory.mdc              ← always-read rule for Cursor
  └── hooks/state/                          ← transcript mining starter index
scripts/check-tokens.sh                     ← bash linter for design tokens
tests/a11y.spec.ts                          ← Playwright + axe-core a11y test

Everything is text. Everything is in your repo. Everything is version-controlled. Nothing depends on a SaaS.

What you'll do after that

The kit doesn't write your product for you. It gives you a structured way to figure out what to build:

Run prd-grill. It asks one question at a time: who V1 is for, what problem it solves, what V1 does, what V1 explicitly doesn't do (the longest section), what success looks like, what constraints apply. Output: docs/PRD.md.
Run architecture-md-builder if your project has state, integrations, or cross-cutting concerns. Same pattern. Output: docs/ARCHITECTURE.md.
Run design-md-builder if your project has UI. Same pattern. Output: DESIGN.md at the repo root.
Draft a ROADMAP using prompts/prompt-roadmap.md.
Seed the backlog using prompts/prompt-seed-backlog.md.
Triage the backlog with /backlog-triage. Each spec gets tagged kind: ui | backend | infra.
Pick → kickoff → ship. Forever.
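The `kind:` tag from triage is what lets the build loop pick the right verification. A minimal TypeScript sketch of that routing, assuming a spec is plain text with a `kind:` line; the gate lists are an illustrative guess assembled from the reviewers and checks named in this chapter, not the kit's actual dispatch table.

```typescript
// The three spec kinds named in this chapter.
type SpecKind = "ui" | "backend" | "infra";

// Assumed mapping from kind to verification gates, for illustration only.
// The kit's slash commands define the real routing.
const GATES: Record<SpecKind, string[]> = {
  ui: ["design-reviewer", "scripts/check-tokens.sh", "tests/a11y.spec.ts"],
  backend: ["architecture-reviewer", "api-reviewer", "migration-reviewer"],
  infra: ["architecture-reviewer"],
};

// Read the `kind:` tag from a triaged spec. A missing or unknown kind is an
// error rather than a silent default, so untriaged specs can't slip through.
function parseKind(spec: string): SpecKind {
  const match = spec.match(/^kind:\s*(\S+)\s*$/m);
  if (!match) throw new Error("spec has no kind: tag; triage it first");
  const kind = match[1];
  if (kind === "ui" || kind === "backend" || kind === "infra") return kind;
  throw new Error(`unknown spec kind: ${kind}`);
}

// Which gates a spec must pass before it counts as shipped.
function gatesFor(spec: string): string[] {
  return GATES[parseKind(spec)];
}
```

Failing loudly on a missing tag, instead of guessing a kind, keeps untriaged specs out of the loop.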

That's the whole loop. Chapters 04 through 09 walk through it.

What this kit is not

A few non-goals to help you decide if it's for you.

Not a Next.js starter. Works with any stack: Next.js, Vite + React, Astro, Remix, plain Node, Python, Go. Some templates (the token check script, the a11y test) assume a JavaScript-flavored ecosystem, but the workflow itself is framework-agnostic.
Not a design system. It enforces that you use semantic tokens. It doesn't ship a component library. You write your own components. The kit gives you a scaffolding command (/scaffold-component) and a token linter.
Not a CI/CD tool. The token check and a11y test can be wired into CI. The kit itself runs locally inside your editor.
Not a SaaS. Nothing phones home. Nothing requires login. Everything is local files.
Not magic. It can't make a vague PRD specific, but it can keep asking questions until you do. The agent following the kit's workflow is still your agent, and it's only as good as the constraints you give it.
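The kit's token check is a bash script (scripts/check-tokens.sh). To show the idea rather than that script, here is a TypeScript sketch of the same kind of check: flag raw color literals so components have to go through semantic tokens such as `var(--color-accent)` (a hypothetical token name). The patterns and function names are assumptions, not the kit's rules; a real check would also need to exempt the file where the tokens themselves are defined.

```typescript
// One violation per offending literal, with a 1-based line number.
interface TokenViolation {
  line: number;
  literal: string;
}

// Hypothetical rule set: raw hex colors and rgb()/rgba()/hsl()/hsla() calls
// are violations; references like var(--color-accent) pass.
function findRawColors(source: string): TokenViolation[] {
  const violations: TokenViolation[] = [];
  source.split("\n").forEach((text, index) => {
    // A fresh global regex per line, so lastIndex always starts at 0.
    const pattern = /#[0-9a-fA-F]{3,8}\b|\b(?:rgb|hsl)a?\(/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      violations.push({ line: index + 1, literal: match[0] });
    }
  });
  return violations;
}
```

The hex pattern is deliberately crude (it would also flag an id selector like `#add`), which is the usual trade-off for a lint that has to run in milliseconds on every change.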

Outcomes to expect

Real outcomes from teams using this kit:

Faster onboarding for collaborators (human or agent). A new contributor reads CLAUDE.md, DESIGN.md, docs/PRD.md, and docs/ARCHITECTURE.md, then ships within an hour.
Less drift between sessions. The agent picks up where the last one left off. The constraints are written down, not in your head.
Fewer generic UI moments. DESIGN.md and design-reviewer together catch design regressions early.
Fewer architectural surprises. architecture-reviewer and migration-reviewer block schema and migration mistakes that would otherwise ship.
A workflow you can hand off. Take a vacation. The kit's docs are enough for someone else (or another agent) to pick up the build.
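The four-minute lock from the opening list usually comes from DDL that is instant on a small local database and slow on a large production table. A minimal TypeScript sketch of two rules a migration review might apply, assuming PostgreSQL; the rules and names are illustrative, and the kit's migration-reviewer is a subagent reading the migration, not a regex.

```typescript
interface MigrationWarning {
  statement: string;
  reason: string;
}

// Assumed PostgreSQL rules for illustration; not the kit's reviewer logic.
const RULES: { pattern: RegExp; reason: string }[] = [
  {
    // A plain CREATE INDEX blocks inserts, updates, and deletes until it finishes.
    pattern: /^create\s+(?:unique\s+)?index\b(?!\s+concurrently\b)/i,
    reason: "CREATE INDEX without CONCURRENTLY blocks writes to the table",
  },
  {
    // Changing a column's type can rewrite the table under an exclusive lock.
    pattern: /^alter\s+table\b[\s\S]*\balter\s+column\b[\s\S]*\btype\b/i,
    reason: "ALTER COLUMN ... TYPE can rewrite the table under an exclusive lock",
  },
];

function reviewMigration(sql: string): MigrationWarning[] {
  const warnings: MigrationWarning[] = [];
  // Naive split on semicolons: fine for simple DDL, wrong for function bodies.
  for (const raw of sql.split(";")) {
    const statement = raw.trim();
    if (statement === "") continue;
    for (const rule of RULES) {
      if (rule.pattern.test(statement)) {
        warnings.push({ statement, reason: rule.reason });
      }
    }
  }
  return warnings;
}
```

The usual fix for the first warning is CREATE INDEX CONCURRENTLY, which keeps the table writable but cannot run inside a transaction block, so a migration tool that wraps each file in a transaction needs that wrapping disabled for it.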

The kit won't turn a chaotic build into a calm one in one afternoon. Setting up the docs takes a few hours up front. The payoff comes over weeks, when you stop re-explaining the project every Monday morning.

Continue

Next: Chapter 02 (The mental model) for the conceptual scaffolding, or Chapter 03 (Installation) to start setting up.