Chapter 10

Technical engraving of a mechanical verification gate moving paper through inspection stations — Fig. 10The verification gate.

10 — Verification gates

When to read this: When you want to understand why the kit ships four read-only review subagents instead of one big "review my code" command. Conceptual + reference. Read it once. Return to it when you're configuring or extending the gates.

The core idea: gates aren't substitutable#

A verification gate is a check that runs before work is considered done. The kit's central claim is that different kinds of work need different kinds of gate, and they aren't interchangeable.

Specifically:

Tests pass verifies that the code does what its tests say. It does not verify that the UI looks right, that a migration is safe at production scale, or that an API endpoint correctly checks tenant ownership.
A clean visual rubric verifies that UI matches DESIGN.md. It does not verify that the API behind the UI is sound or that a schema change is safe.
An architecture review verifies that the system shape stays coherent. It does not verify that the per-line SQL in a migration won't lock a hot table.
A migration review verifies the per-line DDL safety. It does not verify that the architectural strategy was right at all.

You can't ship a UI feature on the strength of a passing test suite. You can't ship a schema migration on the strength of a clean architectural review. Each layer needs its own check.

The kit ships four read-only review subagents, each owning a different gate:

Subagent	Layer	Anchored against
`design-reviewer`	UI vs DESIGN.md, multi-viewport visual	DESIGN.md
`architecture-reviewer`	System shape, cross-cutting concerns	docs/ARCHITECTURE.md
`migration-reviewer`	Per-line DDL safety	docs/ARCHITECTURE.md §1 (stack), §4 (migration strategy), §5 (bets)
`api-reviewer`	Per-endpoint completeness and safety	docs/ARCHITECTURE.md §3 (API conventions), §4 (auth, errors, rate limiting)

Plus a fifth subagent — agents-memory-updater — that handles continual memory rather than verification (covered in Chapter 12).

A few rules apply to all four reviewers:

They are read-only. They never edit code. They produce reports with specific diff suggestions; you (or the parent agent) apply the diffs.
They anchor against written-down rules. Findings cite specific DESIGN.md or ARCHITECTURE.md sections. Without those docs, the reviewers fall back to a code-only sanity pass and announce that fact at the top of the report.
They never approve work that violates a 🔴 BLOCKING finding, even if you push back. Either the code is wrong or the rule is wrong. Both require a deliberate update. Not silent acceptance.
They use a consistent severity taxonomy:

Severity	Meaning	Action
🔴 BLOCKING	Direct conflict with a written rule	Cannot ship. Fix the code or fix the rule.
🟡 NEEDS DECISION	Deviates from the rule but the deviation might be intentional	You decide: revert, or update the rule + log the trade-off
🟢 ADVISORY	Fine, but a related improvement is recommended	Optional

How to invoke the reviewers#

All four reviewers are invoked the same way: via the Agent tool, either by you or by the parent agent.

By you, in chat:

have the design-reviewer check this

review the migration with migration-reviewer

ask the api-reviewer to check the new endpoints

By the parent agent during a workflow: /craft-ui Phase 8 invokes design-reviewer automatically. The kit's CLAUDE.md template instructs the parent agent to invoke architecture-reviewer / migration-reviewer / api-reviewer after relevant changes without asking. No behavioral difference between user-triggered and agent-triggered invocations.

Run reviewers in parallel when their concerns don't overlap. A feature that adds both a migration and a new endpoint can run migration-reviewer and api-reviewer at the same time.

design-reviewer#

File: .claude/agents/design-reviewer.md after bootstrap.

What it checks (the rubric):

Aesthetic match. Does the work reflect the named direction in DESIGN.md? Specific lines from DESIGN.md are cited when calling regression toward generic.
Forbidden defaults. Inter / Roboto / Open Sans / Lato / Arial / system stacks; purple-on-white gradients; generic SaaS blue/teal; three-column feature grids with identical card heights and centered icons; Material default easing cubic-bezier(0.4, 0, 0.2, 1); hover scales for decoration; animations over 400ms for state change. Plus every entry in DESIGN.md's project-specific forbidden section.
Token discipline. Greps the changed files for hex literals in className strings, bg-{color}-{n} Tailwind utilities (red, blue, zinc, etc.), text-{color}-{n}, border-{color}-{n}, hardcoded font stacks. Each is a violation.
Type contrast. Display vs body distinguishable at a glance. Weight extremes used per DESIGN.md (typically 200/800, not 400/600).
Spatial rhythm. Spacing values come from the scale. Vertical rhythm clean.
State coverage. Every interactive element has visible hover, focus-visible at 3:1 contrast minimum, active, disabled. Loading, error, and empty states present where applicable.
Motion sanity. Durations under 400ms for state. Only transform and opacity animated. prefers-reduced-motion respected.

Stress tests for interactive components:

Click rapidly 10x on the primary action. Anything shift unexpectedly?
Tab through with keyboard. Focus order logical? Indicators visible at every stop?
Trigger an error state. Does the UI degrade locally or block everything?
Resize to 320px width. Layout survives?
Throttle to slow 3G in browser devtools. Is the loading state better than no state?

Visual capture: if Playwright MCP is available and a dev server is running, the reviewer captures screenshots at 375px, 768px, 1280px, and 1920px. If not, it produces a code-only review based on the diff and announces that at the top of the report.

To enable Playwright MCP:

claude mcp add playwright -- npx @playwright/mcp@latest

Output format:

## Design Review: [feature or route name]
 
### Verdict
[PASS | NEEDS CHANGES | FAIL]
 
### Aesthetic match
[1–2 sentences. Quote specific DESIGN.md lines.]
 
### Violations
- [file:line — specific violation, with the rule it breaks]
 
### Diff suggestions
- `path/to/file.tsx:23` — replace `bg-zinc-900` with `bg-[--color-bg-elevated]`
- `app/page.tsx:88` — body leading is 1.4; DESIGN.md specifies 1.6 for body
- Hero animation duration is 600ms; DESIGN.md ceiling is 400ms
 
### State coverage
[List of states verified or missing per component touched.]
 
### Stress test results
[If interactive. Otherwise omit.]
 
### Screenshots captured
[List of viewports successfully captured, or note if Playwright was unavailable.]

architecture-reviewer#

File: .claude/agents/architecture-reviewer.md after bootstrap.

Anchor: docs/ARCHITECTURE.md. If the file doesn't exist, the reviewer:

Says so at the top of the report.
Asks whether the project is genuinely architecture-light (static site, simple CLI, library) or whether ARCHITECTURE.md should exist.
For architecture-light projects, produces a code-only sanity pass. Flags any obvious cross-cutting concerns introduced (auth, persistent storage, payments, queues) that should trigger a real ARCHITECTURE.md.

What it checks (the rubric):

Stack alignment. Do imports / config / dependencies match §1? Flag any new library that wasn't authorized. Verify against the spec's Allowed packages line.
Data model. Do new entities / FKs / columns / indexes / migrations match §2? Identity strategy consistent, multi-tenancy column present, soft-delete handling correct, time fields correct, ON DELETE behavior matches.
Service shape. Module boundaries (§3) respected? No reaching across boundaries. New API surface introduced without an ARCHITECTURE.md update is a violation.
Cross-cutting concerns. Auth at the documented enforcement layer. Errors thrown from defined classes. Logging matches the format. Caching obeys §4. Background work registered with the documented queue + idempotency strategy. Secrets fetched via the documented mechanism.
Migration & rollout. Forward-only vs expand-contract pattern matches §4 strategy. Backfill stated and reversible. Rollback story described. Migration is idempotent.
Evolution / bets. Does the change conflict with a stated bet in §5? E.g. "users won't exceed 10k rows per tenant" — does the new query break if they do?
Trade-off log freshness. Does an architecturally material change have a §6 entry? If not, the deviation is undocumented even if it's right.

Output format: standard verdict + sections + 🔴 / 🟡 / 🟢 findings + suggested diffs or trade-off log entries.

A representative finding:

🔴 BLOCKING
- src/server/queries/getUserDashboard.ts:14 — query lacks `tenant_id` filter.
  ARCHITECTURE.md §2 commits to row-level multi-tenancy on every query path.
  Fix: add `where(eq(users.tenantId, ctx.tenantId))` or use the project's
  scoped-query helper.

migration-reviewer#

File: .claude/agents/migration-reviewer.md after bootstrap.

Lane: This reviewer's sole concern is production safety of this migration. Strategic questions ("should we even use Postgres?") belong to architecture-reviewer. Code quality belongs to pre-commit-review. Visual review belongs to design-reviewer. The migration-reviewer stays in lane.

Anchor: docs/ARCHITECTURE.md §1 (stack — what database, what version, what host), §4 (Migration & deploy strategy), §5 (Bets — especially scale assumptions). If absent, the reviewer asks for the database engine + version explicitly. Lock-impact rules vary across Postgres, MySQL, SQLite, Cloud Spanner.

Per-statement DDL review:

Lock acquisition. For each DDL, name the lock acquired (ACCESS EXCLUSIVE, SHARE, ROW EXCLUSIVE, none) and whether it blocks reads, writes, or both.
Backfill cost. Row count estimate, batch strategy, idempotency, default-value cost on column add (volatile vs constant defaults have different lock profiles in modern Postgres).
NOT NULL adds. Direct SET NOT NULL on a populated column = 🔴. Safe pattern: nullable → backfill → CHECK NOT VALID → VALIDATE CONSTRAINT → optionally SET NOT NULL.
FK adds. NOT VALID then separate VALIDATE CONSTRAINT. Index on referencing column required. ON DELETE behavior matches §2.
Index hygiene. Every FK has an index. CONCURRENTLY on hot Postgres tables. No redundant indexes. Composite index column order matches actual query patterns.
Rename safety. Single-step renames are deploy-time race conditions unless the application is offline. Safe pattern: expand-contract.
Transaction wrapping. CREATE INDEX CONCURRENTLY cannot be in a transaction. Long backfills shouldn't hold a transaction open.
Rollback story. Reversible / forward-only / irreversible — must match what the spec declares.
Data integrity at the boundary. CHECK constraints with NOT VALID then VALIDATE. Triggers idempotent. Enum changes safe.
Cross-tenant exposure. Multi-tenant projects: every new table needs tenant_id + index, every new query path needs tenant_id in the WHERE clause, RLS policies (if used) added in the same migration.

The output is shaped to be actionable:

🔴 BLOCKING
- db/migrations/20250320_add_org_id.sql:7 — `ALTER TABLE users ADD COLUMN org_id UUID NOT NULL DEFAULT gen_random_uuid()` will rewrite the table under ACCESS EXCLUSIVE for the duration. ARCHITECTURE.md §5 bet: "we expect 100k+ rows in users by P3" — this will lock for several minutes.
 
Suggested rewrite (split into 4 migrations):
  20250320_a_add_org_id_nullable.sql
    ALTER TABLE users ADD COLUMN org_id UUID;
  20250320_b_backfill_org_id.sql
    -- batch in 5k chunks via app code or pg_cron
  20250320_c_add_check_not_valid.sql
    ALTER TABLE users ADD CONSTRAINT users_org_id_not_null CHECK (org_id IS NOT NULL) NOT VALID;
  20250320_d_validate_check.sql
    ALTER TABLE users VALIDATE CONSTRAINT users_org_id_not_null;

The reviewer always provides copy-paste-ready safer DDL for each 🔴 finding, including the multi-step pattern split into separate migrations if needed.

api-reviewer#

File: .claude/agents/api-reviewer.md after bootstrap.

Lane: Per-endpoint completeness and safety. Strategic API design ("REST or RPC?") belongs to architecture-reviewer. Schema changes to migration-reviewer. Code quality to pre-commit-review. Visual review to design-reviewer. The api-reviewer stays in lane.

Anchor: docs/ARCHITECTURE.md §3 (API style, error envelope, pagination, idempotency, versioning) and §4 (auth & authorization, error handling, rate limiting, logging). If absent, the reviewer falls back to universal rules (authorization granularity, input validation, mass-assignment safety, webhook signature verification, open-redirect/SSRF safety, status-code correctness) and recommends running architecture-md-builder.

The most common 🔴 finding in AI-generated APIs isn't "missing auth." It's broken access control with granularity. The endpoint authenticates the user, but it doesn't check whether the user belongs to the resource being queried. Example:

// 🔴 BLOCKING — missing granularity check
export async function GET(req: Request, { params }: { params: { orgId: string } }) {
  const user = await getCurrentUser();
  if (!user) return new Response("Unauthorized", { status: 401 });
  const org = await db.org.findUnique({ where: { id: params.orgId } });
  return Response.json(org);
}

This authenticates the user, then returns any org the user requests by ID. The fix is verifying membership:

// ✅ Safe
const membership = await db.orgMember.findFirst({
  where: { orgId: params.orgId, userId: user.id },
});
if (!membership) return new Response("Forbidden", { status: 403 });

The full rubric:

Authorization granularity. Auth check verifies the user belongs to the specific resource. Not just that the user exists.
Multi-tenant filtering. Every query in the handler filters by tenant_id (or equivalent) if the project commits to row-level multi-tenancy.
Role checks. Admin-only endpoints verify role, not just authentication.
Internal-only endpoints rejected at the platform layer, not by URL obscurity.
Input validation. Schema-validated (Zod / Yup / Valibot). Mass-assignment safe — no db.update({ ...req.body }). String length limits. Array bounds.
Idempotency. POST endpoints with external side effects (charging, email, third-party API) need idempotency keys.
Rate limiting. Per-route, per-user, scoped where ARCHITECTURE.md §4 says.
Error envelope. Matches the shape documented in §3 / §4.
Status codes. 201 on create, 422 (or 400) on validation, 401 vs 403, 429 with Retry-After, 409 on conflict.
Response shape consistency. Pagination format consistent. No leaked internal fields (password_hash, internal_notes).
URL safety. Open redirect — any user-supplied URL in Location or redirect() validated against an allowlist. SSRF — any user-supplied URL fetched server-side validated; private IP ranges disallowed.
Webhook handlers — signature verification before parsing, replay protection, event-ID idempotency, async-safe (queue work >5s).
Audit / logging. Mutation handlers log actor + action + resource ID per §4.

🔴 findings block the kickoff verification gate. The reviewer produces copy-paste-ready safer code in the project's existing style.

When to skip a reviewer#

The reviewers exist to catch specific failure modes. Some changes don't trigger any of them:

Pure refactor with no behavior change. No design, architecture, migration, or API surface to review. Tests pass + pre-commit-review covers it.
Doc edits. None of the four reviewers apply.
Config tweaks that don't touch the runtime. Same.

Don't run a reviewer just because the kit ships one. Run the reviewer whose lane the change is in.

When in doubt, the kit's CLAUDE.md template gives default rules:

After any UI change, invoke the design-reviewer subagent before considering the work done.

After any architecturally-loaded change (schema, service boundaries, auth, caching, queues, migrations, public APIs, new cross-cutting libraries), invoke the architecture-reviewer subagent before considering the work done.

After any database schema migration is added or modified, invoke the migration-reviewer subagent before considering the work done.

After any HTTP endpoint, server action, RPC handler, or webhook receiver is added or modified, invoke the api-reviewer subagent before considering the work done.

These rules ship in the bootstrapped CLAUDE.md, so the parent agent reads them every session.

Why a reviewer might say PASS when the work is actually bad#

A reviewer can only catch what's anchored against a written rule. If your DESIGN.md is sparse, design-reviewer has nothing to anchor regression against. If your ARCHITECTURE.md doesn't commit to multi-tenancy, architecture-reviewer can't flag missing tenant filters as 🔴.

When a reviewer says PASS but the work feels off:

The doc is too vague. Sharpen via design-md-builder / architecture-md-builder / /forbid / architecture-review.
The reviewer's rubric is missing a check that matters for your project. Edit the subagent's markdown file directly. They're just files. Chapter 13: Extending the kit covers extending review rubrics.

Building your own reviewer#

The four shipped reviewers are templates. If your project has a domain-specific concern that's not covered — a permissions model, a billing pipeline, a particular library's pitfalls — write your own.

A reviewer subagent is a markdown file in .claude/agents/<name>.md with:

A frontmatter block — name, description (when to invoke it), tools (typically read-only: Read, Grep, Glob, Bash for git diff), model.
A role statement — "You are a senior X doing read-only review."
A lane statement — what's in scope and what isn't.
A workflow — read these files, identify what changed, apply this rubric.
A rubric — specific checks with severity rules.
An output format — typically verdict + section breakdowns + 🔴 / 🟡 / 🟢 findings + suggested diffs.
A what-you-never-do list — guardrails: never edit, never approve 🔴 findings, never use vague feedback, stay in lane.

Copy one of the shipped reviewers (api-reviewer.md is the most thorough template) and adapt the rubric. Chapter 13 walks through this.

Continue#

Next: Chapter 11: Token discipline goes deep on one specific check that runs across all UI work: semantic-token enforcement.