
10 — Verification gates
When to read this: When you want to understand why the kit ships four read-only review subagents instead of one big "review my code" command. Conceptual + reference. Read it once. Return to it when you're configuring or extending the gates.
The core idea: gates aren't substitutable#
A verification gate is a check that runs before work is considered done. The kit's central claim is that different kinds of work need different kinds of gate, and they aren't interchangeable.
Specifically:
- Tests pass verifies that the code does what its tests say. It does not verify that the UI looks right, that a migration is safe at production scale, or that an API endpoint correctly checks tenant ownership.
- A clean visual rubric verifies that UI matches DESIGN.md. It does not verify that the API behind the UI is sound or that a schema change is safe.
- An architecture review verifies that the system shape stays coherent. It does not verify that the per-line SQL in a migration won't lock a hot table.
- A migration review verifies the per-line DDL safety. It does not verify that the architectural strategy was right at all.
You can't ship a UI feature on the strength of a passing test suite. You can't ship a schema migration on the strength of a clean architectural review. Each layer needs its own check.
The kit ships four read-only review subagents, each owning a different gate:
| Subagent | Layer | Anchored against |
|---|---|---|
design-reviewer | UI vs DESIGN.md, multi-viewport visual | DESIGN.md |
architecture-reviewer | System shape, cross-cutting concerns | docs/ARCHITECTURE.md |
migration-reviewer | Per-line DDL safety | docs/ARCHITECTURE.md §1 (stack), §4 (migration strategy), §5 (bets) |
api-reviewer | Per-endpoint completeness and safety | docs/ARCHITECTURE.md §3 (API conventions), §4 (auth, errors, rate limiting) |
Plus a fifth subagent — agents-memory-updater — that handles continual memory rather than verification (covered in Chapter 12).
A few rules apply to all four reviewers:
- They are read-only. They never edit code. They produce reports with specific diff suggestions; you (or the parent agent) apply the diffs.
- They anchor against written-down rules. Findings cite specific DESIGN.md or ARCHITECTURE.md sections. Without those docs, the reviewers fall back to a code-only sanity pass and announce that fact at the top of the report.
- They never approve work that violates a 🔴 BLOCKING finding, even if you push back. Either the code is wrong or the rule is wrong. Both require a deliberate update. Not silent acceptance.
- They use a consistent severity taxonomy:
| Severity | Meaning | Action |
|---|---|---|
| 🔴 BLOCKING | Direct conflict with a written rule | Cannot ship. Fix the code or fix the rule. |
| 🟡 NEEDS DECISION | Deviates from the rule but the deviation might be intentional | You decide: revert, or update the rule + log the trade-off |
| 🟢 ADVISORY | Fine, but a related improvement is recommended | Optional |
How to invoke the reviewers#
All four reviewers are invoked the same way: via the Agent tool, either by you or by the parent agent.
By you, in chat:
have the design-reviewer check this
review the migration with migration-reviewer
ask the api-reviewer to check the new endpoints
By the parent agent during a workflow: /craft-ui Phase 8 invokes design-reviewer automatically. The kit's CLAUDE.md template instructs the parent agent to invoke architecture-reviewer / migration-reviewer / api-reviewer after relevant changes without asking. No behavioral difference between user-triggered and agent-triggered invocations.
Run reviewers in parallel when their concerns don't overlap. A feature that adds both a migration and a new endpoint can run migration-reviewer and api-reviewer at the same time.
design-reviewer#
File: .claude/agents/design-reviewer.md after bootstrap.
What it checks (the rubric):
- Aesthetic match. Does the work reflect the named direction in DESIGN.md? Specific lines from DESIGN.md are cited when calling regression toward generic.
- Forbidden defaults. Inter / Roboto / Open Sans / Lato / Arial / system stacks; purple-on-white gradients; generic SaaS blue/teal; three-column feature grids with identical card heights and centered icons; Material default easing
cubic-bezier(0.4, 0, 0.2, 1); hover scales for decoration; animations over 400ms for state change. Plus every entry in DESIGN.md's project-specific forbidden section. - Token discipline. Greps the changed files for hex literals in className strings,
bg-{color}-{n}Tailwind utilities (red, blue, zinc, etc.),text-{color}-{n},border-{color}-{n}, hardcoded font stacks. Each is a violation. - Type contrast. Display vs body distinguishable at a glance. Weight extremes used per DESIGN.md (typically 200/800, not 400/600).
- Spatial rhythm. Spacing values come from the scale. Vertical rhythm clean.
- State coverage. Every interactive element has visible hover, focus-visible at 3:1 contrast minimum, active, disabled. Loading, error, and empty states present where applicable.
- Motion sanity. Durations under 400ms for state. Only
transformandopacityanimated.prefers-reduced-motionrespected.
Stress tests for interactive components:
- Click rapidly 10x on the primary action. Anything shift unexpectedly?
- Tab through with keyboard. Focus order logical? Indicators visible at every stop?
- Trigger an error state. Does the UI degrade locally or block everything?
- Resize to 320px width. Layout survives?
- Throttle to slow 3G in browser devtools. Is the loading state better than no state?
Visual capture: if Playwright MCP is available and a dev server is running, the reviewer captures screenshots at 375px, 768px, 1280px, and 1920px. If not, it produces a code-only review based on the diff and announces that at the top of the report.
To enable Playwright MCP:
claude mcp add playwright -- npx @playwright/mcp@latestOutput format:
## Design Review: [feature or route name]
### Verdict
[PASS | NEEDS CHANGES | FAIL]
### Aesthetic match
[1–2 sentences. Quote specific DESIGN.md lines.]
### Violations
- [file:line — specific violation, with the rule it breaks]
### Diff suggestions
- `path/to/file.tsx:23` — replace `bg-zinc-900` with `bg-[--color-bg-elevated]`
- `app/page.tsx:88` — body leading is 1.4; DESIGN.md specifies 1.6 for body
- Hero animation duration is 600ms; DESIGN.md ceiling is 400ms
### State coverage
[List of states verified or missing per component touched.]
### Stress test results
[If interactive. Otherwise omit.]
### Screenshots captured
[List of viewports successfully captured, or note if Playwright was unavailable.]architecture-reviewer#
File: .claude/agents/architecture-reviewer.md after bootstrap.
Anchor: docs/ARCHITECTURE.md. If the file doesn't exist, the reviewer:
- Says so at the top of the report.
- Asks whether the project is genuinely architecture-light (static site, simple CLI, library) or whether ARCHITECTURE.md should exist.
- For architecture-light projects, produces a code-only sanity pass. Flags any obvious cross-cutting concerns introduced (auth, persistent storage, payments, queues) that should trigger a real ARCHITECTURE.md.
What it checks (the rubric):
- Stack alignment. Do imports / config / dependencies match §1? Flag any new library that wasn't authorized. Verify against the spec's
Allowed packagesline. - Data model. Do new entities / FKs / columns / indexes / migrations match §2? Identity strategy consistent, multi-tenancy column present, soft-delete handling correct, time fields correct, ON DELETE behavior matches.
- Service shape. Module boundaries (§3) respected? No reaching across boundaries. New API surface introduced without an ARCHITECTURE.md update is a violation.
- Cross-cutting concerns. Auth at the documented enforcement layer. Errors thrown from defined classes. Logging matches the format. Caching obeys §4. Background work registered with the documented queue + idempotency strategy. Secrets fetched via the documented mechanism.
- Migration & rollout. Forward-only vs expand-contract pattern matches §4 strategy. Backfill stated and reversible. Rollback story described. Migration is idempotent.
- Evolution / bets. Does the change conflict with a stated bet in §5? E.g. "users won't exceed 10k rows per tenant" — does the new query break if they do?
- Trade-off log freshness. Does an architecturally material change have a §6 entry? If not, the deviation is undocumented even if it's right.
Output format: standard verdict + sections + 🔴 / 🟡 / 🟢 findings + suggested diffs or trade-off log entries.
A representative finding:
🔴 BLOCKING
- src/server/queries/getUserDashboard.ts:14 — query lacks `tenant_id` filter.
ARCHITECTURE.md §2 commits to row-level multi-tenancy on every query path.
Fix: add `where(eq(users.tenantId, ctx.tenantId))` or use the project's
scoped-query helper.migration-reviewer#
File: .claude/agents/migration-reviewer.md after bootstrap.
Lane: This reviewer's sole concern is production safety of this migration. Strategic questions ("should we even use Postgres?") belong to architecture-reviewer. Code quality belongs to pre-commit-review. Visual review belongs to design-reviewer. The migration-reviewer stays in lane.
Anchor: docs/ARCHITECTURE.md §1 (stack — what database, what version, what host), §4 (Migration & deploy strategy), §5 (Bets — especially scale assumptions). If absent, the reviewer asks for the database engine + version explicitly. Lock-impact rules vary across Postgres, MySQL, SQLite, Cloud Spanner.
Per-statement DDL review:
- Lock acquisition. For each DDL, name the lock acquired (ACCESS EXCLUSIVE, SHARE, ROW EXCLUSIVE, none) and whether it blocks reads, writes, or both.
- Backfill cost. Row count estimate, batch strategy, idempotency, default-value cost on column add (volatile vs constant defaults have different lock profiles in modern Postgres).
- NOT NULL adds. Direct
SET NOT NULLon a populated column = 🔴. Safe pattern: nullable → backfill → CHECK NOT VALID → VALIDATE CONSTRAINT → optionally SET NOT NULL. - FK adds.
NOT VALIDthen separateVALIDATE CONSTRAINT. Index on referencing column required. ON DELETE behavior matches §2. - Index hygiene. Every FK has an index.
CONCURRENTLYon hot Postgres tables. No redundant indexes. Composite index column order matches actual query patterns. - Rename safety. Single-step renames are deploy-time race conditions unless the application is offline. Safe pattern: expand-contract.
- Transaction wrapping.
CREATE INDEX CONCURRENTLYcannot be in a transaction. Long backfills shouldn't hold a transaction open. - Rollback story. Reversible / forward-only / irreversible — must match what the spec declares.
- Data integrity at the boundary. CHECK constraints with
NOT VALIDthenVALIDATE. Triggers idempotent. Enum changes safe. - Cross-tenant exposure. Multi-tenant projects: every new table needs
tenant_id+ index, every new query path needstenant_idin theWHEREclause, RLS policies (if used) added in the same migration.
The output is shaped to be actionable:
🔴 BLOCKING
- db/migrations/20250320_add_org_id.sql:7 — `ALTER TABLE users ADD COLUMN org_id UUID NOT NULL DEFAULT gen_random_uuid()` will rewrite the table under ACCESS EXCLUSIVE for the duration. ARCHITECTURE.md §5 bet: "we expect 100k+ rows in users by P3" — this will lock for several minutes.
Suggested rewrite (split into 4 migrations):
20250320_a_add_org_id_nullable.sql
ALTER TABLE users ADD COLUMN org_id UUID;
20250320_b_backfill_org_id.sql
-- batch in 5k chunks via app code or pg_cron
20250320_c_add_check_not_valid.sql
ALTER TABLE users ADD CONSTRAINT users_org_id_not_null CHECK (org_id IS NOT NULL) NOT VALID;
20250320_d_validate_check.sql
ALTER TABLE users VALIDATE CONSTRAINT users_org_id_not_null;The reviewer always provides copy-paste-ready safer DDL for each 🔴 finding, including the multi-step pattern split into separate migrations if needed.
api-reviewer#
File: .claude/agents/api-reviewer.md after bootstrap.
Lane: Per-endpoint completeness and safety. Strategic API design ("REST or RPC?") belongs to architecture-reviewer. Schema changes to migration-reviewer. Code quality to pre-commit-review. Visual review to design-reviewer. The api-reviewer stays in lane.
Anchor: docs/ARCHITECTURE.md §3 (API style, error envelope, pagination, idempotency, versioning) and §4 (auth & authorization, error handling, rate limiting, logging). If absent, the reviewer falls back to universal rules (authorization granularity, input validation, mass-assignment safety, webhook signature verification, open-redirect/SSRF safety, status-code correctness) and recommends running architecture-md-builder.
The most common 🔴 finding in AI-generated APIs isn't "missing auth." It's broken access control with granularity. The endpoint authenticates the user, but it doesn't check whether the user belongs to the resource being queried. Example:
// 🔴 BLOCKING — missing granularity check
export async function GET(req: Request, { params }: { params: { orgId: string } }) {
const user = await getCurrentUser();
if (!user) return new Response("Unauthorized", { status: 401 });
const org = await db.org.findUnique({ where: { id: params.orgId } });
return Response.json(org);
}This authenticates the user, then returns any org the user requests by ID. The fix is verifying membership:
// ✅ Safe
const membership = await db.orgMember.findFirst({
where: { orgId: params.orgId, userId: user.id },
});
if (!membership) return new Response("Forbidden", { status: 403 });The full rubric:
- Authorization granularity. Auth check verifies the user belongs to the specific resource. Not just that the user exists.
- Multi-tenant filtering. Every query in the handler filters by
tenant_id(or equivalent) if the project commits to row-level multi-tenancy. - Role checks. Admin-only endpoints verify role, not just authentication.
- Internal-only endpoints rejected at the platform layer, not by URL obscurity.
- Input validation. Schema-validated (Zod / Yup / Valibot). Mass-assignment safe — no
db.update({ ...req.body }). String length limits. Array bounds. - Idempotency. POST endpoints with external side effects (charging, email, third-party API) need idempotency keys.
- Rate limiting. Per-route, per-user, scoped where ARCHITECTURE.md §4 says.
- Error envelope. Matches the shape documented in §3 / §4.
- Status codes. 201 on create, 422 (or 400) on validation, 401 vs 403, 429 with
Retry-After, 409 on conflict. - Response shape consistency. Pagination format consistent. No leaked internal fields (
password_hash,internal_notes). - URL safety. Open redirect — any user-supplied URL in
Locationorredirect()validated against an allowlist. SSRF — any user-supplied URL fetched server-side validated; private IP ranges disallowed. - Webhook handlers — signature verification before parsing, replay protection, event-ID idempotency, async-safe (queue work >5s).
- Audit / logging. Mutation handlers log actor + action + resource ID per §4.
🔴 findings block the kickoff verification gate. The reviewer produces copy-paste-ready safer code in the project's existing style.
When to skip a reviewer#
The reviewers exist to catch specific failure modes. Some changes don't trigger any of them:
- Pure refactor with no behavior change. No design, architecture, migration, or API surface to review. Tests pass + pre-commit-review covers it.
- Doc edits. None of the four reviewers apply.
- Config tweaks that don't touch the runtime. Same.
Don't run a reviewer just because the kit ships one. Run the reviewer whose lane the change is in.
When in doubt, the kit's CLAUDE.md template gives default rules:
After any UI change, invoke the
design-reviewersubagent before considering the work done.After any architecturally-loaded change (schema, service boundaries, auth, caching, queues, migrations, public APIs, new cross-cutting libraries), invoke the
architecture-reviewersubagent before considering the work done.After any database schema migration is added or modified, invoke the
migration-reviewersubagent before considering the work done.After any HTTP endpoint, server action, RPC handler, or webhook receiver is added or modified, invoke the
api-reviewersubagent before considering the work done.
These rules ship in the bootstrapped CLAUDE.md, so the parent agent reads them every session.
Why a reviewer might say PASS when the work is actually bad#
A reviewer can only catch what's anchored against a written rule. If your DESIGN.md is sparse, design-reviewer has nothing to anchor regression against. If your ARCHITECTURE.md doesn't commit to multi-tenancy, architecture-reviewer can't flag missing tenant filters as 🔴.
When a reviewer says PASS but the work feels off:
- The doc is too vague. Sharpen via
design-md-builder/architecture-md-builder//forbid/architecture-review. - The reviewer's rubric is missing a check that matters for your project. Edit the subagent's markdown file directly. They're just files. Chapter 13: Extending the kit covers extending review rubrics.
Building your own reviewer#
The four shipped reviewers are templates. If your project has a domain-specific concern that's not covered — a permissions model, a billing pipeline, a particular library's pitfalls — write your own.
A reviewer subagent is a markdown file in .claude/agents/<name>.md with:
- A frontmatter block —
name,description(when to invoke it),tools(typically read-only: Read, Grep, Glob, Bash for git diff),model. - A role statement — "You are a senior X doing read-only review."
- A lane statement — what's in scope and what isn't.
- A workflow — read these files, identify what changed, apply this rubric.
- A rubric — specific checks with severity rules.
- An output format — typically verdict + section breakdowns + 🔴 / 🟡 / 🟢 findings + suggested diffs.
- A what-you-never-do list — guardrails: never edit, never approve 🔴 findings, never use vague feedback, stay in lane.
Copy one of the shipped reviewers (api-reviewer.md is the most thorough template) and adapt the rubric. Chapter 13 walks through this.
Continue#
Next: Chapter 11: Token discipline goes deep on one specific check that runs across all UI work: semantic-token enforcement.