Agent Workflow Kit
Chapter 09
Fig. 09. The daily loop as clockwork (a technical engraving of a circular paper feed wheel with input, review, and output trays).

09 — The daily loop

When to read this: once your backlog is triaged. This is the chapter you'll come back to most often, because it covers the day-to-day rhythm of shipping work.

The four-step loop

Once setup is done, every shipped feature goes through the same loop:

   ┌──────────────────────────────────────────────────┐
   │                                                  │
   │   pick                                           │
   │     │                                            │
   │     ▼                                            │
   │   kickoff       (dispatches by spec kind)        │
   │     │                                            │
   │     │   ┌─ kind: ui  → /craft-ui                 │
   │     ├──◄                                         │
   │     │   └─ kind: backend|infra → /kickoff-spec   │
   │     │                                            │
   │     ▼                                            │
   │   verification gate                              │
   │   (visual rubric / tests / reviewer)             │
   │     │                                            │
   │     ▼                                            │
   │   ship                                           │
   │     │                                            │
   │     ▼                                            │
   │   (loop)                                         │
   │                                                  │
   └──────────────────────────────────────────────────┘

Plus three escape hatches you reach for mid-loop when something architectural or visual surfaces:

  • /architecture-review for decisions that need framing before code is written.
  • design-reviewer / architecture-reviewer / migration-reviewer / api-reviewer subagents for review during execution.
  • /forbid when you spot a generic regression worth pinning into DESIGN.md.

This chapter walks the steps in order, then covers the escape hatches.

Step 1 — Pick the next task

/pick-next-task

The skill is read-only. It reads the phased backlog (docs/backlog/phase-*.md), parses spec dependencies, identifies which specs are unblocked (their dependencies are [x]), and surfaces a recommendation.

For solo work, it recommends the single next item. For parallelizable work, it surfaces a parallel-safe set ("you could also work on US-12 in parallel since it has no dependency on the recommended item").

The skill never mutates files. It picks. You decide.
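The "unblocked" rule is simple enough to state as code. This is a minimal sketch, not the skill's implementation: it assumes each backlog entry has already been parsed into an id, a phase, a status checkbox, and a list of dependency ids (the BacklogSpec shape is hypothetical). How /pick-next-task ranks the unblocked set into a single recommendation isn't described here, so the sketch stops at computing the set.

```typescript
// Hypothetical parsed form of one backlog entry; the real layout of
// docs/backlog/phase-*.md may differ.
type Status = "[ ]" | "[~]" | "[x]";

interface BacklogSpec {
  id: string;
  phase: number; // P0 = 0, P1 = 1, ...
  status: Status;
  deps: string[];
}

// A spec is unblocked when it is unstarted and every dependency is done.
// Read-only: the input array and its entries are never mutated.
function unblocked(specs: BacklogSpec[]): BacklogSpec[] {
  const done = new Set(specs.filter((s) => s.status === "[x]").map((s) => s.id));
  return specs.filter((s) => s.status === "[ ]" && s.deps.every((d) => done.has(d)));
}
```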

A typical output:

## Recommended next: US-04 (P1)
 
US-04: App computes consecutive-day streak from localStorage
- Phase: P1
- Kind: backend
- Dependencies: US-02 (done)
- Acceptance criteria: 3 items, all unmet
 
Parallel-safe alternatives (no shared state with US-04):
- TASK-07: Add Vitest config (P0, kind: infra)

The skill stops there. Kickoff is the next step.

When the skill says "nothing unblocked"

If every unstarted spec has at least one blocking dependency that's not done, the cause is usually one of:

  • A [~] spec is sitting open. Finish it.
  • An actual dependency cycle. Resolve by re-triaging or splitting one of the specs.
  • You've shipped everything in V1. Run /prd-revise and consider what's next.
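To tell a genuine dependency cycle apart from an open [~] spec, a depth-first search over the dependency graph is enough. A minimal sketch, assuming dependencies are available as a map from spec id to the ids it depends on (the findCycle name and the map shape are hypothetical, not part of the kit):

```typescript
// Returns one dependency cycle as a list of ids (first id repeated at the end),
// or null if the graph is acyclic. deps maps a spec id to the ids it depends on.
function findCycle(deps: Map<string, string[]>): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  function visit(id: string): string[] | null {
    if (state.get(id) === "done") return null;
    if (state.get(id) === "visiting") {
      // id is already on the current path, so the path from id onward is a cycle.
      return [...path.slice(path.indexOf(id)), id];
    }
    state.set(id, "visiting");
    path.push(id);
    for (const dep of deps.get(id) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, "done");
    return null;
  }

  for (const id of deps.keys()) {
    const cycle = visit(id);
    if (cycle) return cycle;
  }
  return null;
}
```

A non-null result names the specs to re-triage or split; a null result means the blockage is an open [~] spec or an exhausted V1, not a cycle.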

Step 2 — Kick off the spec

This is where the kind tag matters. The kit dispatches to one of two executors based on the spec's kind:

spec.kind = ui                  spec.kind = backend|infra
       │                                │
       ▼                                ▼
   /craft-ui <id>                   /kickoff-spec <id>
       │                                │
       ▼                                ▼
   Visual review                    Tests-pass gate
   (4 viewports, rubric)            (+ smoke for infra)
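The dispatch rule amounts to a switch on the tag. A hedged sketch: the SpecKind union mirrors the three kinds named in this chapter, while the Lane shape, the laneFor name, and the gate labels are illustrative, not the kit's internals. Making the switch exhaustive means a kind with no lane is a compile-time error, and an unrecognized kind arriving at runtime fails loudly instead of falling into the wrong lane.

```typescript
type SpecKind = "ui" | "backend" | "infra";

interface Lane {
  executor: "/craft-ui" | "/kickoff-spec";
  gate: string[];
}

function laneFor(kind: SpecKind): Lane {
  switch (kind) {
    case "ui":
      return { executor: "/craft-ui", gate: ["visual review (4 viewports, rubric)"] };
    case "backend":
      return { executor: "/kickoff-spec", gate: ["tests pass"] };
    case "infra":
      return { executor: "/kickoff-spec", gate: ["tests pass", "smoke check"] };
    default: {
      // Exhaustiveness check: adding a kind without a lane fails to compile.
      const unreachable: never = kind;
      throw new Error(`unknown spec kind: ${String(unreachable)}`);
    }
  }
}
```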

/kickoff-spec <id> — for backend and infra

/kickoff-spec TASK-02

The skill validates the spec (does it exist? does it have a kind tag? are its dependencies met?), flips its status from [ ] to [~] (in progress), then dispatches to the actual executor: usually the main agent in your session.
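The three validation questions map onto simple checks. A minimal sketch, assuming specs are looked up from an in-memory map; the SpecRecord shape and the validateForKickoff and kickoff names are hypothetical, and the "already started" check is an extra assumption. The status flips to [~] only after every check passes, so a rejected spec stays untouched.

```typescript
type Status = "[ ]" | "[~]" | "[x]";

interface SpecRecord {
  id: string;
  kind?: "ui" | "backend" | "infra"; // missing until triage tags it
  status: Status;
  deps: string[];
}

// Returns a list of problems; an empty list means the spec can be kicked off.
function validateForKickoff(id: string, specs: Map<string, SpecRecord>): string[] {
  const spec = specs.get(id);
  if (!spec) return [`${id} does not exist`];
  const problems: string[] = [];
  if (!spec.kind) problems.push(`${id} has no kind tag`);
  if (spec.status !== "[ ]") problems.push(`${id} is already ${spec.status}`);
  for (const dep of spec.deps) {
    if (specs.get(dep)?.status !== "[x]") problems.push(`dependency ${dep} is not done`);
  }
  return problems;
}

// Flips [ ] to [~] (in progress) only when validation finds nothing.
function kickoff(id: string, specs: Map<string, SpecRecord>): string[] {
  const problems = validateForKickoff(id, specs);
  if (problems.length === 0) specs.get(id)!.status = "[~]";
  return problems;
}
```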

The executor reads the spec's TASK and CONSTRAINTS, looks at the existing code, and implements the change. When done, it runs the verification gate:

  • For kind: backend: tests pass.
  • For kind: infra: tests pass + smoke check (typically "the app starts and core paths work").

If the gate fails, the spec stays at [~] and the executor reports what failed. You decide whether to fix forward or revert.

If the gate passes, the spec is ready to ship.
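The gate's decision logic is small enough to write down. A sketch under assumptions: the GateResult shape and evaluateGate name are hypothetical, and how the kit actually runs the test suite and smoke check isn't described here, so both arrive as booleans. Note that the status stays [~] either way; passing the gate only makes the spec ready to ship.

```typescript
type Kind = "backend" | "infra";

interface GateResult {
  testsPass: boolean;
  smokePass?: boolean; // only consulted for infra specs
}

interface GateOutcome {
  status: "[~]"; // the gate never flips to [x]; shipping does that
  readyToShip: boolean;
  failures: string[];
}

function evaluateGate(kind: Kind, result: GateResult): GateOutcome {
  const failures: string[] = [];
  if (!result.testsPass) failures.push("tests failed");
  // A smoke check that never ran counts as a failure for infra specs.
  if (kind === "infra" && result.smokePass !== true) {
    failures.push("smoke check failed or did not run");
  }
  return { status: "[~]", readyToShip: failures.length === 0, failures };
}
```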

/craft-ui <description> — for UI

/craft-ui is more than an executor: it's a multi-phase, taste-first workflow for UI work that forces the agent to commit to specifics before writing code. The ten phases, numbered 0–9:

| Phase | What it does |
|---|---|
| 0 | Read DESIGN.md, CLAUDE.md, the design-md-builder skill (if DESIGN.md is missing), token files, and existing components. Stop if DESIGN.md is missing. |
| 1 | Classify the task (SaaS / app UI vs. marketing / landing vs. design system / component). Different rules apply to each. |
| 2 | Brief: up to 5 clarifying questions, only the ones the spec doesn't already answer. |
| 3 | Aesthetic commitment. Name the direction (e.g. "editorial brutalism"), 3 IS / 3 IS NOT adjectives, the dominant color move, type contrast, and motion temperament. Do not write code until this phase is done. |
| 4 | Forbidden defaults: banned for this task unless DESIGN.md explicitly overrides them. |
| 5 | Information architecture: outline before JSX. |
| 6 | Token alignment: every visual decision maps to a token. |
| 7 | Build. State coverage is mandatory: hover, focus-visible, active, disabled, loading, error, empty. |
| 8 | Visual review loop. Screenshot at 4 viewports, apply the rubric, iterate. |
| 9 | Hand-off: summary, new tokens, follow-ups. |

You invoke it like:

/craft-ui hero section for the marketing page

The agent walks through all ten phases. At Phase 8 (visual review), it invokes the design-reviewer subagent (or instructs you to start a dev server if Playwright MCP isn't available) and iterates until the rubric clears.

Phase 3 (aesthetic commitment) is the single most important phase. If the agent skips it and starts coding, the work drifts toward generic defaults. That's why Phase 3 is mandatory: /craft-ui won't proceed past it without a written commitment.
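The Phase 3 rule behaves like a guard on phase transitions. A minimal sketch, assuming the written commitment is captured as a record with the fields Phase 3 asks for; the AestheticCommitment shape and canEnterPhase name are hypothetical, and however /craft-ui enforces the rule internally may look nothing like this.

```typescript
interface AestheticCommitment {
  direction: string; // e.g. "editorial brutalism"
  is: string[]; // exactly 3 adjectives
  isNot: string[]; // exactly 3 adjectives
  colorMove: string;
  typeContrast: string;
  motionTemperament: string;
}

// Phases 0-3 may run freely; entering Phase 4 or later requires a complete commitment.
function canEnterPhase(phase: number, commitment?: AestheticCommitment): boolean {
  if (phase <= 3) return true;
  if (!commitment) return false;
  const filled = (s: string) => s.trim().length > 0;
  return (
    filled(commitment.direction) &&
    commitment.is.length === 3 &&
    commitment.isNot.length === 3 &&
    [...commitment.is, ...commitment.isNot].every(filled) &&
    filled(commitment.colorMove) &&
    filled(commitment.typeContrast) &&
    filled(commitment.motionTemperament)
  );
}
```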

Why two executors?

UI and backend have different verification gates that aren't substitutable. A passing test suite proves backend logic is correct. It proves nothing about whether the UI is good. A clean visual rubric proves the UI matches DESIGN.md. It proves nothing about whether the API behind it is sound.

/kickoff-spec runs the tests-pass gate. /craft-ui runs the visual-rubric gate. They share status mechanics (flipping [~] and [x]) but the actual verification differs.

Chapter 10 covers verification gates in depth.

Step 3 — Verification

The gate is non-substitutable: you can't ship a UI spec on a passing test suite alone. The kit also doesn't stop at the kind's primary gate; it surfaces secondary reviewers when relevant.

After UI work — design-reviewer

The design-reviewer subagent is the kit's read-only design critic. It reads DESIGN.md, identifies what changed, takes screenshots at 4 viewports (375 / 768 / 1280 / 1920), and applies the rubric:

  • Aesthetic match — does the work reflect the named direction in DESIGN.md?
  • Forbidden defaults — Inter, Roboto, purple-on-white, Material easing, decorative hover scales.
  • Token discipline — no hex literals in className, no raw Tailwind color utilities.
  • Type contrast — display vs body distinguishable at a glance.
  • Spatial rhythm — spacing values from the scale.
  • State coverage — hover, focus-visible, active, disabled, loading, error, empty.
  • Motion sanity — durations under 400ms, only transform and opacity animated, prefers-reduced-motion respected.
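The token-discipline item is the most mechanical part of that rubric. A rough sketch, assuming Tailwind-style className strings: it flags hex literals and raw palette utilities such as bg-zinc-900 while letting token references like bg-[--color-bg-elevated] through. The regexes, the palette list, and the tokenViolations name are illustrative assumptions, not the design-reviewer's actual rules.

```typescript
// Hex color literals such as #fff or #1a1a1a (e.g. inside bg-[#1a1a1a]).
const HEX_LITERAL = /#[0-9a-fA-F]{3,8}\b/g;

// Raw Tailwind palette utilities such as bg-zinc-900 or text-red-500.
const RAW_PALETTE = new RegExp(
  String.raw`\b(?:bg|text|border|ring|fill|stroke|from|via|to)-(?:slate|gray|zinc|neutral|stone|red|orange|amber|yellow|lime|green|emerald|teal|cyan|sky|blue|indigo|violet|purple|fuchsia|pink|rose)-\d{2,3}\b`,
  "g",
);

// Returns every offending fragment in a className string; an empty array means clean.
function tokenViolations(className: string): string[] {
  return [...(className.match(HEX_LITERAL) ?? []), ...(className.match(RAW_PALETTE) ?? [])];
}
```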

The output is a structured report with verdict (PASS / NEEDS CHANGES / FAIL) and specific diff suggestions:

🔴 BLOCKING
- app/page.tsx:23 — replace `bg-zinc-900` with `bg-[--color-bg-elevated]`
- app/page.tsx:88 — body leading is 1.4; DESIGN.md specifies 1.6 for body
- Hero animation duration is 600ms; DESIGN.md ceiling is 400ms

The reviewer never edits code. You apply the diffs and re-invoke the reviewer. Repeat until PASS.

After architectural changes — architecture-reviewer

When a spec touches schema, service boundaries, auth, caching, queues, migrations, or public APIs, invoke the architecture-reviewer subagent:

have the architecture-reviewer check this

Or invoke it via the Agent tool directly. The reviewer reads ARCHITECTURE.md, identifies what changed in the diff, and applies a rubric:

  • Stack alignment — do new packages align with §1?
  • Data model — do new entities/FKs/columns/migrations match §2?
  • Service shape — does the change respect §3 module boundaries?
  • Cross-cutting concerns — auth at the documented enforcement layer, errors from defined classes, logging in the documented format, caching obeying §4 rules.
  • Migration & rollout — pattern matches §4 strategy, backfill stated, rollback story present.
  • Evolution / bets — change doesn't conflict with §5 bets.
  • Trade-off log freshness — material changes are logged in §6.

Severity: 🔴 BLOCKING / 🟡 NEEDS DECISION / 🟢 ADVISORY.

After schema migrations — migration-reviewer

For any DDL — added column, new index, FK addition, NOT NULL toggle, rename — invoke the migration-reviewer subagent. It does per-statement review:

  • Lock acquisition: ACCESS EXCLUSIVE lock risks, and whether each statement blocks only writes or reads as well.
  • Backfill cost: row-count estimate, batch strategy, idempotency.
  • NOT NULL adds: must use the safe pattern (add nullable → backfill → CHECK NOT VALID → VALIDATE → SET NOT NULL).
  • FK adds: NOT VALID then VALIDATE CONSTRAINT, an index on the referencing column, ON DELETE behavior.
  • Index hygiene: every FK has an index; CONCURRENTLY on hot tables.
  • Rename safety: single-step renames are deploy-time race conditions; use the expand-contract pattern instead.
  • Transaction wrapping: CREATE INDEX CONCURRENTLY can't run inside a transaction block.
  • Rollback story: reversible / forward-only / irreversible.
  • Multi-tenant exposure: every new table needs tenant_id if the project is multi-tenant.
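For reference, the safe NOT NULL pattern from that checklist can be written out as the statements it implies. A sketch, assuming PostgreSQL 12 or later, where SET NOT NULL can rely on an already-validated CHECK constraint instead of scanning the table. The safeNotNullSteps helper, the constraint naming scheme, and the single-statement backfill are hypothetical simplifications; a hot table would batch the backfill, and the steps are meant to run as separate migrations rather than one transaction.

```typescript
// Hypothetical helper: emits the steps for adding a NOT NULL column safely.
// Identifiers and the backfill expression are assumed to be trusted input.
function safeNotNullSteps(table: string, column: string, type: string, backfillExpr: string): string[] {
  const check = `${table}_${column}_not_null`;
  return [
    // 1. Add the column as nullable: a brief lock, no table rewrite.
    `ALTER TABLE ${table} ADD COLUMN ${column} ${type};`,
    // 2. Backfill existing rows (batch this on large tables).
    `UPDATE ${table} SET ${column} = ${backfillExpr} WHERE ${column} IS NULL;`,
    // 3. Add the constraint without checking existing rows.
    `ALTER TABLE ${table} ADD CONSTRAINT ${check} CHECK (${column} IS NOT NULL) NOT VALID;`,
    // 4. Validate under a lock that does not block reads or writes.
    `ALTER TABLE ${table} VALIDATE CONSTRAINT ${check};`,
    // 5. On PostgreSQL 12+, the validated CHECK lets this skip the full-table scan.
    `ALTER TABLE ${table} ALTER COLUMN ${column} SET NOT NULL;`,
    // 6. The CHECK is now redundant.
    `ALTER TABLE ${table} DROP CONSTRAINT ${check};`,
  ];
}
```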

The reviewer cross-references ARCHITECTURE.md §1 (database + version) and §4 (migration strategy). 🔴 findings block the kickoff verification gate.

After API changes — api-reviewer

For any new or modified HTTP endpoint, server action, RPC handler, or webhook receiver, invoke the api-reviewer subagent. It does per-endpoint review:

  • Authorization granularity — auth check exists and verifies the user belongs to the resource (catches BOLA / IDOR).
  • Multi-tenant filtering — every query filters by tenant_id if the project is multi-tenant.
  • Input validation — schema-validated, mass-assignment safe, no as any shortcuts.
  • Idempotency — POST endpoints that side-effect externally (charges, emails, third-party calls) need idempotency keys.
  • Rate limiting — per-route, per-user, scoped where documented.
  • Status codes — 201 on create, 422 (or 400) on validation, 401 vs 403, 429 with Retry-After.
  • Webhook handlers — signature verification before parsing, replay protection, event-ID idempotency, async-safe.
  • URL safety — open redirect, SSRF.
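The idempotency item is easiest to see in code. A minimal in-memory sketch, assuming the client sends an idempotency key with each side-effecting POST (commonly as an Idempotency-Key header, a convention this chapter doesn't itself specify). The withIdempotency and createCharge names are hypothetical, and a production version would persist keys with an expiry and guard against two concurrent requests carrying the same key.

```typescript
interface Charge {
  chargeId: string;
  amountCents: number;
}

// Wraps a side-effecting operation so that retrying a request with the same
// idempotency key returns the first result instead of repeating the side effect.
function withIdempotency<P, R>(sideEffect: (payload: P) => R) {
  const seen = new Map<string, R>(); // in-memory only; a real store persists and expires keys
  return (idempotencyKey: string, payload: P): { result: R; replayed: boolean } => {
    if (seen.has(idempotencyKey)) {
      return { result: seen.get(idempotencyKey) as R, replayed: true };
    }
    const result = sideEffect(payload);
    seen.set(idempotencyKey, result);
    return { result, replayed: false };
  };
}

// Usage: a hypothetical charge endpoint body.
let chargesMade = 0;
const createCharge = withIdempotency((amountCents: number): Charge => {
  chargesMade += 1; // stands in for the external payment call
  return { chargeId: `ch_${chargesMade}`, amountCents };
});
```

A client retry after a timeout reuses the same key, so the payment provider is called once no matter how many times the request arrives.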

🔴 findings block the kickoff verification gate. The most common 🔴 finding in AI-generated APIs is broken object-level authorization: authentication is present, but the handler doesn't verify that the user owns the resource being queried.
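The gap is easiest to see side by side. A minimal sketch with an in-memory store; the User and Invoice shapes and both handler names are hypothetical. The first handler only checks that someone is signed in; the second also checks that the invoice belongs to the caller's tenant and to the caller, which is the granularity the api-reviewer looks for.

```typescript
interface User {
  id: string;
  tenantId: string;
}

interface Invoice {
  id: string;
  ownerId: string;
  tenantId: string;
  totalCents: number;
}

interface ApiResult {
  status: number;
  body?: Invoice;
}

const invoices = new Map<string, Invoice>([
  ["inv_1", { id: "inv_1", ownerId: "u_alice", tenantId: "t_acme", totalCents: 1200 }],
]);

// Vulnerable: authenticated, but any signed-in user can read any invoice by id (BOLA / IDOR).
function getInvoiceUnsafe(user: User | null, invoiceId: string): ApiResult {
  if (!user) return { status: 401 };
  const invoice = invoices.get(invoiceId);
  return invoice ? { status: 200, body: invoice } : { status: 404 };
}

// Fixed: the lookup is also checked against the caller's tenant and ownership.
function getInvoice(user: User | null, invoiceId: string): ApiResult {
  if (!user) return { status: 401 };
  const invoice = invoices.get(invoiceId);
  if (!invoice || invoice.tenantId !== user.tenantId || invoice.ownerId !== user.id) {
    // Answering 404 rather than 403 avoids confirming that the id exists.
    return { status: 404 };
  }
  return { status: 200, body: invoice };
}
```

Returning 404 for someone else's resource is a design choice, not a rule from this chapter; a 403 is also defensible when existence isn't sensitive.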

Chapter 10 covers all four reviewers in depth.

Step 4 — Ship

/ship-spec US-04

The skill:

  1. Runs a code review — typically /pre-commit-review or equivalent, focused on code quality, naming, formatting, and tests. (This is different from the architecture / migration / api / design reviewers — those check shape; pre-commit-review checks style.)
  2. Pauses for human merge confirmation. Ship-spec deliberately doesn't auto-merge; you confirm before the merge mechanics fire.
  3. Executes merge mechanics — for worktree-based workflows, that's rebase + push from the worktree; for direct-on-main workflows, it's a push from your branch.
  4. Cleans up — closes the worktree if applicable, flips the spec status from [~] to [x], increments the phase's done count.
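The status bookkeeping in the clean-up step can be sketched as a text transformation. A minimal sketch, assuming each spec appears in its phase file as a markdown checkbox line such as "- [~] US-04: ..." (the line format and the markShipped name are hypothetical). Recomputing the done count from the checkboxes, rather than incrementing a stored number, is one way to keep the count from drifting.

```typescript
// Flips one spec from [~] to [x] in a phase file and recounts the phase's done items.
// Assumes spec lines look like "- [~] US-04: App computes consecutive-day streak".
function markShipped(phaseFile: string, specId: string): { text: string; done: number; total: number } {
  const lines = phaseFile.split("\n");
  const index = lines.findIndex((line) => line.startsWith(`- [~] ${specId}:`));
  if (index === -1) throw new Error(`${specId} is not in progress ([~]) in this phase`);
  lines[index] = lines[index].replace("- [~]", "- [x]");

  // Count every checkbox line, then the ones now marked done.
  const items = lines.filter((line) => /^- \[( |~|x)\] /.test(line));
  const done = items.filter((line) => line.startsWith("- [x] ")).length;
  return { text: lines.join("\n"), done, total: items.length };
}
```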

The spec is now done. Loop back to step 1.

/ship-followup (optional)

If /ship-spec surfaced deferred items (🟡 NEEDS DECISION findings, FILES TOUCHED deviations, operational chores, workflow gaps), /ship-followup processes them with per-item confirmation: fix in place, file as new inbox item, or flag for human review.

Mid-loop escape hatches

Sometimes you're partway through a spec and something architectural or visual surfaces. Stop, frame the decision, then continue.

/architecture-review for in-flight architectural decisions

If you're executing a backend spec and realize the spec didn't actually decide whether to use Redis or in-process caching, stop and run:

/architecture-review

Or by skill name:

architecture-review

The skill frames the decision (often by reframing it — your phrasing usually hides the real question), anchors against existing ARCHITECTURE.md commitments, surfaces trade-offs, recommends a path, and appends to the Trade-off log on your approval.

If existing commitments resolve the question, the skill stops in 30 seconds: "ARCHITECTURE.md §1 already commits to in-process for V1."

Chapter 06 covers /architecture-review in depth.

Subagent reviewers as you go

You don't have to wait until kickoff verification to invoke the reviewers. If you've just finished writing an API endpoint, invoke api-reviewer immediately:

have the api-reviewer check the new endpoints

This catches drift earlier and shortens the fix loop. The kit's CLAUDE.md template encourages this:

After any HTTP endpoint, server action, RPC handler, or webhook receiver is added or modified, invoke the api-reviewer subagent before considering the work done.

You can also invoke them via the Agent tool directly; there's no behavioral difference between user-triggered and agent-triggered invocations.

/forbid when you spot a regression

If during a /craft-ui run you notice the agent reaching for a pattern that doesn't fit, and DESIGN.md doesn't already forbid it, capture it:

/forbid hover scales over 1.02 — decorative, not communicating state

The slash command appends the rule to DESIGN.md's project-specific forbidden section. Future /craft-ui and design-reviewer runs will catch it.
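The edit /forbid makes to the file can be sketched as a small text operation. A minimal sketch, assuming the project-specific forbidden section is a level-2 markdown heading named "## Forbidden (project-specific)"; that heading text and the appendForbidden name are hypothetical, and your DESIGN.md may name the section differently. The rule goes in as the last bullet of the section, and a missing section raises an error rather than guessing where the rule belongs.

```typescript
const SECTION = "## Forbidden (project-specific)"; // hypothetical heading text

function appendForbidden(designMd: string, rule: string): string {
  const lines = designMd.split("\n");
  const start = lines.indexOf(SECTION);
  if (start === -1) throw new Error(`DESIGN.md has no "${SECTION}" section`);

  // The section ends at the next level-1 or level-2 heading, or at end of file.
  let end = lines.findIndex((line, i) => i > start && /^#{1,2} /.test(line));
  if (end === -1) end = lines.length;

  // Insert after the section's last non-blank line so blank spacing is preserved.
  let insertAt = end;
  while (insertAt > start + 1 && lines[insertAt - 1].trim() === "") insertAt--;
  lines.splice(insertAt, 0, `- ${rule}`);
  return lines.join("\n");
}
```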

Chapter 07 covers /forbid in depth.

Anti-patterns in the daily loop

| Anti-pattern | Cost |
|---|---|
| Kicking off a spec that hasn't been triaged with a kind: tag | The wrong execution lane runs: UI specs miss the visual gate, backend specs miss the tests gate. |
| Shipping without invoking the secondary reviewers (architecture, migration, api) | Drift compounds. By month 3, every architectural decision needs reverse-engineering. |
| Batching multiple specs into one kickoff | Verification becomes ambiguous: a failed gate could come from any of the batched changes. |
| Skipping /craft-ui Phase 3 (aesthetic commitment) for "small" UI changes | The agent fills the silence with generic defaults. "Small" UI changes are where regressions hide. |
| Manually flipping spec status to [~] or [x] instead of using kickoff/ship | Phase counts silently drift, and /pick-next-task and /prd-revise start lying. |
| Running /prd-revise mid-spec | It produces context noise during implementation. Wait for a quiet moment. |
| Pushing back on a 🔴 finding from a reviewer without resolving it | Either the change is wrong or the rule is wrong. Both require deliberate action (fix the change or amend the rule), never silent acceptance. |

A representative day

A real day in the loop, condensed:

Morning.

  • /pick-next-task recommends US-04 (compute streak, kind: backend, P1).
  • /kickoff-spec US-04 flips the status, the executor implements it.
  • Tests pass. No schema change, no new API endpoint. No reviewer needed beyond the tests-pass gate.
  • /ship-spec US-04 runs pre-commit-review, asks for merge confirmation, pushes, cleans up.

Late morning.

  • /pick-next-task recommends US-05 (streak shown above today's entry, kind: ui, P2).
  • /craft-ui US-05 walks Phases 0–9. Phase 3 commits to "editorial restraint, large numerals, monospace for the streak count, no animation." Phase 7 builds. Phase 8 runs design-reviewer at 4 viewports, which surfaces a 🟡 finding: "streak number uses Newsreader, but DESIGN.md commits to monospace for numerals." Fix, re-review, PASS.
  • /ship-spec US-05 ships.

Afternoon.

  • New idea surfaces in conversation: CSV export. Not in PRD. Capture immediately:
    • /backlog-intake The user mentioned CSV export, eventually
    • The idea is now in the Inbox. Won't disrupt current work.
  • /pick-next-task shows two more parallel-safe items in P2. Pick one.
  • Kickoff, execute, ship.

End of phase.

  • All P2 specs are [x]. P2 just shipped.
  • /prd-revise surfaces drift: the streak numerals shipped in monospace, but the PRD didn't specify that. It suggests adding the detail to PRD §3. Approve.
  • Run /backlog-triage on the CSV export inbox item. It gets tagged kind: backend, sized, and scheduled to a deferred phase since it's not urgent.

That's the rhythm. The setup pays off in not having to think about which docs to update or which gate to run. The kit makes those decisions mechanical.

Continue

Next: Chapter 10 goes deep on the verification gates that make the loop trustworthy.