Agent Workflow Kit
Example 01
Technical engraving of a pomodoro timer, project cards, a task ledger, and a small interface frame
Example Fig. 01A small app through the full loop.

Worked example — Pomodoro Tracker

What this is: A complete narrative walkthrough of using the kit to build a small project end-to-end. Use it to ground the abstract concepts from chapters 01–13 in concrete steps. Names, answers, and outputs are illustrative. They show shape, not the only correct answers.

The project: Pomodoro Tracker, a tiny single-user web app that lets a focused worker run pomodoro sessions and see their daily/weekly focus time. No auth, no backend (yet), just localStorage. One person uses it: the builder.

The walk-through covers:

  1. Bootstrap.
  2. PRD via prd-grill.
  3. Skipping ARCHITECTURE.md (this project doesn't need it yet).
  4. DESIGN.md via design-md-builder.
  5. ROADMAP draft.
  6. Backlog seed + triage.
  7. First spec kicked off via /craft-ui.
  8. Visual review and ship.

1. Bootstrap#

mkdir pomodoro-tracker && cd pomodoro-tracker
git init
agent-workflow

Output ends with the "Bootstrap complete" banner. Commit:

git add . && git commit -m "kit: bootstrap"

2. PRD via prd-grill#

Open Claude Code in the project. Type:

prd-grill

The skill walks through six phases. Below is an abbreviated transcript of how it might go:

Skill: I see no PRD or existing code. Greenfield. Let's start at Phase 1: who is V1 for? Not "everyone" or "people who want X" — name a single archetype, ideally based on a real person.

You: Me. Solo founder, work in 25-minute focused blocks, currently use the system Pomodoro app on macOS but it doesn't track history.

Skill: Good archetype. The trigger moment is probably: "I just finished a focused block and want to log it without breaking flow." Does that fit?

You: Yes.

Skill: What's frustrating about the current system app?

You: It runs the timer fine but there's no record. I have no idea if I did 4 pomodoros today or 8. I want a streak record so I can see "12 days, 6 pomodoros average" the way Strava shows running streaks.

Skill: That's the problem statement, and it's specific. Phase 2 — describe the problem in your own words. Drafting: "Solo founder runs pomodoro sessions through the macOS system app. There's no historical record, so they have no sense of consistency over weeks. They want a Strava-like surface that shows daily/weekly focus time and rewards consistency through streaks." Confirm or push back?

You: Confirm. Streak is the key word.

Skill: Phase 3 — V1 capabilities. Each phrased as something the user can do. I'll propose: 1) Start a 25-minute timer, 2) Pause/cancel mid-session, 3) Mark the session complete, 4) See today's count + this-week's total, 5) See current streak. What's missing or wrong?

You: Cut #2. If you cancel mid-session it doesn't count, and that's it. No paused state.

Skill: Sharper, four capabilities. Phase 4 — what does V1 NOT do? This is the longest phase by design. We'll walk through every adjacency one by one.

[Long sequence of one-question-at-a-time non-goal interrogation. Output below.]

Skill: Phase 5 — success metrics. What does success look like? Not "users love it" — measurable.

You: I use it 5 days a week for 4 weeks straight. That's it. If I bounce off it after 3 days, V1 didn't work.

Skill: Behavioral, threshold + timeframe. Counter-metric? What would tell you the app is gaming the streak at the cost of usefulness?

You: If I notice myself starting a pomodoro just to keep the streak going when I'm not actually doing focused work, that's a fail.

Skill: Phase 6 — constraints. Stack opinions?

You: Vite + React + TypeScript. Tailwind v4. localStorage only. No backend, no analytics, no telemetry. Single user.

After the interrogation, the skill writes docs/PRD.md. The relevant sections:

# Pomodoro Tracker — Product Requirements
 
## 1. Primary user
 
A solo founder (me) who works in 25-minute focused blocks throughout the day.
Currently uses the macOS system Pomodoro app, which runs the timer but stores no history.
At the end of a focused block, the trigger moment is "log this without breaking flow."
Hard adoption constraint: must work entirely client-side, no auth, no signup.
 
## 2. Problem
 
The user runs pomodoro sessions through the system app. There is no historical record,
so they have no sense of consistency over weeks. They want a Strava-like surface that
shows daily/weekly focus time and rewards consistency through streaks. Specifically:
last week they did roughly 30 pomodoros across the week but couldn't say so without
counting from memory; the system app didn't help.
 
## 3. V1 capabilities
 
1. Start a 25-minute timer.
2. Mark the session complete (or it auto-marks at 0:00).
3. See today's count and this-week's total, in real time.
4. See the current streak (consecutive days with at least one completed session).
 
## 4. V1 non-goals
 
- ❌ Sub-25-minute or custom durations. *Reason: the brand is the standard pomodoro;
  custom durations dilute the constraint that makes the technique work.*
- ❌ Multi-session pomodoros (work / short break / long break cycles). *Reason: V1
  is just sessions; cycle support is post-V1 and might never matter.*
- ❌ Pause / resume mid-session. *Reason: cancel-and-restart is the discipline. If
  you can't focus 25 minutes, you bail and try again — that's the data.*
- ❌ Multi-user / sharing / leaderboards. *Reason: this is a personal tool. Social
  pressure is not the motivation we're building for.*
- ❌ Cloud sync. *Reason: localStorage is enough for V1. Sync is a 2-week project
  that probably doesn't ship.*
- ❌ Notifications, push, native. *Reason: web app, runs in a tab.*
- ❌ Background timer when tab is inactive. *Reason: complicates V1 substantially.
  V1 requires the tab to be active. If that's a problem, V2.*
- ❌ Analytics, telemetry, error tracking. *Reason: no third-party tools, period.*
- ❌ Settings page or preferences. *Reason: nothing is configurable in V1.*
- ❌ Historical view beyond "today" and "this week." *Reason: the streak surface is
  the historical view. A full history list is post-V1.*
 
## 5. Success metrics
 
- **Primary:** I use the app for at least 4 sessions on at least 5 weekdays per week
  for 4 consecutive weeks after V1 ships. Target by week 4.
- **Counter-metric:** If I find myself starting a pomodoro just to keep the streak
  alive (without actual focused work), the app has failed even if usage is up.
 
## 6. Constraints
 
- Stack: Vite + React + TypeScript, Tailwind v4 with @theme inline tokens, localStorage.
- No backend, no auth, no telemetry.
- Single user.
- Anti-patterns: NOT a productivity dashboard with charts. NOT a habit tracker with
  goals. NOT a chess.com-style "rating" system. The streak is the only motivation surface.
 
## Revision log
 
(empty, ready for prd-revise to grow)

Notice how specific the non-goals section is. Solo founders especially want to skip Phase 4 because every "no" feels like a loss. The "no" is the point. Every entry is a tripwire that prevents the agent from silently building the wrong thing later.

Commit:

git add docs/PRD.md && git commit -m "docs: write PRD"

3. Should we run architecture-md-builder?#

The project has:

  • No database (localStorage only).
  • No auth.
  • No payments.
  • No multi-tenancy.
  • No background jobs.
  • No external integrations.

Exactly the kind of project where ARCHITECTURE.md is overkill. Skip it. If you later add cloud sync or a real backend, run architecture-md-builder then.

If you had needed it (say, a Postgres backend), it would have looked something like:

Skill: §1 stack, §2 data model, §3 service shape, §4 cross-cutting concerns... (six phases of interrogation, ending with docs/ARCHITECTURE.md and a Trade-off log).

Chapter 06 walks through it.

4. DESIGN.md via design-md-builder#

design-md-builder

Abbreviated walkthrough:

Skill: Phase 1 — references. Three sites whose visual design you'd steal?

You: Pitchfork (the album review pages, not the home), the old Stripe Press marketing pages, and Gridfinity's product page.

Skill: All three are editorial-leaning, with strong type contrast. Two sites in your space (productivity tools) you actively dislike, and why?

You: Notion's marketing pages — too many emoji, too rounded, too SaaS. And RescueTime — looks like a 2014 dashboard.

Skill: Aesthetic direction proposal: "editorial restraint" — display serif at heavy weight (Newsreader 700 or Fraunces 800), body in a clean grotesque (Cabinet Grotesk 400), one accent color, restrained motion. Confirm or push back?

You: Confirm with Newsreader. I want a touch more warmth than Fraunces.

Skill: Phase 2 — brand identity. 3 adjectives the brand IS?

You: Quiet, confident, durable.

Skill: 3 adjectives the brand IS NOT?

You: Cute, gamified, dashboard-y.

[Continues through type, color, space, motion, voice.]

When the skill is done, DESIGN.md has these key sections (abbreviated):

# Pomodoro Tracker — DESIGN.md
 
## Brand identity
 
**Aesthetic direction:** Editorial restraint.
 
**IS:** quiet, confident, durable.
**IS NOT:** cute, gamified, dashboard-y.
 
**Tagline:** Track focused work without breaking it.
 
**The single thing a visitor remembers in a week:** The streak number rendered in a
display serif at huge size, like a chapter heading.
 
## Type system
 
- **Display:** Newsreader, weights 200 and 700.
- **Body:** Cabinet Grotesk, weight 400.
- **Mono:** JetBrains Mono, weight 400 (for the timer countdown only).
- **Tracking:** -0.02em on display, 0 on body, 0.01em on mono.
- **Modular scale:** 1.414 ratio.
 
## Color (OKLCH)
 
```css
@theme inline {
  --color-background: oklch(0.97 0.005 80);     /* warm off-white */
  --color-foreground: oklch(0.18 0.02 270);     /* near-black with cool undertone */
  --color-card: oklch(0.95 0.01 80);
  --color-border: oklch(0.90 0.005 80);
  --color-muted-foreground: oklch(0.50 0.02 270);
  --color-accent: oklch(0.55 0.15 30);          /* burnt umber, single accent */
}

Motion#

  • Easing: cubic-bezier(0.16, 1, 0.3, 1) (expressive ease-out).
  • Duration tiers: fast 100ms, normal 200ms, slow 350ms.
  • Spring vs duration: durations only. No springs in V1.
  • What animates: transform and opacity only.
  • prefers-reduced-motion: strict respect.

Forbidden defaults#

Universal#

  • ❌ Inter, Roboto, Open Sans, Lato, Arial.
  • ❌ Purple gradients on white.
  • ❌ Three-column feature grids.
  • ❌ Material default easing.
  • ❌ Hover scales over 1.02.

Project-specific rejections#

  • ❌ Tomato emoji or any pomodoro-imagery cliche. Reason: editorial brand, not retail.
  • ❌ Progress bars or circular timers. Reason: too dashboard. Use a numeric countdown.
  • ❌ Cute notifications ("Great job!" "You did it!"). Reason: NOT in the IS list.
 
Commit:
 
```bash
git add DESIGN.md && git commit -m "design: lock in editorial restraint"

5. ROADMAP draft#

In the chat:

Read docs/PRD.md and follow prompts/prompt-roadmap.md to draft docs/ROADMAP.md.

The agent reads the PRD and produces a draft. After one iteration:

# Roadmap — Pomodoro Tracker
 
## P0: Setup
**Acceptance criteria:**
- [ ] Vite + React + TypeScript scaffold initialized.
- [ ] Tailwind v4 installed; DESIGN.md tokens wired into globals.css `@theme inline`.
- [ ] check-tokens.sh passes on a stub component.
- [ ] App shell renders with Newsreader + Cabinet Grotesk loading correctly.
 
**Depends on:** none
 
**Out of scope for this phase:** any pomodoro logic, any storage.
 
## P1: Single-session timer
**Acceptance criteria:**
- [ ] User can start a 25-minute timer.
- [ ] Timer counts down visibly and audibly (one terminal beep at zero).
- [ ] At zero, the session is marked complete and recorded in localStorage.
- [ ] User can cancel a session in progress (does not record).
 
**Depends on:** P0
 
**Out of scope for this phase:** today's count, week total, streak. Just one session
at a time.
 
## P2: History surfaces
**Acceptance criteria:**
- [ ] App computes today's count from localStorage.
- [ ] App computes this-week's total from localStorage.
- [ ] App computes the current streak (consecutive days with ≥1 session).
- [ ] All three values display alongside the timer at all times.
 
**Depends on:** P1
 
**Out of scope for this phase:** historical view beyond the three computed values.
 
## Out of scope for V1
 
- ❌ Sub-25-minute or custom durations.
- ❌ Multi-session cycles (work / break).
- ❌ Pause / resume mid-session.
- ❌ Multi-user, sharing, leaderboards.
- ❌ Cloud sync, notifications, native, background-tab timer.
- ❌ Analytics, telemetry.
- ❌ Settings or preferences.
- ❌ Historical view beyond today / this week / streak.

Commit:

git add docs/ROADMAP.md && git commit -m "docs: draft ROADMAP"

6. Fill CLAUDE.md placeholders#

Open CLAUDE.md. Fill the placeholders with distillations from the PRD:

# Pomodoro Tracker
 
This file is **session memory**. Read it before doing anything else.
 
## What this product actually is
 
A single-user web app for tracking 25-minute pomodoro sessions in localStorage,
with three surfaces: today's count, this week's total, and current daily streak.
Built for personal use; no auth, no backend, no telemetry.
 
## Project Context
 
### Stack
- **Framework:** Vite + React + TypeScript
- **Styling:** Tailwind v4 with `@theme inline` tokens in `src/styles/globals.css`
- **State:** none (local component state + localStorage)
- **Testing:** Vitest + Playwright + `@axe-core/playwright`
 
### Commands
- **Dev:** `npm run dev`
- **Lint:** `npm run lint`
- **Type check:** `npm run typecheck`
- **A11y:** `npx playwright test tests/a11y.spec.ts`
- **Token check:** `bash scripts/check-tokens.sh`
 
[... existing kit-shipped Rules section, kept verbatim ...]
 
## Conventions
 
- localStorage keys are namespaced `pomodoro:` (e.g., `pomodoro:sessions:2025-04-15`).
- Sessions store as `{ startedAt: ISO, completedAt: ISO | null }`.
- All time math goes through one helper at `src/lib/time.ts`.
- No third-party state library; useState + useEffect + localStorage are enough.
 
## What success looks like
 
I use the app on at least 5 weekdays per week for 4 consecutive weeks after V1 ships.
 
## Anti-patterns
 
- ❌ NOT a productivity dashboard. No charts, no graphs, no breakdowns.
- ❌ NOT a habit tracker. No goals, no targets, no week-of-the-year.
- ❌ NOT a "rating" system. No XP, no levels, no badges.
- ❌ NOT cute. No tomato emoji. No cheerful microcopy.

Commit:

git add CLAUDE.md && git commit -m "docs: fill CLAUDE.md"

7. Seed and triage the backlog#

In the chat:

Read docs/PRD.md and docs/ROADMAP.md and follow prompts/prompt-seed-backlog.md to populate the Inbox in docs/backlog.md.

The Inbox now has roughly:

## Inbox
 
### Phase 0 — Setup
- TASK-02: Initialize Vite + React + TypeScript
- TASK-03: Install Tailwind v4 + wire DESIGN.md tokens into globals.css
- TASK-04: Add check-tokens.sh to npm run lint
- TASK-05: Load Newsreader + Cabinet Grotesk + JetBrains Mono via @font-face
 
### Phase 1 — Single-session timer
- US-01: User starts a 25-minute timer from the app shell
- US-02: Timer counts down visibly with mono numerals
- US-03: At zero, session is recorded to localStorage
- US-04: User can cancel a session in progress (does not record)
 
### Phase 2 — History surfaces
- US-05: App computes today's count from localStorage
- US-06: App computes this-week's total from localStorage
- US-07: App computes streak (consecutive days with ≥1 session)
- US-08: All three values display alongside the timer

Run:

/backlog-triage

The skill walks each item:

  • TASK-02 through TASK-05 are tagged kind: infra and moved to docs/backlog/phase-0-setup.md.
  • US-01 (start timer button + visible countdown) is tagged kind: ui (it's primarily UI work, even though there's logic).
  • US-02 is tagged kind: ui.
  • US-03 (data persistence) is tagged kind: backend.
  • US-04 is tagged kind: ui.
  • US-05 through US-07 are tagged kind: backend (computation, not UI).
  • US-08 is tagged kind: ui (display work).

Commit:

git add docs/ && git commit -m "docs: triage backlog"

8. Pick and kick off the first spec#

/pick-next-task

Recommends TASK-02 (initialize Vite). Kick it off:

/kickoff-spec TASK-02

The agent flips the spec to [~], runs npm create vite@latest . (or equivalent), installs dependencies, and reports back with smoke check passing. Run /ship-spec TASK-02. Status flips to [x].

Repeat for TASK-03, TASK-04, TASK-05. By the end of P0, the project shell renders with the right fonts and tokens. check-tokens.sh passes.

9. The first UI spec — /craft-ui US-01#

The first interesting spec: US-01: User starts a 25-minute timer from the app shell. kind: ui, so it goes through /craft-ui.

/craft-ui US-01: Timer start surface from the app shell

The 9 phases play out:

Phase 0 — Context load. The agent reads DESIGN.md, CLAUDE.md, the existing components (just App.tsx), src/styles/globals.css. DESIGN.md is present. Proceed.

Phase 1 — Classify. "SaaS / app UI." Not marketing, not design system.

Phase 2 — Brief. Spec is clear; no clarifying questions needed.

Phase 3 — Aesthetic commitment. This is the load-bearing phase. The agent commits:

  • Aesthetic name: editorial restraint.
  • IS: quiet, confident, durable.
  • IS NOT: cute, gamified, dashboard-y.
  • Dominant color move: burnt umber accent on warm off-white. Body text near-black with cool undertone.
  • Type contrast: Newsreader 700 for the streak number (huge), JetBrains Mono for the timer countdown (medium), Cabinet Grotesk 400 for body and labels. Weight extremes: 200 vs 700.
  • Motion temperament: restrained. 200ms ease-out for state transitions. No springs. No decorative motion.

The agent does not write code yet. It commits in writing, then proceeds.

Phase 4 — Forbidden defaults. The agent restates the forbids: no Inter, no purple gradients, no three-column anything, no Material easing, no hover scales over 1.02, no tomato emoji, no progress bars or circular timers, no cute microcopy.

Phase 5 — Information architecture. The agent outlines the layout before any JSX:

Layout:
  Centered single-column
  Top: large display number (current streak, Newsreader 700, ~144px)
  Below: timer countdown (JetBrains Mono, ~96px, 25:00 by default)
  Below: single START button (Cabinet Grotesk, accent background)
  No visible header, no nav, no footer.

Phase 6 — Token alignment. Every visual decision maps to a token. Streak number → text-foreground. Countdown → text-foreground. Button → bg-accent text-background. No raw literals.

Phase 7 — Build. Now JSX. State coverage: hover (button darkens accent slightly), focus-visible (3:1 outline ring), active (slight scale-down to 0.98), disabled (50% opacity, no pointer events). The component is small but every state is defined.

Phase 8 — Visual review loop. Dev server is running. Agent asks Playwright MCP to screenshot at 375 / 768 / 1280 / 1920. Surfaces these findings via design-reviewer:

🟡 NEEDS DECISION
- src/components/Timer.tsx:18 — START button uses bg-accent which renders as
  burnt umber. The button-text is text-background (warm off-white). Contrast
  ratio at 3:1 minimum — need to verify with axe.
 
🟢 ADVISORY
- The streak number on mobile (375px) is 144px which feels too large for the
  viewport. Consider a clamp(72px, 18vw, 144px) responsive scale.

Apply the advisory diff (adds the clamp). Re-screenshot. Re-review. PASS.

Phase 9 — Hand-off. The agent reports:

## US-01 — Timer start surface — DONE
 
What was built:
- src/components/Timer.tsx — main timer surface
- src/components/StartButton.tsx — single accent button with full state coverage
 
New tokens: none added; consumed existing.
 
Follow-ups (not part of this spec):
- Audio playback (terminal beep at zero) — part of US-02
- Cancel-session interaction — part of US-04
- localStorage persistence — part of US-03

Ship it:

/ship-spec US-01

/pre-commit-review runs (no findings). Human merge confirmation: yes. Push. Status flips to [x]. Phase 1 has 1 of 4 specs done.

10. The rhythm continues#

The next morning, /pick-next-task recommends US-02 (timer counts down with mono numerals). /craft-ui it. By end of week, P1 ships.

P2 (history surfaces) is mostly kind: backend. Computation lives in src/lib/time.ts, called from the UI built in P1. Each backend spec goes through /kickoff-spec with the tests-pass gate. The integration into the UI is one final kind: ui spec (US-08), which goes through /craft-ui.

V1 ships. Start using the app daily.

After 4 weeks of usage, run /prd-revise. Findings:

🟢 No drift detected. Shipped specs trace cleanly to PRD §3 capabilities. PRD §4 non-goals are all still respected (no pause, no cycles, no notifications). Success metric §5 (5+ days/week for 4 weeks) is on track at week 3.

The Revision log gets a clean entry confirming the pass. The PRD didn't lie this time.

What this example demonstrates#

  • The interrogation skills are where the work happens. PRD via prd-grill, brand via design-md-builder. The skills push you past vague answers and force commitments. Once those are written down, every downstream step (ROADMAP, backlog, specs, UI) consults them.
  • Skipping ARCHITECTURE.md is fine when the project is thin. Don't gold-plate. The kit recommends ARCHITECTURE.md only for projects with state and integrations.
  • Spec kinds mechanize the gate. UI work goes through /craft-ui and the visual review gate. Backend / infra goes through /kickoff-spec and the tests-pass gate. You don't have to remember which gate to run. The kit dispatches.
  • Phase 3 of /craft-ui is the most important step. Aesthetic commitment before code. Without it, the agent regresses to generic.
  • prd-revise runs at phase boundaries to keep the PRD honest. A clean pass is a valid outcome. Don't manufacture drift.