Agent Workflow Kit
Example 02
Technical engraving of a design document moving through a calibration jig with type gauges and rule cards
Example Fig. 02A vague design brief sharpened into a spec.

Worked example — Sharpening DESIGN.md

What this is: A before/after walk-through. What a vague initial DESIGN.md produces, what the agent ships against it, and how three rounds of /forbid plus one targeted re-run of design-md-builder sharpen the doc until generic regression stops.

The setup: a small SaaS dashboard called Tideline, used by ops teams to track on-call incidents. The team ran design-md-builder quickly on a Monday morning. The first draft of DESIGN.md is the vague version below.

Round 0: the vague DESIGN.md#

# Tideline — DESIGN.md
 
## Brand identity
 
**Aesthetic direction:** Modern and clean.
 
**IS:** professional, fast, reliable.
**IS NOT:** flashy.
 
**Tagline:** Incident response, simplified.
 
## Type system
 
- **Display:** sans-serif, modern.
- **Body:** sans-serif.
- **Mono:** any monospace.
 
## Color
 
A blue-based palette with neutral grays. Usable in dark and light modes.
 
## Motion
 
Smooth animations on transitions.
 
## Forbidden defaults
 
(empty)

If you read this and think "could describe a different SaaS I've seen before," you're right. That's the unifying test from Chapter 07. This DESIGN.md is too generic to constrain anything.

What the agent ships against the vague version#

The team runs /craft-ui for the dashboard's incident-list page. The agent reads DESIGN.md, can't anchor on anything specific, and fills the silence with training-data defaults.

What ships, in roughly this order:

  • Type: Inter — the most common AI-default font, unambiguously generic.
  • Color: Tailwind blue-600 for the accent, with gray-50 background and gray-900 text. The "blue-based palette with neutral grays" left no constraint to anchor against.
  • Layout: A three-column dashboard with stat cards, each card identically sized, each with a centered icon at the top. Exactly what the universal forbidden-defaults list describes as the "three-column feature grid with identical card heights and centered icons" anti-pattern.
  • Motion: 300ms ease-in-out fade-in on cards as they mount, plus 1.05 hover scale on each card. Material default easing.
  • Backgrounds: Flat gray-50 all the way. No atmosphere, no layered tonality.
  • Microcopy: "Effortlessly track your on-call rotations." "Seamless team coordination." Both of which are forbidden-words from any voice-aware brand.

The output is functional. It also looks like every other ops dashboard from 2022. The team's PM looks at it and says, "this could be Datadog, PagerDuty, or any of a dozen others. What makes it Tideline?"

The technical answer: nothing. DESIGN.md doesn't say. So the agent doesn't know.

Round 1: tighten brand identity#

The team re-runs design-md-builder Phase 2 (brand identity) only, with sharper answers:

Skill: Aesthetic direction — name a specific direction. Apple/Linear restraint? Brutalist? Editorial? Terminal/IDE? Mid-century printed?

Team: Terminal/IDE-inspired. We want the on-call surface to feel like an operations tool, not a marketing site.

Skill: Three adjectives the brand IS, ones you can actually test against output?

Team: Sober, dense, accurate.

Skill: Three adjectives the brand IS NOT?

Team: Cheerful, decorative, marketing-shaped.

DESIGN.md §1 now reads:

## Brand identity
 
**Aesthetic direction:** Terminal / IDE — operational, dense, no marketing aesthetic.
 
**IS:** sober, dense, accurate.
**IS NOT:** cheerful, decorative, marketing-shaped.
 
**Tagline:** Track on-call incidents end to end.
 
**Single thing a viewer remembers in a week:** Severity-color-coded incident rows
that scan like a JetBrains gutter — exact, dense, scannable.

Round 2: tighten type system#

design-md-builder Phase 3:

Skill: Push past Inter. Display family, body family, mono family — all named, all specific.

Team: Display: IBM Plex Sans. Body: IBM Plex Sans (same family, weight contrast 200 vs 700). Mono: JetBrains Mono. We want the IDE feel.

Skill: Tracking?

Team: -0.02em on display, 0 on body, 0 on mono.

Skill: Modular scale ratio?

Team: 1.25. Conservative. We don't want dramatic scale jumps; we want density.

DESIGN.md §3 now reads:

## Type system
 
- **Display:** IBM Plex Sans, weight 200 and 700 only. No 400/600 — weight extremes only.
- **Body:** IBM Plex Sans, weight 400.
- **Mono:** JetBrains Mono, weight 400.
- **Tracking:** -0.02em on display, 0 on body, 0 on mono.
- **Modular scale:** 1.25 ratio. Conservative — density over drama.

The agent now has a clear constraint: IBM Plex Sans, not Inter. The token check (which already forbids Inter via the universal list) will catch any regression.

Round 3: tighten color#

design-md-builder Phase 4:

Skill: OKLCH values for background, foreground, surface, border, muted-foreground, accent. Status colors. Be specific.

Team: Dark mode first — light mode is V2. Background oklch(0.18 0.01 250). Foreground oklch(0.92 0.005 80) (warm off-white text). Card oklch(0.22 0.01 250). Border oklch(0.27 0.01 250). Muted oklch(0.62 0.01 80). Accent: oklch(0.65 0.18 50) (an amber that reads "alert" without being red).

Skill: Status colors?

Team: Severity 1 = oklch(0.55 0.20 25) (deep red). Severity 2 = oklch(0.65 0.18 50) (amber, same as accent). Severity 3 = oklch(0.55 0.10 200) (cool blue). Resolved = oklch(0.50 0.06 150) (muted green).

DESIGN.md §4 now has every color committed in OKLCH, mapped to specific semantic tokens. The agent can no longer reach for bg-blue-600. No blue-* token exists. Just bg-accent, bg-severity-1, bg-severity-2.

Round 4: project-specific forbids via /forbid#

After running the dashboard once with the new tokens, the team observes the agent reaching for patterns that aren't generic enough to be on the universal forbidden list, but still don't fit Tideline:

/forbid centered card icons — terminal/IDE aesthetic doesn't decorate, leading icons go inline
/forbid hover lifts and shadows on rows — creates marketing-card feel; rows should respond with subtle background change only
/forbid "let's get started" or "get started" CTA copy — operational tool doesn't onboard with cheer; use direct verbs like "open incident" or "start rotation"

Each of these gets parsed by /forbid, formatted, and appended to DESIGN.md's Project-specific rejections subsection. The reasons make the rules survive future re-reads.

DESIGN.md §9 (forbidden defaults) now reads:

## Forbidden defaults
 
### Universal (kit-shipped)
- ❌ Inter, Roboto, Open Sans, Lato, Arial, system stacks.
- ❌ Purple gradients on white. Generic SaaS blue/teal.
- ❌ Three-column feature grids with identical card heights and centered icons.
- ❌ Material default easing `cubic-bezier(0.4, 0, 0.2, 1)`.
- ❌ Hover scales for decoration; animations over 400ms for state change.
- ❌ Unmodified shadcn/ui defaults shipped without remixing.
 
### Project-specific rejections
- ❌ Centered card icons. (Reason: terminal/IDE aesthetic doesn't decorate; leading icons go inline.)
- ❌ Hover lifts and shadows on rows. (Reason: creates marketing-card feel; rows should respond with subtle background change only.)
- ❌ "Let's get started" or "Get started" CTA copy. (Reason: operational tool doesn't onboard with cheer; use direct verbs like "Open incident" or "Start rotation".)

Round 5: re-run /craft-ui against the sharpened DESIGN.md#

The team runs /craft-ui for the same incident-list page that originally produced the generic version. The 9 phases now have something to anchor against.

What ships:

  • Type: IBM Plex Sans 200 for the page heading (very thin, wide tracking). IBM Plex Sans 700 for severity labels (very heavy, mono-feel). JetBrains Mono for incident IDs and timestamps.
  • Color: Dark background oklch(0.18 0.01 250). Each row scans with a left-border in the severity color. The accent amber is reserved for the active filter button. No blue-* anywhere.
  • Layout: A single dense table, not a three-column card grid. Rows are 36px tall (dense). Each row has: severity gutter (4px wide), incident ID (mono), one-line title, time-to-resolve (mono), assignee (initials only). Reads like an IDE problems pane.
  • Motion: No card-lift, no scale on hover. Hover changes row background to oklch(0.22 0.01 250) (one step lighter than base) over 100ms. Selection commits with a 1px ring in the accent color. No mounting animations.
  • Backgrounds: Two-tone — base background and slightly lighter card surface. No flat gray-50 field.
  • Microcopy: "Open incident." "Start rotation." "Acknowledge." Direct verbs. No "Let's." No "Effortlessly."

A design-reviewer pass produces:

## Design Review: incident-list
 
### Verdict
PASS
 
### Aesthetic match
Work matches DESIGN.md §1 ("terminal/IDE — operational, dense") closely. The dense
table layout and JetBrains Mono for IDs anchors the IDE feel. Severity gutter borders
give the JetBrains-gutter cue called out in DESIGN.md.
 
### Violations
(none)
 
### Diff suggestions
(none)
 
### State coverage
- Row hover: ✓ subtle bg change to `bg-card`
- Row focus-visible: ✓ 1px ring in `accent`, 3:1 contrast
- Row active: ✓ ring darkens
- Row disabled: ✓ 50% opacity
- Empty state: ✓ shows "No active incidents." in `muted-foreground`
- Loading state: ✓ skeleton rows match real-row dimensions
 
### Stress test results
- Rapid clicks 10x: no shift
- Tab through: focus order matches visual order
- 320px width: row collapses to two-line layout cleanly
- Slow 3G: skeleton shows in <100ms
 
### Screenshots captured
375 / 768 / 1280 / 1920 — all four viewports

The PM now reads the work and says, "this is recognizably Tideline. Would not be confused for Datadog or PagerDuty."

Side-by-side comparison#

ElementVague DESIGN.mdSharpened DESIGN.md
Display font"sans-serif, modern" → Inter"IBM Plex Sans 200/700 only" → IBM Plex Sans
Color scheme"blue-based palette" → blue-600 + gray-50OKLCH dark base + amber accent + 4 severity colors
Layoutthree-column card griddense table with severity gutter
Motion"smooth animations" → 300ms Material easing + 1.05 hover scale100ms ease-out subtle row bg change, no scale, no mount animation
Microcopy"Effortlessly," "Seamlessly""Open incident," "Acknowledge"
Generic test"could be any ops SaaS""recognizably Tideline"

Lessons#

  • Specificity is upstream. The work changed because DESIGN.md changed, not because the agent got smarter. The fight against generic is fought during DESIGN.md authoring. Not during code generation.
  • Specific reasons make rules survive. Each /forbid includes a one-sentence reason. Six months from now, the team can read "centered card icons" and immediately know whether to revisit (the reason: "terminal/IDE aesthetic doesn't decorate"). Without the reason, the rule looks arbitrary and gets relaxed casually.
  • Project-specific forbids fill the gap the universal list can't. The kit's universal forbids catch Inter, Material easing, hover scales: patterns generic enough to be wrong almost everywhere. Project-specific forbids catch patterns that are fine for some brands but wrong for yours. Three to five forbids in the first month is healthy.
  • Re-running design-md-builder on individual sections is normal. You don't have to nail DESIGN.md in one pass. Run it once to get a draft, ship some UI, observe regression, re-run on the sections that produced the regression. The skill diagnoses missing or vague sections and only re-interrogates on those.
  • design-reviewer PASS requires DESIGN.md to be specific enough to anchor against. The PASS verdict in Round 5 was meaningful because there were specific rules to verify against. A PASS against the vague Round 0 DESIGN.md would have been worthless. Nothing to violate.

What to do in your project#

  1. Run design-md-builder once. Don't perfect it.
  2. Ship one UI spec via /craft-ui. Run design-reviewer on it.
  3. Note what the agent reached for that doesn't fit. For each pattern: /forbid <pattern> — <reason>.
  4. If a section of DESIGN.md is the source of regression (not just one pattern), re-run design-md-builder on that section.
  5. Repeat. After ~4 weeks of building, DESIGN.md will be specific enough that generic regression stops surfacing.

The doc is alive. So is your taste. The kit is the scaffolding that connects the two.