Spec-driven workflow
Something like 80% of agentic-coding pain comes from skipping the spec or the plan. Each stage of a structured workflow produces an artifact, and each artifact gets reviewed. This page walks through the 6 stages, with templates you can copy.
For the workshop's personal site, Antigravity (and Gemini CLI) handle spec → plan → execute well enough natively — a single good prompt does the work. This 6-stage pipeline pays off when you move past one prompt: week-long features, multi-PR refactors, work where reviewers and stakeholders need a paper trail. Several open-source frameworks codify this same workflow as installable skills, templates, and review gates — see the spec-driven AI development frameworks on the resources page (Superpowers, GitHub Spec Kit, BMAD-METHOD).
The 6 stages
- Brainstorm — clarify the problem before designing
- Spec — write down what we're building and why
- Plan — break it into 2–5 minute tasks with file paths
- Execute — make the changes, one task at a time
- Review — check the work against the spec
- Finish — merge, tag, deploy, close the loop
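Each stage that produces a document needs a predictable home in the repo. Below is a minimal Python sketch of one possible layout — the `docs/specs/<feature>/` path and the per-stage filenames are assumptions for illustration, not something the workflow mandates:

```python
from pathlib import Path

# One markdown artifact per document-producing stage (hypothetical filenames).
STAGE_FILES = ["brainstorm.md", "spec.md", "plan.md", "review.md"]

def scaffold(feature: str, root: str = "docs") -> list[Path]:
    """Create empty stage artifacts under <root>/specs/<feature>/."""
    base = Path(root) / "specs" / feature
    base.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in STAGE_FILES:
        path = base / name
        if not path.exists():  # never clobber an artifact that's already written
            path.write_text(f"# {name.removesuffix('.md').title()}: {feature}\n")
        paths.append(path)
    return paths
```

Run `scaffold("dark-mode")` once at the start of a feature, then point each stage's prompt at the matching file.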
Stage 1 · Brainstorm
Goal: clarify the problem before designing the solution.
Ask the agent to do this before proposing code. The output is a markdown doc, not an implementation.
I want to build <rough idea>. Don't design or code yet.
Help me clarify by asking 5–8 questions about:
- The user (who, why, when do they show up)
- The constraints (technical, time, budget, dependencies)
- The non-goals (what we are NOT solving)
- The success measure (how we know it worked)
After my answers, summarise the problem in one paragraph.
Stage 2 · Spec
Goal: a written description of what we're building, why, and what success looks like. Lives in a markdown file in the repo.
# Spec: <feature name>
## Problem
<1–2 paragraphs · who has the problem, why now>
## Solution
<1 paragraph · the chosen approach in plain English>
## Out of scope
<3–5 bullets · what we are NOT building this round>
## Acceptance criteria
- [ ] <observable behaviour 1>
- [ ] <observable behaviour 2>
- [ ] <observable behaviour 3>
## Open questions
<1–3 bullets · things you'd want answered before merging>
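Because the spec lives as a file in the repo, its shape can be checked mechanically before anyone reviews the content. A sketch of such a lint, assuming the template's section headings verbatim:

```python
import re

# Section headings the spec template requires (Open questions left optional here).
REQUIRED_SECTIONS = ["Problem", "Solution", "Out of scope", "Acceptance criteria"]

def lint_spec(text: str) -> list[str]:
    """Return structural problems with a spec; an empty list means well-formed."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if not re.search(rf"^## {re.escape(section)}\s*$", text, re.MULTILINE):
            problems.append(f"missing section: {section}")
    # Acceptance criteria are the `- [ ]` checkboxes; the template asks for 3.
    criteria = re.findall(r"^- \[[ x]\]", text, re.MULTILINE)
    if len(criteria) < 3:
        problems.append(f"only {len(criteria)} acceptance criteria (template wants 3)")
    return problems
```

A lint like this makes a good pre-commit hook: the agent can't mark the spec stage done with an empty template.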
Stage 3 · Plan
Goal: break the spec into tasks small enough that you'd estimate each in minutes, not hours. Each task has a file path or two and a clear "done" condition.
# Plan: <feature name>
## Task 1 · <short verb-phrase>
- Files: `path/to/file.ts`
- Change: <1–2 sentences>
- Done when: <observable test passes / output matches>
## Task 2 · <…>
- Files: …
- Change: …
- Done when: …
## Task N · <deploy / verify / cleanup>
…
Rule of thumb: if a task takes more than 5 minutes for the agent to execute, split it. Smaller tasks → tighter feedback loop → faster recovery if it goes wrong.
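The "done when" rule is also checkable: every task in the plan file should carry file paths and an observable completion condition. A sketch, assuming the exact field labels from the template above:

```python
import re

def lint_plan(text: str) -> list[str]:
    """Flag plan tasks missing file paths or a verifiable 'Done when' line."""
    problems = []
    # Each task starts with a `## Task` heading; drop the preamble before the first.
    tasks = re.split(r"^## Task ", text, flags=re.MULTILINE)[1:]
    for task in tasks:
        title = task.splitlines()[0]
        if "- Files:" not in task:
            problems.append(f"Task {title}: no file paths listed")
        if "- Done when:" not in task:
            problems.append(f"Task {title}: no 'Done when' condition")
    return problems
```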
Stage 4 · Execute
Goal: implement the plan, one task at a time, with verification after each.
The agent can drive execution, but stay in the feedback loop:
- Read the diff after each task before approving
- Run the test the task says it should pass
- If a task fails twice, see Recovery toolkit
Stage 5 · Review
Goal: catch what the executor missed. The reviewer reads the diff against the spec — not just the code on its own.
Review the diff for <feature> against the spec at <path>.
Check specifically:
- Does each acceptance criterion in the spec have a corresponding test or observable behaviour?
- Anything in the diff that's NOT covered by the spec? Flag it.
- Anything missing for the spec to be considered done? Flag it.
- Code-quality red flags: untested error paths, hard-coded values that should be config, etc.
Output a short report: must-fix, should-fix, nice-to-have.
Stage 6 · Finish
Goal: close the loop. Merge, tag, deploy, write the changelog entry, post the announcement. Update GEMINI.md if you learned something durable.
The final stage is small, but skipping it means changes don't actually ship — and "looks done, isn't done" is exactly the failure mode the spec was supposed to prevent.
The 6-stage checklist
Before merging anything, run this:
- [ ] Spec exists in the repo
- [ ] Plan exists and was followed (or updated when reality diverged)
- [ ] Every acceptance criterion has a passing test or verified manual check
- [ ] Diff was reviewed against the spec, not just on its own
- [ ] Changelog / README updated if user-visible behaviour changed
- [ ] Project memory (`GEMINI.md`) updated if a new convention or constraint emerged
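The first two checklist items are mechanical enough to script as a pre-merge gate. A sketch, assuming a hypothetical `docs/specs/<feature>/` layout for the artifacts:

```python
from pathlib import Path

def preflight(repo: str, feature: str) -> list[str]:
    """Return checklist items that fail mechanically (missing artifact files)."""
    root = Path(repo)
    missing = []
    # Spec and plan must exist in the repo before anything merges.
    for rel in (f"docs/specs/{feature}/spec.md", f"docs/specs/{feature}/plan.md"):
        if not (root / rel).is_file():
            missing.append(f"missing artifact: {rel}")
    return missing
```

The remaining items — plan followed, diff reviewed against the spec, memory updated — still need a human or a reviewer agent.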