Practical Training: Step-by-Step Guide to Using 7 Agents to Upgrade Vibe Coding to an Expert-Level Development Workflow

Author @sairahul1 dissects the workflow shift from "Vibe Coding" to a "Software Factory": splitting a single AI conversation into 7 specialized agents (Researcher, Story Writer, Spec Writer, Backend Builder, Frontend Builder, Test Verifier, Implementation Validator), each with a single responsibility, a clean context, and strict boundaries.

Table of Contents

  • The problem no one talks about
  • Turning point: from Vibe Coding to Software Factory
  • Seven Agents
    • Agent 1: Codebase Researcher
    • Agent 2: Story Writer
    • Agent 3: Spec Writer
    • Agent 4: Backend Builder
    • Agent 5: Frontend Builder
    • Agent 6: Test Verifier
    • Agent 7: Implementation Validator
  • How the entire chain runs
  • Basics: Before agents can operate, you need this
    • CLAUDE.md: Memory that survives every conversation
    • Context Drift — The silent killer
  • Results: What truly changes
    • Before factory:
    • After factory:
    • The real transformation:
  • Build your own version this weekend
    • 8-step setup checklist:
  • Quick reference for the seven agents

I thought I was coding with AI. Turns out, I was just typing faster.

What I want to talk about is the difference — and the system that completely changes everything: the "7 Agent System."

Save this article. It will save you several months.

The problem no one talks about

The cycle that looks productive but isn't:

→ Ask Claude to make a feature → It produces code → Something breaks → Paste error message back → It patches → Another part breaks → Ask again

Day 1: This feels like magic.

Day 30: You spend more time supervising AI than writing code yourself.

The same logic appears in three different places. Claude forgets the conventions you set two weeks ago. New features break old ones. Testing is either missing or superficial.

One day you wake up and realize: It’s not AI failing, it’s your workflow failing.

The core issue is structural.

When you type "Help me make this feature" in Claude Code, you're actually asking a single AI conversation to play multiple roles simultaneously:

→ Product Analyst → Architect → Backend Engineer → Frontend Engineer → Tester → Code Reviewer

All at once. In one chaotic conversation.

Wrong assumptions in the plan turn into wrong database models. Wrong models become wrong APIs. Wrong APIs lead to wrong UI.

By the time you notice, errors have spread everywhere.

This is what’s called vibe coding (coding by feel).

It hits a hard ceiling.

Turning point: from Vibe Coding to Software Factory

The real key to changing everything:

A true engineering team doesn’t work in a single large conversation.

Different people have different tasks:

→ Clarify user problems → Think about architecture → Write APIs → Build UI → Consider edge cases → Review

When you collapse all of these into one AI conversation, errors quietly accumulate.

The fix is to break work into specialized agents.

Each agent gets:

→ A focused task → Its own clean context window → Only the tools it truly needs → Strict rules about what it "must not touch"
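As a rough sketch, those four properties can be written down as data. The `AgentSpec` class and its field names below are illustrative assumptions, not something Claude Code provides; only the tool names come from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one specialized agent."""
    name: str
    task: str                        # the single focused job
    tools: frozenset[str]            # only the tools it truly needs
    must_not: tuple[str, ...] = ()   # strict "must not touch" rules

    def can_use(self, tool: str) -> bool:
        # A tool is allowed only if it was granted explicitly.
        return tool in self.tools


researcher = AgentSpec(
    name="researcher",
    task="Explain the current state of the codebase before anything is written",
    tools=frozenset({"Read", "Grep", "Glob"}),
    must_not=("edit files", "run state-changing commands", "make assumptions"),
)
```

The point of the shape: anything not granted is denied, so a read-only agent cannot drift into editing.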

Result: A software factory.

One developer + seven focused agents = a coordinated team.

Below are the seven agents that make this work.

Seven Agents

Agent 1: Codebase Researcher

What’s the biggest mistake developers make when using AI?

Treating "getting code" as the first step.

AI takes your prompt, guesses, fills in gaps, and starts generating. Poor design sneaks in at this moment.

The Researcher corrects this.

Its only job: Review the codebase and explain the current state — before a single line is written.

What it does:

  • Mark relevant files and their roles
  • Record existing patterns to follow
  • Find similar functionality that already exists
  • Flag risks (time zones, multi-tenancy, retry logic)
  • List which tests need updating

What it cannot do:

  • Edit files (read-only)
  • Execute commands that change state
  • Make assumptions — it should ask questions instead

Tools: Read, Grep, Glob, nothing more.

Rule: Always explore before starting work.

Researcher always runs first.
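For a feel of what such an agent looks like on disk, here is a minimal sketch that renders a Markdown agent definition. It assumes agent files are Markdown with a small frontmatter block carrying `name`, `description`, and `tools`; the `render_agent_file` helper and the agent name `codebase-researcher` are made up for illustration, so check the Claude Code documentation for the exact format.

```python
def render_agent_file(name: str, description: str, tools: list[str], rules: list[str]) -> str:
    """Render a Markdown agent definition with a frontmatter header (assumed format)."""
    header = [
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",
        "---",
    ]
    body = ["You must follow these rules:"] + [f"- {rule}" for rule in rules]
    return "\n".join(header + [""] + body) + "\n"


researcher_md = render_agent_file(
    name="codebase-researcher",
    description="Read-only survey of the codebase before any code is written",
    tools=["Read", "Grep", "Glob"],
    rules=[
        "Never edit files.",
        "Never run commands that change state.",
        "Ask questions instead of making assumptions.",
    ],
)
print(researcher_md)
```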

Agent 2: Story Writer

Most feature failures aren’t because the code is wrong.

It’s because the problem was never clearly defined.

The Story Writer turns rough ideas into a real user story — before any technical decisions are made.

Input:

  • Your rough feature description
  • Researcher’s investigation results

Output:

  • A user story: "As [role], I want [action], so that [result]."
  • Acceptance criteria: Testable statements — happy path, failure paths, business rules.
  • Edge cases: Boundaries, retries, multi-tenancy considerations.
  • Out of scope: What’s explicitly "not going to be done."
  • Unanswered questions: Things it genuinely doesn’t know — no guesses.

What it cannot do:

  • Invent business rules
  • Write any code or technical design
  • Proceed when the requirements are genuinely unclear

Tools: Read, nothing more.

Rule: You must read and approve the story before moving on.

This is human review point 1, and the critical one: everything downstream depends on the story being right.
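The story's shape can be sketched as a small data structure. The `UserStory` class is hypothetical, and the example story content is invented for illustration; the rule it encodes is an assumption drawn from the list above: no acceptance criteria, or any unanswered question, means the story is not ready for your approval.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    """Hypothetical container for the Story Writer's output."""
    role: str
    action: str
    result: str
    acceptance_criteria: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def sentence(self) -> str:
        return f"As {self.role}, I want {self.action}, so that {self.result}."

    def ready_for_approval(self) -> bool:
        # Needs at least one testable criterion and no unanswered questions.
        return bool(self.acceptance_criteria) and not self.open_questions


# Invented example content, loosely based on the invoice reminder feature.
story = UserStory(
    role="a finance admin",
    action="overdue invoices to trigger a reminder email",
    result="customers are nudged without manual follow-up",
    acceptance_criteria=["A reminder is sent only for invoices in the admin's own tenant."],
    open_questions=["How many days overdue before the first reminder?"],
)
print(story.sentence())
print(story.ready_for_approval())  # False: one question is still open
```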

Agent 3: Spec Writer

Once the story is approved, the Spec Writer turns it into a technical brief.

This brief is the blueprint all build agents follow.

Input:

  • Approved user story
  • Researcher’s investigation results
  • Your project’s CLAUDE.md rules

Output:

  • Data model changes (fields, types, migrations)
  • Background/process flows
  • API changes (endpoints, request/response formats)
  • Frontend changes (components, pages, hooks)
  • Tests needed (success, failure, boundary)
  • Risks and unresolved issues
  • Files to be changed

What it cannot do:

  • Edit files
  • Introduce new infrastructure without calling it out explicitly
  • Skip tenant isolation or timezone considerations
  • Leave questions unanswered

Tools: Read, Grep, Glob, nothing more.

Rule: This brief is human review point 2.

You read it and approve it. Only then may files be touched.

If you see "store ID in memory" — that’s a red flag.

Catch it now. Don’t wait for 10 files to be changed.
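Your own eyes are the real review, but a crude pre-check can help. The sketch below scans a brief for red-flag phrases and for required topics that never appear. The patterns, the reasons attached to them, and the `review_brief` helper are all assumptions for illustration; tune them to your project's rules.

```python
import re

# Hypothetical red-flag patterns; the reasons are illustrative assumptions.
RED_FLAGS = {
    r"\bin[- ]memory\b": "state kept in memory may not survive restarts",
    r"\bcron\b": "project rule says to use the job queue instead of cron",
}

# Topics the article says a brief must not skip.
REQUIRED_TOPICS = {
    "tenant isolation": r"\btenant",
    "timezone handling": r"\btime ?zones?\b",
}


def review_brief(brief: str) -> list[str]:
    """Return human-readable findings for a technical brief."""
    lowered = brief.lower()
    findings = []
    for pattern, reason in RED_FLAGS.items():
        if re.search(pattern, lowered):
            findings.append(f"red flag: {reason}")
    for topic, pattern in REQUIRED_TOPICS.items():
        if not re.search(pattern, lowered):
            findings.append(f"missing: brief never mentions {topic}")
    return findings


findings = review_brief("Store the reminder ID in memory. Every query filters by tenant ID.")
for finding in findings:
    print(finding)
```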

Agent 4: Backend Builder

Now it’s time to build.

The Backend Builder implements the "backend half" of features — responsible only for backend.

Input:

  • Approved technical brief
  • Researcher’s investigation results
  • Your project’s CLAUDE.md

It builds:

  • API routes
  • Services and business logic
  • Database access and migrations
  • Background jobs
  • Its own unit tests

It cannot do:

  • Touch React components, pages, or client-side hooks (Agent 5’s job)
  • Add dependencies without instructions
  • Modify files outside scope
  • Stop without running typecheck, lint, and tests

After completion, it returns a summary: files added or changed, reused helpers or patterns, any CLAUDE.md rules that could be improved.

Tools: Read, Edit, Write, Bash — only within backend folders.

Key point: separation of concerns.

Backend Builder can never accidentally break frontend.
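One way to enforce "only within backend folders" is a path check over the files an agent changed. This is a minimal sketch; the folder names in `BACKEND_ROOTS` and the `in_scope` helper are hypothetical and should be replaced with your repo's layout.

```python
from pathlib import PurePosixPath

# Hypothetical backend folders for illustration.
BACKEND_ROOTS = ["src/server", "prisma"]


def in_scope(path: str, allowed_roots: list[str]) -> bool:
    """True if a repo-relative path sits inside one of the allowed folders."""
    candidate = PurePosixPath(path)
    if candidate.is_absolute() or ".." in candidate.parts:
        return False  # refuse absolute paths and attempts to escape the root
    return any(candidate.is_relative_to(root) for root in allowed_roots)


changed = [
    "src/server/services/reminders.ts",
    "prisma/schema.prisma",
    "src/app/invoices/page.tsx",
]
violations = [path for path in changed if not in_scope(path, BACKEND_ROOTS)]
print(violations)  # ['src/app/invoices/page.tsx']
```

The comparison is done on path components, so a sibling folder such as `src/server-old` does not count as being inside `src/server`.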

Agent 5: Frontend Builder

Frontend Builder implements the UI part — only responsible for UI.

It first reads the backend agent’s summary.

This is crucial.

It uses the API as per the backend’s output. It does not invent new endpoints.

If the API shape is wrong for the UI, it reports the mismatch instead of patching around it.

Input:

  • Approved technical brief
  • Researcher’s investigation results
  • Backend agent’s API summary

It builds:

  • React components and pages
  • Client-side hooks and state
  • Loading and error states
  • Its own component and unit tests

It cannot do:

  • Touch services, API routes, workers, or migrations (Agent 4’s job)
  • Invent endpoints or response formats
  • Add dependencies without instructions
  • Stop without running typecheck, lint, and tests

Tools: Read, Edit, Write, Bash — only within frontend folders.

Two builders. Two clean contexts. Zero chance one breaks the other.
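"Report the mismatch" can be as simple as a set difference between what the UI needs and what the backend summary says each endpoint returns. The endpoints and field names below are invented for illustration, and `find_api_mismatches` is a hypothetical helper.

```python
def find_api_mismatches(
    backend_summary: dict[str, set[str]],
    ui_needs: dict[str, set[str]],
) -> list[str]:
    """Compare the fields the UI needs with what each endpoint returns."""
    report = []
    for endpoint, needed in sorted(ui_needs.items()):
        provided = backend_summary.get(endpoint)
        if provided is None:
            report.append(f"{endpoint}: endpoint not in backend summary")
            continue
        missing = sorted(needed - provided)
        if missing:
            report.append(f"{endpoint}: missing fields {', '.join(missing)}")
    return report


# Invented example data.
backend_summary = {
    "GET /api/invoices/overdue": {"id", "dueDate", "amount"},
}
ui_needs = {
    "GET /api/invoices/overdue": {"id", "dueDate", "customerName"},
    "POST /api/invoices/remind": {"status"},
}
mismatches = find_api_mismatches(backend_summary, ui_needs)
for line in mismatches:
    print(line)
```

The output is a report for the human or the Backend Builder. The frontend agent does not act on it itself.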

Agent 6: Test Verifier

Both builders write unit tests for their parts.

That’s not enough.

Test Verifier does one thing: Prove that this feature actually does what the user story says.

It writes "acceptance tests," not unit tests.

Acceptance tests exercise the feature from the outside, the way a real user would.

Input:

  • Approved user story (with all acceptance criteria)
  • Approved technical brief
  • Summaries from both builders

Output:

  • An acceptance test file covering each acceptance criterion
  • A report: which passed, which failed, which can’t be cleanly covered

What it cannot do:

  • Modify backend or frontend code
  • Invent workarounds for untestable criteria
  • Mark untested criteria as covered

If a test fails: the feature does not meet the story.

It reports "which criterion failed." It does not fix code.

Fixes go back to the correct builder.

Tools: Read, Edit, Write (test files only), Bash.

Rule: Until acceptance tests pass, you don’t have this feature.
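The Verifier's report has three buckets, and the rule above reduces to one boolean. A minimal sketch, with hypothetical names throughout; treating a criterion that cannot be covered as blocking is an assumption, chosen because an unproven criterion is not a passing one.

```python
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_COVERED = "not covered"


def summarize(results: dict[str, Status]) -> dict[str, list[str]]:
    """Group acceptance criteria by outcome."""
    summary: dict[str, list[str]] = {status.value: [] for status in Status}
    for criterion, status in results.items():
        summary[status.value].append(criterion)
    return summary


def feature_is_done(results: dict[str, Status]) -> bool:
    # Assumption: anything other than a pass blocks the feature.
    return bool(results) and all(s is Status.PASSED for s in results.values())


results = {
    "AC1 reminder sent for overdue invoice": Status.PASSED,
    "AC2 tenant ownership enforced": Status.FAILED,
    "AC3 email provider outage handled": Status.NOT_COVERED,
}
print(summarize(results))
print(feature_is_done(results))  # False
```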

Agent 7: Implementation Validator

This agent finds what everyone missed.

It compares the current implementation against the approved story and brief, and reports the gaps.

It never fixes anything. It only tells the truth.

Each run checks:

  • Unimplemented acceptance criteria
  • Uncovered failure paths
  • Security issues: missing permissions, tenant leaks, keys in logs, raw errors leaking
  • Files changed outside scope
  • Patterns inconsistent with CLAUDE.md or existing code
  • Helpers that should have been reused but were duplicated
  • Timezone or multi-tenancy considerations from the brief that were quietly skipped

Output is always grouped by severity:

  • Critical — must fix before merge
  • Important — should fix before merge
  • Minor — opinion, reviewer’s discretion

Each finding includes file path and line number.

If no issues: it simply says "No issues." It does not invent problems to seem thorough.
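The report format described above is easy to pin down in code. This is a sketch under assumptions: the `Finding` class, `format_report`, and the example file path are hypothetical; the three severity names and the "No issues." wording come from the article.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ("Critical", "Important", "Minor")


@dataclass(frozen=True)
class Finding:
    severity: str
    path: str
    line: int
    message: str


def format_report(findings: list[Finding]) -> str:
    """Group findings by severity; say 'No issues.' when there are none."""
    if not findings:
        return "No issues."
    lines = []
    for severity in SEVERITY_ORDER:
        group = [f for f in findings if f.severity == severity]
        if not group:
            continue
        lines.append(f"{severity}:")
        lines.extend(f"  {f.path}:{f.line} {f.message}" for f in group)
    return "\n".join(lines)


# Invented example finding.
report = format_report([
    Finding("Minor", "src/server/services/reminders.ts", 12, "duplicated date helper"),
    Finding("Critical", "src/server/routes/reminders.ts", 41, "missing tenant ownership check"),
])
print(report)
```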

Tools: Read, Grep, Glob, nothing more.

This agent is what makes the entire factory trustworthy.

Self-assessment scores are worthless. An auditor that looks only at what is on disk, and ignores how it got written, is honest.

How the entire chain runs

Complete process — one prompt starts it all:

You open Claude Code, input:

"Help me implement the 'overdue invoice reminder' feature."

From there you type nothing more, and this happens:

Step 1: Researcher scans your invoice, payment, email code. Returns relevant files, patterns, risks.

Step 2: Story Writer produces user story and acceptance criteria.

Pause: You review and approve the story.

Step 3: Spec Writer turns approved story into a technical brief.

Pause: You review and approve the brief. (Here, catch the "store ID in memory" error.)

Step 4: Backend Builder implements service, API routes, BullMQ jobs, unit tests. Returns: file changes, reused patterns, all tests green.

Step 5: Frontend Builder reads backend API summary, creates admin UI blocks and reminder buttons, writes component tests. All green.

Step 6: Test Verifier writes acceptance tests for eight criteria. Reports 7 passing and 1 failing: the manual check for tenant ownership.

Step 7: Validator catches it. Reports with Critical severity, file path, line number.

→ Back to Backend Builder. Fixes it. All 8 acceptance tests green. Validator runs again. Clean.

Pause: You review and open PR.

Three human review points. Everything else runs itself.
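The control flow of the chain, stripped to its skeleton, is a sequential pipeline with approval gates. The sketch below uses stub stages in place of real agents and a hypothetical `run_chain` function; it is not how Claude Code orchestrates agents internally, and it leaves out the loop back from the Validator to a builder.

```python
from typing import Callable

Stage = Callable[[dict], object]       # reads the shared context, returns its output
Approve = Callable[[str, dict], bool]  # the human decision at a review point


def run_chain(request: str, stages: list[tuple[str, Stage, bool]], approve: Approve) -> dict:
    """Run stages in order; stop at the first review point that is rejected."""
    context: dict = {"request": request}
    for name, stage, needs_review in stages:
        context[name] = stage(context)
        if needs_review and not approve(name, context):
            context["stopped_at"] = name
            break
    return context


def stub(output: str) -> Stage:
    # Stand-in for a real agent call.
    return lambda context: output


stages = [
    ("research", stub("relevant files, patterns, risks"), False),
    ("story", stub("user story and acceptance criteria"), True),   # review point 1
    ("spec", stub("technical brief"), True),                       # review point 2
    ("backend", stub("backend summary"), False),
    ("frontend", stub("frontend summary"), False),
    ("acceptance", stub("acceptance report"), False),
    ("validation", stub("No issues."), True),                      # review point 3: the PR
]

# Simulate a human who rejects the brief.
result = run_chain("overdue invoice reminder", stages, approve=lambda name, ctx: name != "spec")
print(result["stopped_at"])  # spec
```

Rejecting the brief stops the run before any builder touches a file, which is exactly where you want a wrong assumption to die.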

Basics: Before agents can operate, you need this

CLAUDE.md: Memory that survives every conversation

Every time you open Claude Code, it starts from "zero memory."

CLAUDE.md fixes this.

It’s a Markdown file at the repo root, auto-loaded at each conversation start.

It’s the home of "permanent project facts":

  • Your tech stack (Next.js App Router, Node.js, Prisma, BullMQ, Resend)
  • Your commands (npm run dev, npm test, npx prisma migrate dev)
  • Architecture rules ("Business logic in services. API routes thin.")
  • Things not to do ("No cron — use BullMQ. Don’t log raw payment payloads.")
  • Deep documentation pointers (docs/billing.md, docs/architecture.md)

Keep it within 100–300 lines.
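A small lint can keep the file honest about both rules: the line budget and the five kinds of content listed above. This is only a sketch; the heading keywords in `REQUIRED_HEADINGS` are assumptions about how you might title your sections, and `check_claude_md` is a hypothetical helper.

```python
# Hypothetical heading keywords, one per kind of content listed above.
REQUIRED_HEADINGS = ("tech stack", "commands", "architecture rules", "do not", "docs")


def check_claude_md(text: str, min_lines: int = 100, max_lines: int = 300) -> list[str]:
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    lines = text.splitlines()
    if not min_lines <= len(lines) <= max_lines:
        problems.append(f"{len(lines)} lines, expected {min_lines}-{max_lines}")
    # Only Markdown heading lines count as sections.
    headings = [line.lstrip("#").strip().lower() for line in lines if line.startswith("#")]
    for required in REQUIRED_HEADINGS:
        if not any(required in heading for heading in headings):
            problems.append(f"no heading mentioning '{required}'")
    return problems


sample = "\n".join(
    ["# Tech stack", "# Commands", "# Architecture rules", "# Do not", "# Docs"]
    + ["- rule"] * 115
)
print(check_claude_md(sample))  # []
```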

Every time AI makes a surprising mistake, ask: "If CLAUDE.md had a rule, could this have been avoided?"

Add the rule.

Weeks later, your CLAUDE.md becomes a record of "all assumptions AI ever got wrong" — your conversations will improve noticeably.

Context Drift — The silent killer

Most Claude Code conversations don’t fail dramatically.

They drift.

A wrong assumption enters the context. The model keeps stacking on top.

You want Claude to do "subscription management." It designs: User → Subscription.

Later, you remember: subscriptions belong to "company," not "user."

If you just say "No, subscriptions belong to company" — Claude patches it.

Now you have both user.subscriptionId and company.subscriptionId floating around.

Rules:

  • Typos? Inline fix.
  • Wrong architecture assumptions? Drop the entire conversation, start over, embed correct assumptions in the first prompt.

A clean conversation with correct mental models always beats a patched one.
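The leftover `user.subscriptionId` is the kind of residue a simple heuristic can surface. The sketch below flags field names that appear on more than one model. The `find_duplicated_fields` helper and the ignore list are assumptions, and shared names are often legitimate, so treat a hit as a prompt to look rather than as a verdict.

```python
def find_duplicated_fields(
    models: dict[str, set[str]],
    ignore: frozenset[str] = frozenset({"id", "createdAt", "updatedAt"}),
) -> dict[str, list[str]]:
    """Map each field name that appears on several models to those models."""
    owners: dict[str, list[str]] = {}
    for model, fields in models.items():
        for field_name in fields - ignore:
            owners.setdefault(field_name, []).append(model)
    return {name: sorted(ms) for name, ms in owners.items() if len(ms) > 1}


# The drifted schema from the example above.
models = {
    "User": {"id", "email", "subscriptionId"},
    "Company": {"id", "name", "subscriptionId"},
}
print(find_duplicated_fields(models))  # {'subscriptionId': ['Company', 'User']}
```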

Results: What truly changes

Before factory:

  • Vibe coding cycle: prompt → generate → error → patch → repeat
  • Context filled with noise
  • Wrong assumptions turn into broken features
  • One engineer can only do one thing at a time
  • Features wait for the right person’s availability

After factory:

  • Structured chain: Research → Story → Brief → Build → Verify → Confirm
  • Each agent has a clean context, only what it needs
  • Wrong assumptions caught at "brief approval" — not after 10 files
  • One engineer can deliver a complete vertical slice: backend, frontend, tests, verification
  • The best knowledge lives in agents — not stuck on "someone"

The real transformation:

A payments expert creates a payments-integration agent. From that moment, every engineer can deliver billing features. No waiting, no handoff.

Frontend lead’s component patterns live in frontend-builder. DevOps CI checks live in hooks. QA’s edge cases live in test-verifier rules.

Expert knowledge shared as agents. Not stuck on "who’s available."

Build your own version this weekend

8-step setup checklist:

  1. Install Claude Code → code.claude.com

  2. Create folder structure:

    • .claude/agents/
    • .claude/skills/feature-factory/
    • .claude/skills/build-with-tests/
    • .claude/hooks/

  3. Write your CLAUDE.md (100–300 lines: tech stack, commands, architecture rules, do-not-do list)

  4. Use Claude Code’s /agents command to create 7 agents. Describe each agent’s role. Claude writes files. Review and commit.

  5. Build feature-factory orchestrator skill. Ask Claude to generate it — it will read your 7 agent files and connect the chain.

  6. Build build-with-tests skill. Describe how your team builds: align patterns, write code and tests together, run typecheck at the end.

  7. Add a pre-commit hook. Block commits of .env, .key, .pem, secrets.json. It takes 5 minutes and prevents disasters.

  8. Run a real feature through the full chain. Pick a small one. Observe where it stalls. Add rules. The factory improves with every rule you add.
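Step 7 is small enough to sketch in full. The script below is a minimal Git pre-commit hook, assuming it is saved as `.git/hooks/pre-commit` and made executable; the function names are hypothetical. It matches exactly the four patterns from the checklist, so variants such as `.env.local` would need adding.

```python
#!/usr/bin/env python3
import subprocess
import sys
from pathlib import PurePosixPath

BLOCKED_NAMES = {".env", "secrets.json"}
BLOCKED_SUFFIXES = {".key", ".pem"}


def is_blocked(path: str) -> bool:
    """True if the file name or extension is on the block list."""
    p = PurePosixPath(path)
    return p.name in BLOCKED_NAMES or p.suffix in BLOCKED_SUFFIXES


def staged_files() -> list[str]:
    # Added, copied, or modified files in the index.
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main() -> int:
    blocked = [f for f in staged_files() if is_blocked(f)]
    for f in blocked:
        print(f"Refusing to commit {f}", file=sys.stderr)
    return 1 if blocked else 0  # a non-zero exit makes Git abort the commit


# In the real hook file, finish with: sys.exit(main())
```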

Total time: 2–3 hours.

Run several features. After 3–4, the factory knows your codebase.

You’ll spend less time supervising, more time deciding "what’s next."

Seven Agents — Quick Reference

  • Researcher — scans code before anything is built (read-only)
  • Story Writer — turns ideas into user stories and acceptance criteria (read-only)
  • Spec Writer — turns stories into technical briefs (read-only)
  • Backend Builder — builds APIs, services, jobs, unit tests (backend folder only)
  • Frontend Builder — builds components, pages, hooks, UI tests (frontend folder only)
  • Test Verifier — writes acceptance tests for stories (test files only)
  • Validator — compares implementation against story and brief, reports gaps (read-only)

3 human review points:

→ Approve story → Approve brief → Approve PR

Everything else runs itself.


Most Claude Code developers are still in vibe coding. Prompt → generate → patch → pray.

That’s not wrong. But it hits a ceiling.

The factory doesn’t kick you out of the process. It kicks you out of the "parts where your judgment isn’t needed."

You stay in the parts where "your judgment truly matters":

Is this the right problem? Is this the right design? Is it safe to deploy?

All the middle steps are handled by agents.

That’s the difference between "using AI as a faster keyboard" and "using AI as a coordinated team."

Original author: @sairahul1
