Will ChatGPT and Claude Wipe Out AI Applications Entirely?

a16z partner Joe Schmidt IV argues that the large-model labs will dominate only horizontal tasks, while the real opportunities for AI applications lie in vertical scenarios and complex workflows.

Table of Contents

  • Yellow Brick Road
  • Other Places in Oz
  • Why Other Places in Oz Won't Be Occupied by Wizards
  • Sales Case Study — Practical Advice from 11x CEO
    • Focus on Results
    • Powering Through Complex Problems
    • Guardrails Are Not Just to Prevent Bad Things; That’s Why Customers Pay
  • Insurance Case Study — Practical Advice from FurtherAI CEO
  • How to Tell if You Are in Other Places in Oz
  • Both Can (and Will) Win

Entrepreneurs and potential employees keep asking me the same question: Is there still valuable space to build AI applications? Or will OpenAI and Anthropic wipe everything out?

Behind this question lies a kind of “AI anxiety.” Some have already concluded that to avoid remaining permanently at the bottom tier, the only sustainable footholds are either inside large labs or in cutting-edge fields like robotics and hard tech — in theory, anything “labs can’t touch.”

If every piece of software is about to be swallowed, whether directly replaced by Codex or Claude, or rendered unnecessary by future models, then run away fast!

Look, I am as close to an AI maximalist as anyone, but I think this view is only half right. Labs are indeed eroding a large part of the application landscape. But the “application layer” is not a single, homogeneous opportunity. The right mental model is to ask: are you on the “Yellow Brick Road” in the land of the labs, or somewhere else in Oz?

“Yellow Brick Road” is our shorthand for the path the labs are taking and pouring enormous resources into. Labs are best suited to problems like code generation, writing, or image creation, because these products improve as raw model capability advances: every dollar spent on pretraining and fine-tuning directly boosts product quality.

Meanwhile, the rest of Oz is filled with more complex, often vertical, problems. The products that solve them are more than general-purpose tools with standard features and computer access for enterprise users.

Their value doesn’t come primarily from the raw capabilities of the base models (though that’s still important!), but from the scaffolding surrounding them — the frameworks that make outputs trustworthy, compliant, and ready for real-world deployment in specific industries.

We are watching this play out in real time: OpenAI and Anthropic are signaling to the market that they cannot solve every problem with a single general-purpose AI coworker. They have announced large-scale forward-deployment joint ventures, building entire companies around configuring and customizing models for enterprises. If you believed the next model release would solve everything, you would not be investing billions in these projects.

So, if you want to get rich by developing AI applications — skip the Yellow Brick Road and explore other parts of Oz. Here’s what we and some entrepreneurs in our portfolio have learned about what actually works.

Yellow Brick Road

If you’re starting a business, the Yellow Brick Road is the most obvious route, but also the most dangerous. Take a high-performance model, plug in some off-the-shelf connectors (like Google Drive, Slack, Salesforce, Notion, GitHub), and deploy some kind of Agent orchestration layer. It’s almost magical!

The problem is, this is exactly what labs are doing with Cowork and Codex. Clearly, they own the models themselves, giving them better margins, control, and pricing power over downstream vendors.

But perhaps most importantly, they also control the architectural choices that determine which problems their products solve best. So far, they have thought most deeply about “model plus tool calls,” which is exactly what the horizontal, few-step work on the Yellow Brick Road needs. Even if a startup can outperform Codex or Claude Code to some extent, the labs have vast distribution channels and the strongest brand halo in AI.

If you’re an AI application company copying this approach, using the same connectors, with no sub-agents, no deep configuration, and no distribution channels, you’re likely heading down a dead end.

Other Places in Oz

For startups, this isn’t all doom and gloom. Outside the Yellow Brick Road, there are huge opportunities. Startups there have a clear path to own their customers and solve complex problems.

These companies are building Agent experiences, weaving models into complex networks of tools, automation, and integrations (in other words: software). Most such startups are inherently vertical.

They can focus on multi-step, multi-role workflows, and set up sub-agents tailored to specific roles and vertical tasks — something Anthropic and OpenAI’s general platforms can’t reach: collecting context across systems and routing it to multiple people who need to approve at different stages.

This work often involves legacy systems, tends to require deterministic results (no room for ambiguity), and is often tied directly to high-value business outcomes.

Labs understand how valuable these problems are: that’s why they’re building their own outsourcing teams and why there’s a whole class of high-end reinforcement learning businesses.

Why Other Places in Oz Won’t Be Occupied by Wizards

Some argue that betting against the models and the labs has so far been a poor bet: the labs are likely to keep getting stronger and eventually swallow the markets these application companies serve.

Labs will continue to improve, but I believe companies in other parts of Oz can protect themselves over time in several ways:

Data and Learning Flywheel: Most of what you internalize isn’t in any training set — unwritten industry norms, undocumented standards, and collective intelligence stored in practitioners’ minds. These can’t be found on public websites. No amount of training compute can replace being embedded in these “workflows” where this knowledge actually exists.

There are two stacked flywheels here: one is across customers (seeing more variants of the same problem creates compounding effects); the other is within customer organizations (the reasons behind decisions, unspoken exceptions, and the company’s own heuristics, which only surface through real system interactions).

A company that runs its Agent through 100 legal redlines, 1,000 insurance underwriting cycles, or 10,000 SDR marketing activities has internalized the problem’s essence — something a new entrant can’t replicate on first launch. Evaluation sets, labeled outputs, and edge case classification can form a vertical-specific data flywheel, powering fine-tuning.

Model Variability and Complexity Management: Labs are already routing requests to different model tiers and doing ensemble integration at the backend. But they can’t do “cross-vendor routing,” or evaluate competitors’ models for specific sub-tasks, or use open-source fine-tuned models in the most suitable segments. Companies in other parts of Oz will pick the best model for each sub-task across the entire market.

Whenever a new model is released, these companies also take on the unglamorous chores: rerunning evaluations during upgrades, recalibrating prompts for edge cases, and deploying without disrupting production. Labs won’t do this for clients; they just sell you the next model and tell you to migrate yourself. Companies elsewhere in Oz absorb these migration costs.
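
As a rough illustration of the "rerun evaluations during upgrades" chore, here is a minimal sketch of a migration gate. Everything in it is hypothetical: the evaluation set, the two stub "models" (plain functions standing in for real model calls), and the exact-match scoring are assumptions chosen to keep the example self-contained, not a description of how any company named in this article works.

```python
def pass_rate(model_fn, eval_set):
    """Fraction of (prompt, expected) pairs the model answers exactly."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(1 for prompt, expected in eval_set if model_fn(prompt) == expected)
    return hits / len(eval_set)


def safe_to_migrate(current_fn, candidate_fn, eval_set, max_regression=0.0):
    """Allow the upgrade only if the candidate does not regress past the budget."""
    current = pass_rate(current_fn, eval_set)
    candidate = pass_rate(candidate_fn, eval_set)
    return candidate >= current - max_regression


# Hypothetical vertical-specific eval set: (input, expected decision).
EVAL_SET = [
    ("flood zone A, coastal", "refer"),
    ("office, sprinklered", "quote"),
    ("vacant building", "decline"),
]


def current_model(prompt):
    # Stub for the model in production; it misses the "vacant building" case.
    known = {"flood zone A, coastal": "refer", "office, sprinklered": "quote"}
    return known.get(prompt, "quote")


def candidate_model(prompt):
    # Stub for the newly released model; it handles all three cases.
    known = dict(EVAL_SET)
    return known.get(prompt, "quote")


if __name__ == "__main__":
    print(safe_to_migrate(current_model, candidate_model, EVAL_SET))
```

In practice the scoring would rarely be exact match, but the shape is the same: the evaluation set is the asset, and the gate is what lets a model swap happen without disrupting production.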

Cost Optimization: Running every query on the largest, newest frontier model is the fastest way to reach negative gross margin. Top application companies route between model tiers: frontier models for the hardest tasks, mid-tier models for most routine work, and smaller fine-tuned or custom models for specific segments.

Labs price at the bottom line: offering minimal usable intelligence for X dollars. Companies in other parts of Oz do the opposite — delivering the specific intelligence level needed for the workflow at the lowest dollar cost. This is only possible if you know exactly what each sub-task requires.
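
The routing idea can be made concrete with a small sketch. The tier names, capability scores, and prices below are invented for illustration; in a real system the capability requirement of each sub-task would come from your own evaluations, which is exactly the knowledge the paragraph above says you need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    capability: int            # hypothetical 1-10 score from in-house evals
    cost_per_1k_tokens: float  # made-up prices, in dollars


TIERS = [
    ModelTier("small-finetuned", 3, 0.0002),
    ModelTier("mid-tier", 6, 0.003),
    ModelTier("frontier", 9, 0.03),
]


def route(required_capability, tiers=TIERS):
    """Pick the cheapest tier that meets the sub-task's capability requirement."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no tier meets capability {required_capability}")
    return min(eligible, key=lambda t: t.cost_per_1k_tokens)


def workflow_cost(subtasks, tiers=TIERS):
    """Total cost when each (required_capability, tokens) sub-task is routed."""
    return sum(route(cap, tiers).cost_per_1k_tokens * tokens / 1000
               for cap, tokens in subtasks)


def frontier_only_cost(subtasks, tiers=TIERS):
    """Baseline: send every sub-task to the most capable tier."""
    top = max(tiers, key=lambda t: t.capability)
    return sum(top.cost_per_1k_tokens * tokens / 1000 for _, tokens in subtasks)


if __name__ == "__main__":
    subtasks = [(2, 1000), (5, 2000), (9, 1000)]
    print(workflow_cost(subtasks), frontier_only_cost(subtasks))
```

With these made-up numbers, only the hardest sub-task is sent to the frontier tier, so the routed workflow costs a fraction of the frontier-only baseline.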

Governance: Becoming the control plane for clients running AI in a vertical domain has enormous value — it’s where permissions, audits, what agents are allowed to do, and what they actually do intersect.

Because these companies own the tools, workflows, and data end-to-end, they can deliver deterministic results. They also absorb regulatory complexity on behalf of the end user: rules in legal, HIPAA in healthcare, SEC and FINRA in finance, state-by-state insurance regulations, and so on. CIOs want partners who can state explicitly in the contract that they are handling compliance for the agents they provide.

All of this boils down to one thing: Focus. It can be a vertical domain (insurance, legal, accounting) or a deep functional area (sales, customer support, finance). Labs are not born for this. They must be everywhere, serving everyone, which is why they’re building out their Yellow Brick Road. The same trade-offs keep them from being in other parts of Oz — you can’t be everywhere and excel at something simultaneously. You must choose.

Sales Case Study — Practical Advice from 11x CEO

How should you think about this in practice? Here are some practical tips from 11x CEO Prabhav Jain:

Focus on Results

The tactical path to building a lab-proof company is to start from the specific results that truly matter to customers. For us, that’s helping companies build more sales pipeline.

Which end-to-end activities actually drive pipeline? Break each activity down into tasks. Which tasks need Agent automation, and which don’t? When workflows involve many steps, messy inputs, hard-to-interpret states, or real-world constraints, a better model alone won’t get you there. That work falls to traditional software engineering.

For example, our tasks include: lead development based on custom signals, data enrichment, deep account research, CRM context gathering, message drafting for specific channels, lead qualification Agent, and email deliverability systems. None of these are “one-shot” tasks; they require deep engineering. About half of real workflows are non-Agent, and that part has no lab advantage.

Powering Through Complex Problems

Complex problems are where real business value unlocks. Otherwise, you’re just building a thin wrapper.

Here’s an example from GTM (go-to-market): if a company is already your customer, you shouldn’t send cold outreach to one of its contacts.

But it’s rarely that simple. What if the company has dozens of subsidiaries? What if the CRM records only the parent company’s domain? What if a stale matching field sends a cold pitch to a revenue officer at an existing client? Making sense of this chaos requires a dedicated, problem-specific Agent, not a general copilot.
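
To show why even the "simple" rule needs real engineering, here is a minimal sketch of a suppression check that resolves subsidiary domains to a parent before comparing against the customer list. The domain names and the subsidiary map are hypothetical, and this covers only the deterministic part of the problem; it does not attempt the fuzzy matching a dedicated Agent would handle.

```python
def normalize_domain(email_or_domain):
    """Lower-case the domain part and drop a leading 'www.'."""
    domain = email_or_domain.rsplit("@", 1)[-1].strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


def root_domain(domain, parent_of):
    """Follow subsidiary -> parent links; the seen-set guards against cycles."""
    seen = set()
    while domain in parent_of and domain not in seen:
        seen.add(domain)
        domain = parent_of[domain]
    return domain


def should_suppress(email, customer_domains, parent_of):
    """True if the contact belongs to an existing customer or its subsidiary."""
    return root_domain(normalize_domain(email), parent_of) in customer_domains


# Hypothetical data: the CRM only records the parent company's domain.
CUSTOMER_DOMAINS = {"acme.example"}
PARENT_OF = {
    "acme-robotics.example": "acme.example",
    "acme-uk.example": "acme-robotics.example",
}

if __name__ == "__main__":
    for email in ("cro@acme-uk.example", "jane@other.example"):
        print(email, should_suppress(email, CUSTOMER_DOMAINS, PARENT_OF))
```

The subsidiary map is the hard part: keeping it current is the kind of data work that accumulates with production use.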

Guardrails Are Not Just to Prevent Bad Things; That’s Why Customers Pay

Guardrails are severely underestimated. The protections required by a regulated financial prospect are very different from those of a mid-market SaaS customer. These protections extend to what agents are allowed to write, who they can contact, what data they can access, and how decisions are logged.

Faced with these differences, one-size-fits-all systems will inevitably fail. Guardrails must be built per use case, configured per customer, and continuously audited. That’s why we have Forward Deployed Engineers (FDEs) and Technical Deployment Strategists, who tailor solutions for each client.
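
A minimal sketch of what "configured per customer, and continuously audited" might look like in code follows. The policy fields, customer names, and rules are assumptions made up for this example, not 11x's actual design: the point is only that the same draft message can pass for one customer and fail for another, and that every decision is logged.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GuardrailPolicy:
    allowed_channels: frozenset
    blocked_domains: frozenset = frozenset()
    banned_phrases: tuple = ()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)


def check_outreach(policy, channel, recipient, body, log):
    """Return True if the message is allowed; always record the decision."""
    violations = []
    if channel not in policy.allowed_channels:
        violations.append(f"channel not allowed: {channel}")
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in policy.blocked_domains:
        violations.append(f"recipient domain blocked: {domain}")
    lowered = body.lower()
    for phrase in policy.banned_phrases:
        if phrase in lowered:
            violations.append(f"banned phrase: {phrase}")
    allowed = not violations
    log.entries.append({"channel": channel, "recipient": recipient,
                        "allowed": allowed, "violations": violations})
    return allowed


# Hypothetical per-customer configuration.
POLICIES = {
    "regulated-fin-co": GuardrailPolicy(
        allowed_channels=frozenset({"email"}),
        blocked_domains=frozenset({"regulator.example"}),
        banned_phrases=("guaranteed returns",),
    ),
    "midmarket-saas-co": GuardrailPolicy(
        allowed_channels=frozenset({"email", "linkedin"}),
    ),
}

if __name__ == "__main__":
    log = AuditLog()
    draft = "Our platform offers guaranteed returns on your pipeline spend."
    for customer, policy in POLICIES.items():
        ok = check_outreach(policy, "linkedin", "cfo@prospect.example", draft, log)
        print(customer, ok)
```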

Insurance Case Study — Practical Advice from FurtherAI CEO

Sales is one example. Insurance is another, illustrating the same point from a different angle. Here’s what FurtherAI CEO Aman Gour says:

“When we started deploying AI in real insurance operations, we kept hearing a particular assumption: models are where intelligence resides, and workflows are just scaffolding around them.

The more insurance companies we work with, the more we realize this is backwards. In insurance, much of the intelligence actually resides in the workflows themselves.

Two insurance companies can process an application through what looks like the same path: submission, review, quoting, underwriting. But the path is the simplest part. What differentiates these companies are all the details: which risks need reporting, which loss signals are critical, how to resolve conflicts when two appetite rules clash, when manual sign-off is needed, and how decisions are recorded. These logics aren’t in a clean rules engine. They’re embedded in SOPs, managerial reviews, underwriting philosophies, and years of operational experience.

That’s why we’re building agentic workflows. Workflows provide repeatability, auditability, and cost control; agents handle variability and repair when routine paths fail; humans stay in the loop for accountability and judgment.

Over time, workflows stop being just scripts and start becoming the operating memory of the insurance company. That’s the part labs find hard to reach. This understanding only comes from running the workflow thousands of times in production. The workflow you launch on day one isn’t a moat; over time, the closed loop created by production use is the moat.”
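
The division of labor Gour describes (deterministic workflow first, agent repair when the routine path fails, human sign-off for judgment) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not FurtherAI's implementation: the single step, the "tiv" (total insured value) field, the sign-off threshold, and the stub repair function standing in for a model call are all hypothetical.

```python
def normalize_tiv(data):
    # Deterministic step: total insured value must parse as a number.
    return {**data, "tiv": float(data["tiv"])}


def stub_agent_repair(step_name, data, error):
    # Stand-in for a model call; here it only strips thousands separators.
    return {**data, "tiv": str(data["tiv"]).replace(",", "")}


def needs_signoff(data):
    # Hypothetical rule: large risks always go to a human underwriter.
    return data["tiv"] > 1_000_000


def run_workflow(submission, steps, repair, needs_human):
    """Run steps in order, fall back to agent repair, and keep an audit trail."""
    data, audit = dict(submission), []
    for name, step in steps:
        try:
            data = step(data)
            audit.append((name, "workflow"))
        except ValueError as error:
            # Routine path failed: let the agent repair the input, then retry once.
            data = step(repair(name, data, error))
            audit.append((name, "agent_repair"))
    status = "needs_human_signoff" if needs_human(data) else "auto_approved"
    return data, status, audit


if __name__ == "__main__":
    steps = [("normalize_tiv", normalize_tiv)]
    for submission in ({"tiv": "500000"}, {"tiv": "1,200,000"}):
        print(run_workflow(submission, steps, stub_agent_repair, needs_signoff))
```

The audit trail is the piece that compounds: each recorded repair and sign-off is a candidate for a new deterministic rule, which is one way a workflow could become the "operating memory" described above.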

How to Tell if You Are in Other Places in Oz?

  • Tools-and-Steps Test: How many steps does this work require? How complex are the tools needed? Compare: horizontal AI search across Google Drive (single tool, single step, high fault tolerance) versus multi-step legal clause revision spanning law firm precedents (dozens of steps across multiple tools, results require partner review). Both look like “Agents at work,” but only one involves deep software that takes years to develop.

  • System Test: Are you building a “system” that runs the client’s work end-to-end — data capture, governance, record of completed work? Or just a “tool” that adds intelligence to an existing workflow? A system owns the entire workflow, while a tool enhances an already running process. High ACV (annual contract value) often signals a system, as it replaces actual human staffing. Ask yourself: if a lab launched a product claiming to compete directly with you, would clients still need your tool? If yes, you’re building a system.

  • Hedge Fund / P&L Test: Labs are judged by benchmarks; Oz companies are judged by client P&L. Clients don’t care how well your model scores on benchmarks; they care whether your Agent closes deals and revises contracts correctly. The best Agent companies operate like hedge funds: winning on “excess returns” (Alpha) measured by client P&L.

Both Can (and Will) Win

We will see huge winners both on the Yellow Brick Road and outside it. Model companies will keep winning because they own the models and the distribution channels for their general tools.

Companies in other parts of Oz, if they can develop work systems — the actual operational interfaces and data capture — will succeed. As complex workflows in verticals mature, they will form the core experience that clients rely on. The base models are replaceable; work systems are not.

The next generation of enterprise software will be born outside the Yellow Brick Road.
