Is OpenAI aggressively swallowing the application layer? a16z: Beyond the “Yellow Brick Road,” founders still have opportunities

a16z partners point out that the AI application layer is not a single battlefield: startups should avoid attacking large model companies' horizontal tools head-on and should instead go deep into vertical industries. This article originates from a Twitter post.
(Background: Google leads investment in AI routing platform OpenRouter, valued at $1.3 billion, with 240% growth in one year)
(Additional context: In a conversation with an a16z founder, Sam Altman said OpenAI will bet aggressively on infrastructure and that Sora is an important strategic tool)

Table of Contents

  • Anxiety Spreading: Are Large Models Swallowing the Application Layer?
  • Brick Road Trap: The Path of Horizontal Tools Leading to Oblivion
  • Oz Opportunities: Vertical Workflow Moats
  • Cost Advantages: Model Routing and Post-Training
  • Control Plane: Compliance and Governance Value

This is precisely the question a16z partner Joe Schmidt sets out to answer in this article. Borrowing the "Yellow Brick Road" from The Wizard of Oz as a metaphor, he divides AI application opportunities into two categories. The first is the main route that large model companies are entering directly: code generation, writing, image creation, general-purpose agents, and horizontal office assistants. The second is "somewhere else in Oz": problems embedded deep in industry processes, where success depends on complex workflows, accumulated data, compliance, governance, and system integration in vertical settings.

In his view, the real opportunity for startups lies in the latter.

From sales to insurance, Joe Schmidt returns again and again to the same logic: what enterprises are truly willing to pay for is not a smarter chat window but a system that takes responsibility for business outcomes. Such a system has to make sense of messy customer data, handle multi-party approvals and edge cases, carry compliance and audit responsibilities, and, as models are upgraded, help clients migrate, route between models, and optimize costs.

This is also the article's core judgment about next-generation enterprise software: underlying models will become ever more powerful and more replaceable, but what is truly irreplaceable is the data, processes, governance capabilities, and operational memory accumulated around specific industries and workflows. The opportunity for AI application companies is not to compete with model companies on the "Yellow Brick Road," but to go into the more complex, messier, slower places that sit closer to real business value.

Recently, I keep hearing the same question from founders and prospective employees: Is there anything left to build at the application layer of AI? Or will OpenAI and Anthropic ultimately kill everything?

Behind this question lies a very AI-era kind of anxiety. Some have already concluded that if you don't want to end up in the "permanent underclass," the only positions with long-term value are inside the large model labs or at startups in robotics, hard tech, and similar frontier fields: essentially, doing the things "labs can't touch." After all, if every software category is going to be swallowed, either absorbed directly by Codex or Claude or made unnecessary by future models, the best choice seems to be to run, and fast.

I'll admit that I'm close to an AI maximalist myself, and I think they are half right. Large model labs are indeed moving into large swaths of the application layer. But the "application layer" is not a homogeneous set of opportunities. The key question is: are you walking the "Yellow Brick Road," or are you somewhere else in Oz?

Note: "Yellow Brick Road" refers to the main route in The Wizard of Oz leading to the Emerald City and the Wizard.

The "Yellow Brick Road" is the path that large model labs are walking and investing heavily in. Code generation, writing, image creation: these problems suit the labs naturally because they improve as models' raw capabilities improve, so every dollar spent on pretraining and fine-tuning directly raises product quality.

Elsewhere in Oz, however, the problems are more complex and often more vertical. They cannot be solved simply by handing enterprise users a horizontal tool that connects to standard applications and performs basic computer-use tasks. Here, more of the value comes from the scaffolding around the model: the scaffolding that makes outputs trustworthy, compliant, and genuinely integrated into business workflows. The raw capability of the base model still matters a great deal, but it is no longer the whole story.

We are seeing this play out in real time. OpenAI and Anthropic are effectively admitting to the market that a single general-purpose AI coworker cannot solve every problem. They have announced large forward-deployment joint ventures focused on configuring and customizing models for enterprises. If they truly believed the next model release would solve these problems, they would not be investing billions in such efforts.

Anxiety Spreading: Are Large Models Swallowing the Application Layer?

So, if you want to make money from AI applications, don’t walk the "Yellow Brick Road," but build in other parts of Oz. Here are some lessons we and some founders in our portfolio have learned in practice.

If you are founding a company, the "Yellow Brick Road" is the most obvious route, but also the most dangerous. Take a high-performing model, plug in some off-the-shelf connectors such as Google Drive, Slack, Salesforce, Notion, and GitHub, and put an agent orchestration layer on top. It looks like magic.

The problem is that this is exactly what the large model labs are doing with Cowork and Codex. They own the models, which gives them better margins, more control, and pricing power over everyone downstream. Perhaps more importantly, they also control the architectural choices that determine which problems the product is suited to solve. So far, they have deliberately adopted a "model + tool calls" pattern, which is exactly what the horizontal, few-step work on the "Yellow Brick Road" needs. And even if a startup somehow outperforms Codex or Claude Code, the labs still have enormous distribution and the strongest brand halo in AI.

If you are an AI application company taking the same approach, integrating the same connectors with no specialized subagents or configuration underneath and no distribution of your own, you are very likely on a road to nowhere.

For startups, the picture isn't all bleak. Off the "Yellow Brick Road," the opportunities are still enormous: startups can win customers there and solve genuinely complex problems.

These companies are building agentic experiences: models woven into a web of complex tools, automations, and integrations. In other words, software. This also makes most of these startups inherently vertical. They can focus on multi-step, multi-party workflows, design subagents for specific roles and vertical scenarios, and tackle problems that horizontal platforms like Anthropic and OpenAI find hard to reach, such as gathering context across systems or routing tasks to different approvers at different stages.

This kind of work often involves legacy systems, demands deterministic results because ambiguity is unacceptable, and is sometimes tied directly to critical business outcomes. The large model labs certainly recognize the value of these problems. That is why they are building their own teams to configure models for customers, and why a whole wave of enterprise-focused reinforcement learning service companies is emerging.

One counterargument: so far, betting against the models or the labs has been a very bad trade, because they keep improving. They are very likely to keep getting stronger and eventually eat the markets these application companies serve.

Large model labs will indeed continue to improve. But I believe that companies elsewhere in Oz can defend themselves in several ways over the long term.

Much of what you truly internalize in your business does not exist in any training corpus: unwritten industry conventions, undocumented standards, tribal knowledge that lives in practitioners' heads. None of it is on the public internet. No amount of training compute replaces the work of actually embedding this knowledge into workflows.

This creates two flywheels. One runs across clients: seeing more variants of similar problems makes the system improve cumulatively. The other runs within each client: the reasons behind specific decisions, unspoken exceptions, and the company's own heuristics only surface through real interactions between users and the system.

Brick Road Trap: The Path of Horizontal Tools Leading to Oblivion

Even when client data cannot be shared across clients, an application company can still apply pattern recognition across problem types to guide how it structures future problems. If a company's agents have handled hundreds of legal redlines, thousands of insurance underwriting cycles, or tens of thousands of SDR outreach sequences, its understanding of problem patterns is not something a newcomer can replicate by shipping a single new agent.

In theory, a horizontal agent platform could build the same learning infrastructure. In practice it doesn't, for two main reasons: lack of focus and, more critically, user experience. Capturing this knowledge depends entirely on the interface you give users. Vertical players can design interfaces around exactly the information that needs to surface in a particular workflow; horizontal tools cannot. Eval sets, annotated outputs, and edge-case taxonomies can form a data flywheel for a vertical domain and, in turn, support fine-tuning. Without comparable production exposure, latecomers will struggle to build such a flywheel. How far this goes depends on data rights, accumulated production usage, and customer contract terms, but the pattern recognition itself keeps improving.

Large model labs already route internally: they call different classes of models for different requests and use model ensembles under the hood. But they cannot route across vendors, cannot easily evaluate competitors' models on specific subtasks, and cannot use the best open-source fine-tuned model for a narrow niche.

Companies elsewhere in Oz pick the best model for each subtask from the entire model market rather than relying on one lab's models. They also take on the jobs nobody else wants: rerunning evals after every new model release, recalibrating prompts for edge cases, and rolling out changes without disrupting production. The labs won't do this for clients; they sell you the new model and tell you to migrate. Companies in Oz absorb the migration cost, and clients get the best intelligence on the market with continuity across upgrades.
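To make the migration work concrete, here is a minimal sketch of an eval gate that a company in Oz might run after each model release. It is a hypothetical illustration, not any company's actual tooling: it assumes a "model" is simply a function from prompt to text, that each eval case carries an edge-case tag, and that a candidate is promoted only if it matches or beats the incumbent overall and on every tag.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumption: a model is any function that maps a prompt string to an output string.
Model = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str
    tag: str  # hypothetical edge-case category, e.g. "subsidiary_domain"


def accuracy(model: Model, cases: Sequence[EvalCase]) -> float:
    """Share of cases where the model's output exactly matches the expected answer."""
    if not cases:
        return 0.0
    hits = sum(1 for case in cases if model(case.prompt).strip() == case.expected)
    return hits / len(cases)


def should_promote(incumbent: Model, candidate: Model, cases: Sequence[EvalCase]) -> bool:
    """Promote the candidate only if it is no worse overall and on every edge-case tag."""
    if accuracy(candidate, cases) < accuracy(incumbent, cases):
        return False
    for tag in {case.tag for case in cases}:
        subset = [case for case in cases if case.tag == tag]
        if accuracy(candidate, subset) < accuracy(incumbent, subset):
            return False  # a regression on one slice blocks the rollout
    return True


if __name__ == "__main__":
    cases = [
        EvalCase("2+2", "4", "arithmetic"),
        EvalCase("capital of France", "Paris", "geography"),
    ]
    # Toy stand-ins for real model endpoints.
    incumbent = lambda p: {"2+2": "4", "capital of France": "Lyon"}[p]
    better = lambda p: {"2+2": "4", "capital of France": "Paris"}[p]
    mixed = lambda p: {"2+2": "5", "capital of France": "Paris"}[p]
    print(should_promote(incumbent, better, cases))  # True
    print(should_promote(incumbent, mixed, cases))   # False: ties overall, regresses on "arithmetic"
```

The per-tag check is the design choice that matters here: a candidate can tie on the headline score and still break a specific edge case, which is exactly the kind of regression that disrupts production.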

Sending every query to Opus 4.7 is the fastest way to turn gross margin negative. The best companies in Oz route across model tiers: the hardest tasks go to frontier models, most tasks go to mid-tier models, and smaller fine-tuned models handle the work where they have proven good enough.
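The tiered routing described above can be sketched as "pick the cheapest model that is capable enough for this subtask." The tier names, the 1 to 3 capability scale, and the per-token prices below are hypothetical placeholders rather than real vendors or list prices; the point is the shape of the calculation.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int           # hypothetical scale: 1 (narrow) to 3 (frontier)
    usd_per_1k_tokens: float  # illustrative numbers, not real list prices


TIERS = (
    Tier("small-fine-tuned", 1, 0.0002),
    Tier("mid-tier", 2, 0.002),
    Tier("frontier", 3, 0.02),
)


def route(required_capability: int, tiers: Sequence[Tier] = TIERS) -> Tier:
    """Pick the cheapest tier whose capability meets what the subtask needs."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no tier offers capability {required_capability}")
    return min(eligible, key=lambda t: t.usd_per_1k_tokens)


def workload_cost(workload: Mapping[int, float], tiers: Sequence[Tier] = TIERS) -> float:
    """Total cost of a workload given as {required capability: thousands of tokens}."""
    return sum(route(cap, tiers).usd_per_1k_tokens * k_tokens for cap, k_tokens in workload.items())


if __name__ == "__main__":
    # Hypothetical mix: most volume is easy; a small share truly needs the frontier tier.
    workload = {1: 700, 2: 250, 3: 50}
    routed = workload_cost(workload)
    all_frontier = sum(workload.values()) * TIERS[-1].usd_per_1k_tokens
    print(f"routed: ${routed:.2f}, all-frontier: ${all_frontier:.2f}")
```

With this made-up mix, the routed workload costs about $1.64 versus $20.00 if every token went to the frontier tier. The hard part, as the text notes, is knowing each subtask's required capability, which this sketch simply takes as given.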

Some of these companies now do their own post-training, optimizing models for the small slice of work that truly matters to clients and offering it at a fraction of the cost of frontier API calls. Large model labs price at a "floor": the minimum level of intelligence you can buy for X dollars. Oz companies sell the opposite: the lowest dollar cost for the level of intelligence a specific workflow actually needs. That is only possible if you know exactly how much intelligence each subtask requires, and the large labs structurally cannot understand every task in every vertical. Ultimately, this translates into lower, more predictable outcome-based pricing.

Becoming the control plane for how clients run AI in a vertical domain is extremely valuable. The control plane is where permissions, audit trails, what the agent is allowed to do, and what it actually did all come together.

This control plane is built on use-case-specific guardrails, which vary widely across industries and roles. Because these companies own the end-to-end tools, workflows, and data the agent interacts with, they can deliver deterministic results in ways horizontal tools cannot. They also absorb regulatory complexity for the end buyer: the Federal Rules of Civil Procedure and attorney licensing rules in law, HIPAA in healthcare, SEC and FINRA rules in finance, state insurance regulations, and more. Horizontal players cannot do this convincingly unless they turn into a hundred different vertical companies. CIOs need a partner that will commit contractually to taking compliance responsibility for the agents it provides.
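A control plane in this sense can be illustrated with a small policy check that sits in front of every agent action and writes an audit record for each decision, including denials. The role names, actions, and two-level policy (allowed vs. needs human approval) are hypothetical simplifications; a real deployment would encode the client's and the regulator's actual rules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Policy:
    allowed_actions: FrozenSet[str]
    needs_human_approval: FrozenSet[str] = frozenset()


@dataclass
class ControlPlane:
    policies: Dict[str, Policy]  # keyed by agent role (hypothetical roles)
    audit_log: List[dict] = field(default_factory=list)

    def authorize(self, role: str, action: str, approved_by: Optional[str] = None) -> bool:
        """Decide whether an agent may act, and record every decision, including denials."""
        policy = self.policies.get(role)
        if policy is None or action not in policy.allowed_actions:
            decision = "denied"
        elif action in policy.needs_human_approval and approved_by is None:
            decision = "pending_approval"
        else:
            decision = "allowed"
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "action": action,
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision == "allowed"


if __name__ == "__main__":
    plane = ControlPlane({
        "sdr_agent": Policy(frozenset({"read_crm", "send_email"}), frozenset({"send_email"})),
    })
    plane.authorize("sdr_agent", "read_crm")                                   # allowed
    plane.authorize("sdr_agent", "send_email")                                 # held for approval
    plane.authorize("sdr_agent", "send_email", approved_by="reviewer@client")  # allowed
    plane.authorize("sdr_agent", "export_contacts")                            # denied
    print([entry["decision"] for entry in plane.audit_log])
```

Logging denied and pending actions, not just successful ones, is what lets the buyer answer both "what was the agent allowed to do?" and "what did it actually do?"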

All of this ultimately boils down to one thing: focus.

This focus can be on a vertical industry such as insurance, legal, or accounting, or on a deeply developed function such as sales, customer service, or finance. Either way, it requires a team embedded for the long term in the same customer base, understanding its workflows, edge cases, and regulatory requirements. Large model labs are not built for this. They have to serve everyone and cover everything, which is why they built the "Yellow Brick Road" in the first place. The same trade-off makes it hard for them to reach the rest of Oz: you can be everywhere without excelling at any one thing, or excel at one thing without being everywhere.

What does this look like in practice? Here is some practical advice from 11x CEO Prabhav Jain.

Oz Opportunities: Vertical Workflow Moats

Building a company that can withstand disruption from the large model labs starts with focusing on the specific outcomes that matter most to clients. For us, that outcome is helping enterprises generate more leads and pipeline.

From there, the questions become very concrete. Which activities do we want to own end to end because they truly drive pipeline growth? Break each activity into tasks. Which tasks suit an agent, and which do not? Which require deep domain insight, and which do not? The large model labs will ship workflows too, but when a workflow has many steps, messy inputs, hard-to-interpret state, or real-world constraints, a better model alone isn't enough. The work reverts to traditional software engineering, where the labs have no advantage over a focused application company.

For example, the tasks we handle include prospecting on custom signals, enriching prospect data, deep account research, pulling context from the CRM, writing copy for different channels, agents that qualify prospects, and email delivery systems. Some are agent tasks; some are not. None of them can be done with a single prompt; they require deep engineering.

The key insight behind the Oz analogy is that in any real workflow, roughly half the tasks are not agent tasks at all, and that half gives the labs no advantage: they are no better at writing deterministic software than you are. The other half, the agent tasks, still requires you to focus on the outcomes you actually want and to tune, train, and constrain models accordingly.
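The split between agent tasks and plain software can be sketched as a pipeline in which only one step calls a model. Everything here is hypothetical: `FIRMOGRAPHICS` stands in for an enrichment source, `fake_model` stands in for an LLM call, and the deliverability checks are illustrative rules, not 11x's actual stack.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical reference data standing in for an enrichment provider.
FIRMOGRAPHICS = {"acme.com": "insurance"}


def fake_model(industry: str) -> str:
    """Stand-in for an LLM call; a real system would call a model API here."""
    return f"Hi there, a quick note for {industry} teams about pipeline."


@dataclass(frozen=True)
class Task:
    name: str
    agentic: bool                # True only if the step needs a model
    run: Callable[[Dict], Dict]  # reads the current state, returns new fields


def enrich(state: Dict) -> Dict:
    return {"industry": FIRMOGRAPHICS.get(state["domain"], "unknown")}


def draft_email(state: Dict) -> Dict:
    return {"draft": fake_model(state["industry"])}


def check_deliverability(state: Dict) -> Dict:
    # Plain deterministic rules; these particular checks are illustrative.
    draft = state["draft"]
    return {"deliverable": len(draft) <= 500 and "http://" not in draft}


PIPELINE: List[Task] = [
    Task("enrich", False, enrich),
    Task("draft_email", True, draft_email),
    Task("check_deliverability", False, check_deliverability),
]


def run_pipeline(tasks: List[Task], state: Dict) -> Dict:
    """Run each task in order, merging its output fields into the shared state."""
    for task in tasks:
        state = {**state, **task.run(state)}
    return state


if __name__ == "__main__":
    result = run_pipeline(PIPELINE, {"domain": "acme.com"})
    print(result["industry"], result["deliverable"])
    print(f"{sum(t.agentic for t in PIPELINE)} of {len(PIPELINE)} steps call a model")
```

Two of the three steps are ordinary deterministic code, which is the half of the workflow where, per the argument above, the labs hold no special advantage.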

Domain knowledge often isn't in general training data. These capabilities have to be built bottom-up from a vertical industry or a specific function and fed into the model at the right moments in the workflow. When our agent judges on a phone call whether an inbound lead is qualified, it has to be trained to understand what a good sales conversation looks like for a specific industry and a specific buyer profile. That is the application company's job, and the capability compounds over time.

More critically, these capabilities go stale as companies evolve, so continuously evolving workflows and contextual understanding become a competitive advantage in themselves. For example, when we first launched email outreach at scale, "AI-written emails" were just emerging. Today, people have a sharp sense for telling AI-written emails from human ones, and that judgment shifts every few months. Our agents have to adapt to these market dynamics, and this is where the moat is built. Even with this churn, our response rates have risen fourfold in recent months, generating hundreds of millions of dollars in sales pipeline.

Complex problems are where real business value is unlocked. Otherwise, you risk building nothing more than a thin wrapper.

Decompose any sufficiently complex business problem and the mess shows up quickly. Here is a simple-sounding example from GTM: if a company is already your customer, you shouldn't cold-contact people at that company. In practice, this is far from simple.

Maybe your CRM has that company's domain. But what about a company with dozens of subsidiaries? What if the CRM records only the parent company's domain? What if a stale matching field in Salesforce leads you to send a cold outreach email to the CEO of an existing customer? Real-world data is messy. Humans struggle with it, and models won't magically get past that threshold. To bring order to the mess, you need specialized agents designed around the specific shape of the problem, not a general-purpose assistant pointed at the CRM. In fact, our data shows that our own data is higher quality and fresher than our clients', so we default to it as the anchor.
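The "don't cold-email existing customers" check shows why this becomes software engineering rather than prompting. A minimal sketch, assuming a hand-maintained subsidiary-to-parent map (`parent_of`, hypothetical data) and customers recorded as either URLs or domains: normalize every identifier to a bare domain, resolve it to its top-level parent, and compare parents rather than raw strings.

```python
from typing import Dict, Iterable


def normalize_domain(value: str) -> str:
    """Reduce an email address or URL to a bare, lowercase host name."""
    v = value.strip().lower()
    if "@" in v:
        v = v.rsplit("@", 1)[1]  # "ceo@acme.com" -> "acme.com"
    for scheme in ("https://", "http://"):
        if v.startswith(scheme):
            v = v[len(scheme):]
    v = v.split("/", 1)[0]  # drop any path
    if v.startswith("www."):
        v = v[len("www."):]
    return v


def root_company(domain: str, parent_of: Dict[str, str]) -> str:
    """Follow subsidiary -> parent links up to the top-level company (cycle-safe)."""
    seen = set()
    while domain in parent_of and domain not in seen:
        seen.add(domain)
        domain = parent_of[domain]
    return domain


def is_existing_customer(contact: str, customers: Iterable[str], parent_of: Dict[str, str]) -> bool:
    """True if the contact belongs to a corporate family that is already a customer."""
    contact_root = root_company(normalize_domain(contact), parent_of)
    customer_roots = {root_company(normalize_domain(c), parent_of) for c in customers}
    return contact_root in customer_roots


if __name__ == "__main__":
    # Hypothetical corporate family: two subsidiaries rolling up to acme.com.
    parent_of = {"acme-insurance.com": "acme.com", "acme.co.uk": "acme.com"}
    crm_customers = ["https://www.acme.com/"]  # CRM stores only the parent's URL
    print(is_existing_customer("ceo@acme-insurance.com", crm_customers, parent_of))  # True
    print(is_existing_customer("jane@globex.com", crm_customers, parent_of))         # False
```

This deliberately ignores harder cases the text alludes to, such as subdomains, acquisitions, and stale CRM fields; each of those would need its own handling and data source.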

Guardrails are badly underestimated. Even within the same product, each use case needs its own guardrails. For us, the guarantees required for prospecting in regulated financial services are very different from those a mid-sized SaaS client needs. These guarantees cascade into how the agent writes, whom it may contact, what data it can access, what it may say on calls, and how each decision is recorded.

A one-size-fits-all system collapses in the face of these differences. Guardrails must be built per use case, configured per client, and audited continuously, and that work falls entirely on the application company. That is why we need forward-deployed engineers and technical deployment strategists to tune the system to each client's requirements.
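Per-client guardrails can be expressed as configuration that every outbound message is checked against before it is sent. The client names, channels, phrase lists, limits, and disclaimer below are hypothetical examples, not compliance guidance for any real regulation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Guardrails:
    allowed_channels: FrozenSet[str]
    banned_phrases: Tuple[str, ...] = ()
    max_touches_per_week: int = 3
    required_disclaimer: Optional[str] = None


# Hypothetical per-client configuration; real rules would come from each client's compliance team.
CLIENT_GUARDRAILS: Dict[str, Guardrails] = {
    "regulated_finance": Guardrails(
        allowed_channels=frozenset({"email"}),
        banned_phrases=("guaranteed return", "risk-free"),
        max_touches_per_week=1,
        required_disclaimer="This message is not investment advice.",
    ),
    "midsize_saas": Guardrails(allowed_channels=frozenset({"email", "linkedin", "phone"})),
}


def check_message(client: str, channel: str, text: str, touches_this_week: int) -> List[str]:
    """Return every guardrail violation for a proposed outreach message (empty list = OK)."""
    rules = CLIENT_GUARDRAILS[client]
    violations = []
    if channel not in rules.allowed_channels:
        violations.append(f"channel not allowed: {channel}")
    lowered = text.lower()
    violations += [f"banned phrase: {p}" for p in rules.banned_phrases if p in lowered]
    if touches_this_week >= rules.max_touches_per_week:
        violations.append("weekly touch limit reached")
    if rules.required_disclaimer and rules.required_disclaimer not in text:
        violations.append("missing required disclaimer")
    return violations


if __name__ == "__main__":
    print(check_message("midsize_saas", "phone", "Quick question about your pipeline", 0))
    print(check_message("regulated_finance", "phone", "A risk-free opportunity", 1))
```

Returning every violation rather than stopping at the first one makes the check useful for auditing and for the deployment engineers tuning each client's configuration.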

Cost Advantages: Model Routing and Post-Training

For example, we once worked with a Fortune 1000 company to run consented outbound voice calls to its large SMB customer base. Early attempts had low pickup rates, so we had to iterate quickly and learn how to get this particular audience to engage within the first 10 seconds of a call. SMB owners behave very differently from large B2B buyers or consumers. Today, the sales opportunities we generate in a single day exceed what their entire sales team produces in a month in that segment.

Sales is just one example. Insurance is another, and it makes the same point from a different angle. Here is how FurtherAI CEO Aman Gour thinks about "leaving the Yellow Brick Road."

When we started deploying AI into real insurance operations, we kept hearing the same assumption: the model is the intelligence, and the workflow is scaffolding around it.

But the more insurers we work with, the more convinced we are that the opposite is true.

In insurance, much of the intelligence already lives inside the workflow. Two insurers can route a submission through what looks like the same process: submission, review, quoting, underwriting. The process itself is simple. What truly differentiates insurers is all the decisions inside it: which risks need escalation, which loss signals matter, which underwriting preference wins when preferences conflict, when a human signature is required, what external data to pull, and how the final decision is recorded.

This logic does not live in a clean rules engine. It is scattered across standard operating procedures, manager reviews, underwriting philosophy, each insurer's risk appetite, and years of operational experience. Much of it is not written down in any form a model can read directly.

That is why we don't believe in pure "reason from scratch" agents that start from zero every time, nor in rigid workflows that break under real-world complexity. Instead, we build agentic workflow systems. Workflows provide repeatability, auditability, and cost control; agent modules handle variability and recover when the ideal path breaks down; humans stay in the loop at points of judgment and accountability.
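The agentic workflow pattern described above, with deterministic steps, a constrained agent for the messy parts, and humans at judgment points, can be sketched for a single triage decision. The fields, thresholds, and the `classify_notes` function (a stand-in for a model call) are hypothetical; real underwriting logic is far richer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Submission:
    insured: str
    total_insured_value: float
    losses_last_5y: int
    notes: str = ""


@dataclass(frozen=True)
class Appetite:
    """Insurer-specific limits (hypothetical fields): insurers differ here, not in the steps."""
    max_tiv_auto_quote: float
    max_losses_auto_quote: int


def triage(sub: Submission, appetite: Appetite, classify_notes: Callable[[str], str]) -> str:
    """Return "decline", "human_review", or "auto_quote"."""
    # 1. Deterministic rules first: repeatable, auditable, cheap.
    if sub.losses_last_5y > 2 * appetite.max_losses_auto_quote:
        return "decline"
    within_limits = (
        sub.total_insured_value <= appetite.max_tiv_auto_quote
        and sub.losses_last_5y <= appetite.max_losses_auto_quote
    )
    if not within_limits:
        return "human_review"  # judgment point: an underwriter signs off
    if not sub.notes:
        return "auto_quote"
    # 2. Off the ideal path: a model reads free-text notes, but its output is constrained.
    label = classify_notes(sub.notes).strip().lower()
    # 3. Anything other than a clear "routine" label goes to a human.
    return "auto_quote" if label == "routine" else "human_review"


if __name__ == "__main__":
    appetite = Appetite(max_tiv_auto_quote=5_000_000, max_losses_auto_quote=1)
    # Stand-in for a model call: flags notes mentioning vacancy as unusual.
    fake_classifier = lambda notes: "unusual" if "vacant" in notes.lower() else "routine"
    print(triage(Submission("A Co", 1_000_000, 0), appetite, fake_classifier))
    print(triage(Submission("B Co", 1_000_000, 0, "Building vacant since March"), appetite, fake_classifier))
    print(triage(Submission("C Co", 9_000_000, 0), appetite, fake_classifier))
    print(triage(Submission("D Co", 1_000_000, 3), appetite, fake_classifier))
```

Two insurers could share this exact code and still behave very differently through their `Appetite` values, which mirrors the point that the differentiation lives in the decisions inside the workflow.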

At first, the system automates manual work. Over time, every escalation becomes a signal, every exception a piece of feedback, and every human correction a clue about where the existing process is incomplete. Eventually, the workflow is no longer just a script; it becomes the insurer's operational memory.
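One hypothetical way to turn human corrections into operational memory is to log each recommendation alongside the final human decision and track how often each rule gets overridden. The `Decision` record and the 20% threshold below are illustrative assumptions, not a mechanism the source attributes to FurtherAI.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Decision:
    rule: str           # workflow rule or step that produced the recommendation
    system_output: str  # what the workflow recommended
    final_output: str   # what the human ultimately decided


def override_rates(decisions: Iterable[Decision]) -> Dict[str, float]:
    """Share of decisions per rule where a human changed the system's output."""
    totals: Counter = Counter()
    overrides: Counter = Counter()
    for d in decisions:
        totals[d.rule] += 1
        if d.final_output != d.system_output:
            overrides[d.rule] += 1
    return {rule: overrides[rule] / n for rule, n in totals.items()}


def rules_to_revisit(decisions: Iterable[Decision], threshold: float = 0.2) -> List[str]:
    """Rules whose override rate suggests the encoded process is missing something."""
    return sorted(rule for rule, rate in override_rates(decisions).items() if rate >= threshold)


if __name__ == "__main__":
    log = [
        Decision("tiv_limit", "human_review", "auto_quote"),
        Decision("tiv_limit", "human_review", "human_review"),
        Decision("loss_history", "decline", "decline"),
        Decision("loss_history", "decline", "decline"),
    ]
    print(override_rates(log))
    print(rules_to_revisit(log))
```

A rule with a high override rate points at a place where the encoded process and the underwriters' actual judgment disagree, which is the kind of learning the text describes.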

This is the part the large model labs find hard to reach. They will keep releasing better models and more general agents, as they should. But they won't stay embedded in an insurer's production workflow long enough to learn why an account was escalated, why a risk was declined, or why an underwriter overrode a risk-appetite guideline and turned out to be right.

This understanding can only come from executing the same workflow thousands of times in production. The initial delivery of a workflow is not a moat. The moat is the loop of real-world use over time.

For us, this is what "leaving the Yellow Brick Road" means.

Control Plane: Compliance and Governance Value

How many steps does this work require? How complex are the tools you need to build?

Compare a horizontal AI that searches Google Drive: it is a one-step operation against a single tool, with high fault tolerance. If the summary is wrong, you simply ask again.

Now consider a multi-step legal redline task grounded in the firm's precedents from the past three years: it might involve dozens of steps and multiple tools, and its output requires partner review and may even end up argued in court. Both look like "one agent doing the work," but only the latter requires deep software built over years by a focused team.
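A rough way to see why step count matters: if each step succeeds independently with probability p, an n-step task succeeds end to end with probability p^n. The 95% per-step figure and the independence assumption below are illustrative, not measurements from the source.

```python
def end_to_end_success(per_step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps fail independently."""
    return per_step_success ** steps


if __name__ == "__main__":
    for steps in (1, 10, 30):
        print(steps, round(end_to_end_success(0.95, steps), 3))
```

At 95% per step, a one-step task succeeds 95% of the time, but ten steps succeed only about 60% of the time and thirty steps about 21%. That is why long workflows need retries, checks, and human review built in, rather than relying on "just ask again."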

Are you building a system for clients to execute work, or just adding a tool on top of their existing system?

A system owns the end-to-end workflow: data capture, governance, and the record of completed work. When clients describe how work actually gets done, they point to this system. A tool merely adds a layer of intelligence to an existing workflow.

Tool products can generate real revenue, but the large model labs can take them away more easily because clients don't depend on you as the orchestration layer. High ACV usually signals a system product, because it replaces real human labor and is paid accordingly, but it is not a guarantee. Ask yourself: if a large model lab releases a product that appears to compete with you directly, do clients still need your product? If yes, you are building a system. If no, you are a tool, even if your ACV is high.

The large model labs are judged on benchmarks; companies elsewhere in Oz are judged on their clients' P&L.

Clients don't care how your model scores on SWE-Bench or MMLU. They care whether your agent closed the deal, redlined the contract correctly, or underwrote the right policy. If clients buy on specific workflow outcomes rather than general capability scores, you are in Oz. If they buy on general capability, what you sell is what they can get from Claude or Codex seats.

The best agent companies need to operate like hedge funds: they win on alpha, and alpha is measured in clients' profit and loss, not in benchmark scores.

There will be huge winners both on and off the "Yellow Brick Road." The labs will keep winning because they own the models and have distribution built for horizontal tools.

Others in Oz can win if they own the system of work: the actual interface through which enterprise work gets done, and the data that flows through it and is captured. These companies have data capture, systems of action for workflows, and governance. As complex workflows in a vertical domain mature, they become a core experience clients can't live without. As incumbents and new entrants release new generations of models, these companies will be the layer that integrates those models and delivers them to clients. The base models are replaceable; the system of work is not.

The next generation of enterprise software will be built outside the "Yellow Brick Road."

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.