Andrew Ambrosino is the head of the OpenAI Codex team. He started as a designer, worked as an engineer, built products, started companies, and now leads Codex, which has over 5 million weekly active users. He's probably one of the best people to answer the question, "How should products be built in the age of AI?"

In his view, when almost everyone in a company can quickly prototype a feature, the real challenge is no longer "can we build it," but "should we build it."

In a conversation with Lenny, Andrew Ambrosino detailed OpenAI's internal development process. When AI drastically compresses implementation costs, every stage of product development changes: ideation, documentation, prototyping, design, role division, team collaboration, and product planning. Which old rules are becoming obsolete? What new standards are emerging? When implementation is no longer scarce, what truly becomes the scarce resource?

Some core insights:

When 90 small teams can each produce a seemingly shippable prototype, the most valuable thing is taste.

One of the Codex team's hard hiring criteria is taste: the ability to distinguish signal from noise in a sea of content. In a world of "infinite tokens," they don't want to produce garbage.

If Codex had launched three months earlier, it would have failed completely. The only variable was model improvement. Don't be quick to judge a feature as bad—it might just not be the right time yet.

Whether a feature is ultimately good enough often depends not on the feature's form itself, but on how smart the model is.

Just as Codex once disrupted ChatGPT, Codex itself could be disrupted by new experiments. Preserve a bottom-up culture of exploration. You can't expect the same team to both polish details and disrupt itself.

Here are the highlights of the conversation.

Implementation costs are down, so taste becomes more important.

Lenny: You've said AI is changing how product work happens. You're now probably on the most cutting-edge AI product team in the world. Specifically, how has the way product teams work changed compared to two years ago?

Andrew Ambrosino: Now as a team lead, the hardest thing is that the process has been flipped.

Everyone knows the old way of building products: research, come up with ideas, make prototypes. Even after we moved past the waterfall era, the underlying logic was the same—implementation was expensive. So you had to eliminate risks upfront through documentation, research, and prototypes, before building. Because prototyping and design were cheaper than development—that was the old assumption.

Now that assumption no longer holds. Anyone can make anything. I genuinely believe that if you start from scratch and talk to these models, whether ours or other companies', you can build any feature you want. This isn't necessarily the hardest part of software development, but it's pretty cool.

Give people unlimited tokens, and since everyone at OpenAI is proactive and has good ideas, everyone builds all sorts of things. There's a feature we urgently need right now in the company, and I'm sure there are 90 different, uncoordinated small teams each building and experimenting with their own version. Out of those 90 attempts, which ones are good? Which parts should be folded into something else? How should it be defined? Should it be part of another feature? How many options should the toggle have? Those are the kinds of questions we face.

So the short answer is: it's flipped. It's not that people are doing fundamentally different roles, or that skills have disappeared and roles don't exist. Implementation is no longer the most expensive part—I dare say the most expensive thing is taste.

Lenny: So before, people would write PRDs and strategy docs. Now people go straight to prototypes. Many people in the company have similar ideas, so you get 90 different things, and then you pick a direction from among them?

Andrew Ambrosino: Yes, that happens a lot. Not just at OpenAI. You've already seen many product leaders say "PRD is dead, prototyping is king," but I actually completely disagree with that view.

Because implementation has become cheap across every medium, it's very tempting to skip thinking and go straight to prototyping. Especially if you're not an engineer, if you've never written code, or aren't interested, or don't have time, you can't help but say, "PRD is dead, let me just show you what I want."

But I've also noticed the opposite phenomenon. For engineers, it's become very tempting to write tons of documentation—tons of unreadable documentation. This isn't to say the people writing docs are bad, it's that when implementation becomes abundant, choosing the right medium to express your point becomes truly important.

If you're expressing product clarity in a fuzzy domain, then maybe you should write a doc. If you want people to try it out and stress-test an interaction pattern, then build a prototype. The key is choosing the right medium.

Lenny: There's a concept called "primal mark"—the first brushstroke on a canvas, from which everything else extends. Do you mean that sometimes a prototype is the wrong first brushstroke? Because people anchor on it, rather than thinking about the bigger picture?

Andrew Ambrosino: Right. In the past, there was an implicit signal: what something looks like indicated what stage of the process it was in. If you saw something that looked like a shipped product, that meant it was late stage—risks had been eliminated, design had been reviewed, business goals were justified.

Now those things are decoupled. The reason is that in the past, to get resources to build, you had to sufficiently reduce risk. Now that threshold is gone. So something that's just exploratory might look shippable—it's visually ready. But it might not be the right direction, it might not align with research, it might not be what users actually need, and it might not be the best choice for the business.

I don't want to overemphasize taste. But again, knowing what to do, how to present it, how to achieve the goal, what medium to use—that is becoming the most important skill in every field.

Lenny: The word "taste" is a buzzword right now. Specifically, what do you mean by "good taste"?

Andrew Ambrosino: Taste has to be broken down.

There's an aesthetic component, but there's also a systems-thinking component—how does this fit into the whole system? There's a directional component—where are we going, what theme is this part of? There's an expression component—how to present this information. And there's an interaction component—this animation doesn't match the meaning it's trying to convey; it's too abrupt, it doesn't align with the message.

These are all very important. But the real taste question is: if we can do anything, what is the goal? How do we get there?

Lenny: As AI becomes more powerful and does more things, where will the human brain continue to add value? Taste feels like one of those things. But AI design output still isn't great. Why don't the top models do well in design?

Andrew Ambrosino: There are some practical reasons, and some harder problems. Design is harder to score than software. Creating a feedback loop to train a model on what good design is, is much more tedious than training a model on whether code compiles, because human taste is part of the feedback mechanism.

Also, labs have historically prioritized making models good at things that accelerate AI research. A model that can write correct code obviously accelerates research. Design can't make the same argument. Not that design isn't important, it's just not in that flywheel.

Those are practical reasons, and they will go away. Models will become pretty good at design. But there are harder things.

First, design has cultural attributes. Remember last year, when every new website copied Linear's design? A model that outputs Linear's website every time doesn't meet the challenge. Novelty matters far more in design than in software engineering. In software engineering, you'd prefer the model to follow known patterns exactly. But design requires a certain randomness and novelty.

Second, there's the interplay between visual design and code. If the company rebrands tomorrow, the shallow approach is to update 263 components one by one. The deeper approach is to understand that two things that look different both belong to a list style and convey the same interaction pattern. That abstraction layer is something current technology can't reach yet.

Lenny: Jenny Wen (Design Lead at Anthropic Claude Code) says the design process is dead—just build directly. What do you think?

Andrew Ambrosino: Jenny and I probably agree on many things. The formal design process—I agree with her, it's dead. And I wasn't a fan of that process even before AI.

Years ago when I was running my startup, there was an article called "The Case Study Factory." It was about how designers were trained to follow a fixed process, and gradually they valued the process itself more than the outcome. If something went through the process, two conclusions were assumed: first, it would be good—the process guaranteed quality; second, even if no one used it, it was good because it went through the process.

User research, diverge, converge—the framework is right, but it was always a bit academic. The premise of that process was that implementation is expensive; you can only build once, so you must exhaust the problem space and solution space before building.

Then Figma and Origami came along, and we pulled interactive prototypes into the process. Now the problem is, you can pull all of implementation to the very front of the process. A fully polished prototype that looks like it could ship immediately. Enough people in the company see it and ask, "Can we ship this now?" But in reality, you're still in early design exploration, just no one explicitly says so.

So saying the design process is dead is both true and false. If you're tied to specific tools and specific daily rituals, then yes, it's dead. But the awareness of "what stage of the process are we in" is more important than ever.

The dangerous thing is tying the design process to a specific medium. Designers now have more tools. You can drop things directly into the existing product; you can run A/B tests. Many companies have baby versions of the product—baby Cursor, baby Codex—a heavily simplified codebase that simulates all the interactions of the real product. You can use it to vibe code: "What if the sidebar looked like this? What if a panel popped out?" These are new tools for designers, but they need to be paired with the old awareness: where are you in the process now?

Roles are blurring, but PMs won't disappear.

Lenny: Many companies are talking about "role death." Do you think the division between PMs, engineers, and designers will completely disappear?

Andrew Ambrosino: Some companies like to follow trends to extremes. The danger of eliminating the concept of roles is that it might simultaneously eliminate the awareness that these fields have best practices that can be learned as a craft.

I hear many companies say "we're eliminating the product role," and I think that's a terrible idea. Then they say "everyone is a builder." The result is that product management—a discipline that has accumulated real best practices and real lessons learned from mistakes—gets thrown away. Because someone wrote a few lines of code, they think everything is fine. That's not a good state.

I welcome the breakdown of boundaries like "this is your area, you can't touch it," but it needs balance. Not everyone can do everything, in terms of breadth or depth. That's also why managers won't disappear.

And every discipline has skill components. Many engineers don't admit this—they think engineering has skills, but other roles are just "vibes." It's not like that. Knowing Excel doesn't mean you can work in finance.

I think what's happening more is that it's become easier for people to collaborate across roles, and easier to learn best practices from other fields, without tying your effectiveness in a role to your proficiency with a specific tool.

For a long time I thought I shouldn't be a software engineer because I didn't care about assembly language and didn't want to memorize TypeScript's type system. Those roles always had some gatekeeping, as if "being good at this role equals being proficient with this tool." I think that's starting to dissolve, but people are exaggerating it.

Lenny: Your Codex team does have more role blending. What does that look like?

Andrew Ambrosino: On the Codex team, we do see more role blending than other teams in the company and other industries. Partly because it's a technical product for engineers. So our designers speak the engineers' language, and our PMs speak technical language and can write code.

Internally, we have a way of describing collaboration: today, the overlap between roles is much larger than before. Defining a person is no longer about the boundary of "where design ends and engineering begins," but rather about the average distribution of all their work.

If you lay out everything someone on the design team does, it might include a lot of coding, and a lot of product-related work. But if you take the "average" of all that work, they still end up somewhere more skewed toward design.

Lenny: You mentioned that a PM's work is more like zone defense. What exactly do you mean?

Andrew Ambrosino: If two PMs are collaborating too closely, that's usually not a good sign. You should instead look at the team like a force-directed layout: where are the gaps? Where is no one covering?

In today's world, curation, guidance, and alignment have become the most important work. Everyone is constantly throwing out ideas; the environment is full of noise. The old top-down, annual planning approach doesn't work anymore. We need someone to be the gatekeeper of taste, to guide something from concept to product, and that means you have to cover every corner of the company.

So you need to spread the team out and look at who is good at what. Keep some distance between them, make sure coverage is comprehensive. Then fill the gaps, like "we want to hire an engineer with strong product thinking." We don't want a situation where a group writes a ton of code first, and then the entire product team has to review and calibrate for product consistency. We want everyone to have these capabilities, just with different areas of deep specialization.

Lenny: So the most valuable person now is someone who can drive something from idea to completion and has the taste to know "this is awesome"?

Andrew Ambrosino: Yes, I think that's the core change. It also reflects my understanding of the IC vs. manager relationship. It's not that management will disappear, or that everyone is just an IC. It's that now everyone, to some extent, holds both roles simultaneously.

If you're an IC, you're no longer typing code character by character. You're actually managing something—managing agents, managing work that is organized to accomplish a task collectively. If you're a team manager, you're doing essentially the same thing, just at a different granularity.

The people I look for must have taste in addition to solid professional skills. Because in a world of "infinite tokens," we can't produce garbage. You have to have the ability to distinguish signal from noise in a sea of content.

Every time someone asks how many people are on the Codex team, I say, "Probably between 10 and a few thousand." It sounds like a joke, but in reality, everyone's work converges into this product—model research, browser use, model personality, frontend infrastructure, user experience—all of these are part of the product.

But at the same time, we're not receiving thousands of PRs every day. The team has a double-digit number of engineers, about half as many designers, plus a few PMs. The vast majority are ICs. The team's impact is large, but the management layer is not thick. There are many people here who have founded companies before, many who work with a "founder mentality" in a big company, and many with excellent taste.

The entire Codex application has been shaped by the dogfooding loop. We all share a common desire to do as much of our own work as possible inside the application, even if it's not the best tool yet—because only then can it eventually become the best tool. We often deliberately don't improve certain internal processes, letting the product itself get better so it can support those processes. This is actually a very uncomfortable state. But week after week, it keeps changing.

If Codex had launched three months earlier, it would have died. The only difference was model improvement.

Lenny: With things changing so fast, how do you plan? How far ahead do you look?

Andrew Ambrosino: We don't have revolutionary planning approaches. The basic principle is: the closer to the present, the more specific the plan needs to be. It's not that we don't make nine-month plans—it's that those plans must stay very vague. Because any precision you add to a nine-month plan is false precision, just wasting time.

On the application product side, what you plan in November might still be correct in December, but by February it's completely wrong.

At my previous company, when we started driving feature development based on model capabilities, the existing product process basically broke down. It turned into listing all directions we were interested in, prototyping them, judging which ones were feasible now, and shelving the rest. Whenever there was a new leap in model capability, we'd pull those shelved things out and try again. Because whether a feature is ultimately good enough often depends not on the feature's form itself, but on how smart the model is. Many people have been unhappy with my planning style. But it's really hard.

Lenny: Is there a specific example of how important timing is?

Andrew Ambrosino: There's a great story about Codex. I'm very sure that the Codex app we launched in February, if it had been ready in November and we launched it then, would have completely failed in the market. The only difference was that the model improved between November and February. Same product, identical form, completely different outcome, just a few months apart.

Lenny: So something that doesn't work now might work later? Keep bigger ambitions?

Andrew Ambrosino: Yes. I recommend that people not judge too quickly that "this thing doesn't work now, so it's a bad feature." It might just not be the right time yet.

Going back to Codex's initial launch, Codex web: basically you give the model a task, it goes and does it, and comes back with results. That doesn't sound radical, but the problem was that it wasn't good enough. The form was too far ahead of its time.

Then Claude Code came out—fully local, no cloud connection, not pretending to be super AGI. It asks you questions, waits there, you can't delegate your entire life to it. It was much more usable because it matched the model's capability level at that time.

We were too "AGI" back then. I often think about that lesson. In the past, a product's failure in the market told you a lot about the product form or messaging. Now it's different. You might need to launch the same thing six times until it succeeds, and the form might not change at all.

The in-app browser is another case. We once had a working version. Back in the Atlas era, we already had agents executing tasks in the browser. Before that, there was Operator in ChatGPT—that didn't succeed. But if you string together Operator, Atlas, Codex, and ChatGPT, you can see a continuous evolution line. Essentially the same feature, just relaunched repeatedly as intelligence levels changed, and the results changed completely.

Once a product or feature exists, people naturally focus on all kinds of detail issues and micro-optimizations, and they should. But that's also why we always preserve a bottom-up culture of exploration. Because sometimes, just as the Codex app once disrupted ChatGPT in a certain way, Codex itself may be disrupted by new experiments in the future. You can't expect the same team to continuously produce disruptive innovations while also always focusing on product quality and detail polishing. At some point, you have to design a mechanism where both capabilities can coexist.

Lenny: What is the vision for Codex? Where are you taking it?

Andrew Ambrosino: In January and February of this year, during internal dogfooding, we saw clear product-market fit in engineering and research workflows. But at the same time, people in marketing, comms, finance, and legal were all using Codex, even though the app wasn't friendly to them. It would show them code and ask them to approve the execution of command-line search tools.

We tried adding Codex's capabilities to other products: the ChatGPT desktop app and the Atlas browser. And the most annoying thing happened: no one wanted to leave the Codex app to use those products that were supposedly "designed for them."

The lesson for us is that between developer tools and general knowledge work tools, there are many subtle commonalities. We genuinely believe that the product form we're building is the right form for hosting various deep vertical scenarios. Start simple, then gradually add complexity as needed.

It's not a product like "draw a rectangle on the screen, and everything must happen inside it." More like a home base—you start here, finish here, manage automations here, and it calls all the tools you need. Some people call this a "super app." I wish they hadn't, because now I hear that word almost every day.

Dan Shipper had an interesting idea: in the future, we'll use SaaS tools "from the inside out" within Codex. Notion, Linear, Salesforce—you don't open them in a browser; agents operate them inside Codex. And we're already doing that: in-app browser, Chrome extension, computer use—all ways for Codex to interact with external tools.

The best example: our internal video producer Brent used Codex to edit the Codex launch video. Codex is not a video editor—it doesn't have those UI elements. But it understood Brent was using Premiere Pro, and could make modifications by editing the files behind Premiere Pro. When it realized it couldn't do everything, it wrote its own Premiere Pro extension, installed it into Premiere Pro, and then communicated with Premiere Pro through that extension. We were blown away when we saw that.

That's a good pattern: professional tools do professional things. Codex doesn't need to become a better video editor; it just needs to interact seamlessly with professional tools.

Knowing how to code isn't important anymore—knowing how to delete code is.

Lenny: From handwritten code to AI writing 100% of the code, to agents and loops. How do the most cutting-edge teams work now?

Andrew Ambrosino: Loops? That was last week.

We've been discussing the question "what percentage of the product is code written by AI?" By last year's standards, 100% of our code is now AI-written. So the question is no longer "how much is AI-written," but "is the code written under supervision or unsupervised?" That's a completely different dimension. I welcome this constant resetting of standards, because it means we're making product progress.

We've explored a lot in autonomous software development, like a lot of harness engineering—"what if the model automatically does garbage collection and cleanup of the codebase at night?"

But all current models have a problem: they always add complexity. If anyone in research is listening: please teach models to delete code. When you try to fully hand over development to autopilot, this becomes a serious problem—both at the human level and at the codebase level.

Same with feature requests. How do you teach a model to judge which features are worth building, which should be ignored, which should be merged and redefined? And how do you teach a model to build the right abstractions?

These capabilities are improving. But I don't think we've reached the stage where you set up a loop for the model to automatically "improve the app," continuously monitoring Twitter, Slack, and email, and then autonomously iterating. Although, we are indeed trying to make that a reality.

Lenny: Do you think we'll get there? Where you set a goal: "win"?

Andrew Ambrosino: "/goal generate me a billion dollars." I don't know. I won't say never.

Lenny: As a product and engineering lead, how do you personally use AI to work?

Andrew Ambrosino: I think I might have the best job in the world right now.

When I started on Codex, my personal goal was to make it good enough that I could use it to write Codex's code. That was a super tight dogfooding loop. I couldn't do something, so I'd fix it, then I could do it, then I could do more things.

Then my role changed. I needed to do more product discovery, figure out what the team was doing, correct things that were off track. So Codex became my tool for those things: "Help me build a spreadsheet to organize this data." "Help me do an internal deep dive to see what explorations have been done in this area before."

The May series of releases (in-app browser, computer use, artifact creation) was the first time we used vibe coordination to manage a launch. I had a Notion doc tracking all the to-dos, and I used Codex to automatically collect progress from PRs and Slack channels and update the status tracker. At the time, I thought that was the cutting edge of managing product launches.

Now, every morning I read a daily report generated by Codex, which filters the 3,000 Slack channels I'm in and picks out the things I need to pay attention to. I can reply with "give me five questions and I'll answer them." It self-adjusts. I say "next time you run, pay less attention to this workflow" or "this happened but didn't appear in the report, make sure to catch it next time." It updates its notification style and adjusts its focus.

Lenny: How is that set up? What's the workflow?

Andrew Ambrosino: We're still in the discovery phase. It's just a scheduled task: "Go through my Slack channels, these are the things I care about, organize by these categories, here's some context." The first few runs might need guidance. The good thing is that I don't need to figure out how to edit the instructions. I just say in conversation "next time help me change this," and it updates.

But I think this is also the core issue with the chatbot form factor. I know how to set it up because for me it's product discovery. But if you're not at OpenAI working on building this, you don't want to figure that out. We need to figure out how to make this work for normal people.
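
As a rough illustration of the scheduled digest described above, here is a minimal Python sketch. Everything in it is hypothetical: the `DigestTask` class, its field names, the sample messages, and the simple keyword matching are assumptions made for illustration, not how Codex stores or runs automations. It only shows the shape of the loop: standing instructions, a run that filters messages, and conversational feedback that changes the next run.

```python
from dataclasses import dataclass, field


@dataclass
class DigestTask:
    """Hypothetical standing instructions for a scheduled digest."""
    interests: list[str]                           # topics the reader cares about
    muted: set[str] = field(default_factory=set)   # topics to skip on later runs

    def apply_feedback(self, pay_less_attention_to=(), make_sure_to_catch=()):
        """Fold conversational feedback into the standing instructions."""
        for topic in pay_less_attention_to:
            self.muted.add(topic.lower())
        for topic in make_sure_to_catch:
            key = topic.lower()
            self.muted.discard(key)
            if key not in (t.lower() for t in self.interests):
                self.interests.append(topic)

    def run(self, messages):
        """Group (channel, text) pairs under the first matching, unmuted topic."""
        report = {}
        for channel, text in messages:
            lowered = text.lower()
            for topic in self.interests:
                key = topic.lower()
                if key in self.muted:
                    continue
                if key in lowered:
                    report.setdefault(topic, []).append(f"#{channel}: {text}")
                    break
        return report


# Made-up sample data standing in for a day of Slack traffic.
messages = [
    ("release", "Launch checklist is blocked on review"),
    ("web", "In-app browser crash on login"),
    ("random", "Lunch options today"),
    ("infra", "Nightly eval regression found"),
]

task = DigestTask(interests=["launch", "browser"])
first = task.run(messages)

# "Pay less attention to this workflow" / "make sure to catch it next time."
task.apply_feedback(pay_less_attention_to=["browser"],
                    make_sure_to_catch=["regression"])
second = task.run(messages)

print(first)
print(second)
```

The design point is that the reader never edits the instructions directly. Feedback given in conversation is translated into changes to the stored task, and the next scheduled run picks them up.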

Lenny: I also used Codex to build an automation for filtering spam. One step required setting up a bunch of API triggers in the Google Cloud Console, and that interface was really annoying. So I had Codex do it—it took over my computer and used computer use to operate it.

Andrew Ambrosino: It was like: "I don't care if you have a connector or not, buddy, I'm just going to start clicking."

Figuring out the boundaries between connectors, in-app browser, Chrome extension, and computer use is really interesting. Often these boundaries are figured out by feel.

I find these personal workflows especially interesting. Everyone is experimenting with all sorts of things, and each person builds a completely different system. But slowly, common patterns emerge. Then we realize: "This should become a first-class experience in the product."

For example, memory. Many people are setting up Obsidian knowledge bases or Notion spaces to build their own mind palaces. You shouldn't have to do that yourself—there should be a sufficiently generic memory function that does it for you. We're constantly filtering what works for individuals but should stay at the individual level, and what should enter the product as a foundational component.

Lenny: From the outside, people see you winning. But there must be things that haven't succeeded?

Andrew Ambrosino: Hearing you describe it that way is funny. This is actually the first time I feel like I'm not failing.

I spent many years running startups, and eventually I basically dismantled the company and sold it off. Working in a highly regulated industry, the whole process felt like constant failure. Then I went to another startup, building AI tools in another closed, regulated industry—one failure after another. I actually failed a lot. Sometimes it's just a point in time where skills, passion, and market window happen to align.

Even now, in this project combining Codex and ChatGPT, there are countless small failures. We say "it should look like this," post it in Slack, and immediately get 2000 messages telling us how stupid we are. That's what I love about OpenAI—people tell us directly, they're ruthless about internal product failures, which is why external products are good. I failed for about 10 to 15 years before reaching this point. So I'm still a bit surprised every day that things are going well.

Lenny: Any final advice for readers?

Andrew Ambrosino: Don't "marry" your current workflow. What you should truly hold onto are the outcomes that only you can deliver. Then, keep trying to change your process. If the skill you're most proud of is "I know Figma's auto layout best," then what are you doing? AI will become better than you at that too. Find things worth doing, and then find a way to do them.
