DeepMind Founder Interview: AGI Architecture, Agent Status, and Scientific Breakthroughs in the Next Decade

Original Video Title: Demis Hassabis: Agents, AGI & The Next Big Scientific Breakthrough

Original Source: Y Combinator
Original Compilation: Deep Tide TechFlow

Editorial Introduction

Google DeepMind CEO and Nobel laureate in Chemistry Demis Hassabis visits Y Combinator to discuss key advances toward AGI, advice for entrepreneurs on maintaining a lead, and where the next major scientific breakthrough might occur.

One very practical judgment he offers deep tech entrepreneurs: if you start a ten-year deep tech project today, you must include the emergence of AGI in your planning. He also revealed that Isomorphic Labs (a biotech AI spin-off from DeepMind) will have major news soon.

Key Quotes

AGI Roadmap and Timeline

· “Almost certainly, these existing technological components will become part of the final AGI architecture.”

· “Problems like continual learning, long-term reasoning, and certain aspects of memory haven’t been solved yet; AGI needs to get all of that right.”

· “If your AGI timeline is around 2030, like mine, and you’re starting a deep tech project today, you must consider that AGI might appear midway.”

Memory and Context Windows

· “The context window roughly corresponds to working memory. Humans have an average of about seven items in working memory, but we have context windows of hundreds of thousands or even tens of millions of tokens. The problem is, we stuff everything into it, including unimportant or incorrect information, which is quite crude.”

· “Processing real-time video streams and storing all tokens would mean a million tokens are only enough for about 20 minutes.”

Limitations of Reasoning

· “I like playing chess with Gemini. Sometimes it realizes it’s making a bad move but can’t find a better one, so it circles around and ends up making that bad move. A precise reasoning system shouldn’t have this problem.”

· “It can solve IMO gold medal-level problems on one hand, but ask it in a different way and it makes elementary math mistakes. It seems to lack something in self-reflection on its own thinking process.”

Agent and Creativity

· “To achieve AGI, you need a system that can proactively solve problems for you. Agents are the way forward, and I think we’re just getting started.”

· “I haven’t seen anyone use vibe coding to create a top-ranked AAA game. With current tools it should be possible, but it hasn’t happened yet—which suggests something is still missing in the tools or workflows.”

Distillation and Small Models

· “Our hypothesis is that a cutting-edge Pro model can, within half a year to a year of release, be compressed into a very small model that runs on edge devices. We haven’t yet hit the theoretical information-density limit.”

Scientific Discovery and the “Einstein Test”

· “Sometimes I call it the ‘Einstein Test’—can you train a system with knowledge from 1901 and have it independently derive Einstein’s 1905 results, including special relativity? If it can, these systems are not far from inventing truly new things.”

· “Solving a Millennium Prize problem is already impressive. But even more difficult is proposing a new set of Millennium Prize problems that top mathematicians consider equally profound and worth a lifetime of research.”

Deep Tech Entrepreneurship Advice

· “Chasing hard problems and chasing simple problems are quite similar, just approached differently. Life is short; better to focus your energy on things that no one else will do if you don’t.”

Pathways to AGI

Gary Tan: You’ve been thinking about AGI longer than almost anyone. Based on current paradigms, how much of the final AGI architecture do you think we already have? What is fundamentally missing right now?

Demis Hassabis: Large-scale pretraining, RLHF, chain-of-thought—I’m quite sure these will be part of the final AGI architecture. These techniques have already proven a great deal. I can’t imagine that in two years we’ll find they were dead ends—that doesn’t make sense to me. But on top of what we have, maybe one or two more things are needed. Continual learning, long-term reasoning, and certain aspects of memory still have unresolved issues.

AGI needs to get all of this right. Maybe existing techniques plus some incremental innovations can get us there, but one or two critical breakthroughs might still be needed. I’d put the odds of unresolved key issues at about 50/50. So at DeepMind, we’re pushing both lines.

Gary Tan: I deal with many agent systems, and what shocks me most is that the underlying weights are often the same across different runs. So the concept of continual learning is very interesting because right now we’re basically patching things together with tape, like those “dream cycle” ideas.

Demis Hassabis: Exactly, those dream cycles are pretty cool. We’ve thought about this in the context of integrating episodic memory. My PhD research was on how the hippocampus elegantly integrates new knowledge into existing schemas. The brain does this extremely well.

It does this during sleep, especially during REM sleep, replaying important experiences to learn from them. Our earliest Atari program, DQN (DeepMind’s 2013 deep Q-network, the first to use deep reinforcement learning to reach human-level performance on Atari games), mastered Atari through experience replay.

That concept—repeatedly replaying successful trajectories—came from neuroscience. 2013 is ancient in AI terms, but it was crucial at the time.
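The mechanism Hassabis describes—storing past transitions and replaying random samples of them during training—can be sketched in a few lines of Python. This is a minimal illustration of the experience-replay idea behind DQN, not DeepMind’s actual implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions, sampled uniformly for training."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall off the back

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive frames, which stabilizes Q-learning updates.
        return random.sample(self.buffer, batch_size)

# Usage: fill the buffer while acting, then train on random minibatches.
buf = ReplayBuffer(capacity=1000)
for t in range(50):
    buf.add(state=t, action=t % 4, reward=1.0, next_state=t + 1, done=False)
batch = buf.sample(8)
```

Sampling uniformly from a large buffer decorrelates consecutive frames, which is a large part of what made deep Q-learning stable enough to work on Atari.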

I agree, we’re basically patching things together now—stuffing everything into context windows. It feels wrong. Even if we’re building machines instead of biological brains, theoretically, we could have millions or tens of millions of tokens in context, and perfect memory, but retrieval costs still exist. Finding truly relevant information at the moment of decision isn’t easy, even if you can store everything. So I believe there’s huge room for innovation in memory systems.

Gary Tan: Honestly, a million-token context window is already bigger than I expected and can do a lot.

Demis Hassabis: Yes, for most use cases it’s enough. But think of the context window as working memory. Humans have about seven items in working memory, yet we have hundreds of thousands or millions of tokens in context. The problem is, we stuff everything in, including unimportant or incorrect info, which is quite crude. And if you process real-time video streams, naively recording all tokens, a million tokens only cover about 20 minutes. But if you want the system to understand your life over one or two months, that’s still far from enough.
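The arithmetic behind the 20-minute figure is easy to check; the tokens-per-second rate below is simply back-solved from the numbers in the quote, not an official tokenization specification:

```python
# Back-of-envelope token budget for streaming video, using a
# hypothetical rate implied by the figures in the interview.
context_window = 1_000_000          # tokens
minutes_covered = 20                # from the quote
tokens_per_second = context_window / (minutes_covered * 60)
print(f"~{tokens_per_second:.0f} tokens/s of video")  # ~833 tokens/s

# At that rate, "understanding your life over two months" would need:
two_months_seconds = 60 * 24 * 3600
tokens_needed = two_months_seconds * tokens_per_second
print(f"~{tokens_needed:.2e} tokens for two months")  # ~4.3 billion tokens
```

Even a ten-million-token window falls three orders of magnitude short of that, which is why Hassabis argues naive storage is the wrong approach and better memory systems are needed.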

Gary Tan: DeepMind has always invested heavily in reinforcement learning and search. How deeply is this philosophy embedded in your current development of Gemini? Is RL still underestimated?

Demis Hassabis: It might be underestimated. Attention on RL has fluctuated. From day one at DeepMind, we’ve been working on agent systems. All of the Atari and AlphaGo work essentially belongs to RL agents—systems capable of autonomous goal achievement, decision-making, and planning. We chose games because they’re of manageable complexity, then moved on to more complex ones like AlphaStar after AlphaGo, eventually covering most of the games we could.

The next step is whether these models can generalize into world models or language models, not just game models. We’ve been working on this for years. Today’s leading models’ reasoning and chain-of-thought are basically a return to what AlphaGo pioneered.

I think much of what we did then is highly relevant today. We’re re-examining those old ideas, scaling up, making them more general, including Monte Carlo tree search and other RL methods. The ideas behind AlphaGo and AlphaZero are highly related to foundational models today, and I believe much progress in the next few years will come from this.

Distillation and Small Models

Gary Tan: To be smarter now, you need bigger models, but distillation tech is also advancing, making small models quite fast. Your Flash models are very strong, reaching about 95% of the performance of the state-of-the-art, but at only a tenth of the cost. Is that right?

Demis Hassabis: I think that’s one of our core advantages. You need to build the largest models first to get cutting-edge capabilities. One of our biggest strengths is quickly distilling and compressing those capabilities into smaller models. We invented the distillation approach ourselves, and we’re still among the world’s top. Plus, we have strong business incentives to do this. We’re probably the largest AI application platform globally.

With AI Overviews, AI Mode, and Gemini, every Google product—Maps, YouTube, etc.—is integrating Gemini or related tech. This involves billions of users and products serving hundreds of millions or billions of users. They need to be extremely fast, efficient, low-cost, and low-latency. This drives us to optimize Flash and smaller Flash-Lite models to be highly efficient, aiming to serve various user needs.

Gary Tan: I’m curious how smart these small models can get. Is there an upper limit to distillation? Can 50B or 400B models match today’s largest front-line models in intelligence?

Demis Hassabis: I don’t think we’ve hit the information-theoretic limit yet—at least, no one knows if we have. Maybe someday we’ll reach a density ceiling, but our current assumption is that a cutting-edge Pro model released today can be compressed into a very small, edge-compatible model within six months to a year.

You can see this in Gemma models—our Gemma 4 performs very strongly at similar sizes. This relies heavily on distillation and efficiency optimization techniques. So I really see no theoretical limit yet; we’re still far from it.
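The basic idea behind distillation—training a small model to match the softened output distribution of a large one—can be sketched in plain Python. This follows the classic temperature-softmax formulation (Hinton et al.); the logits and temperature here are purely illustrative, and a production pipeline is certainly far more involved:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened distribution.
    Minimizing this pushes the small model toward the large model's outputs."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]  # hypothetical logits from a large "Pro"-class model
student = [3.5, 1.2, 0.4]  # a small model being trained to imitate it
loss = distillation_loss(teacher, student)
```

The soft targets carry more information than hard labels (how wrong each alternative is, not just which answer is right), which is why a small student can recover most of a much larger teacher’s behavior.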

Gary Tan: There’s a very wild phenomenon now—engineers doing 500 to 1,000 times the work they did six months ago. Some people here are doing the equivalent of a thousand times what a Google engineer in the 2000s could do. Steve Yegge has mentioned this.

Demis Hassabis: I find it exciting. Small models have many uses. One is cost and speed—faster iteration, better collaboration. Even if they’re not the absolute cutting edge, say 90-95%, that’s enough, and the speed gains outweigh the small performance gap.

Another big direction is deploying these models on edge devices—not just for efficiency, but for privacy and security. Think of devices handling highly personal data, or robots. For your home robot, you’d want a local, efficient, powerful model, only outsourcing specific tasks to cloud-based large models. Processing audio and video locally, data stays local—that could be the ultimate scenario.

Memory and Reasoning

Gary Tan: Back to context and memory. Currently, models are stateless. If they gain continual learning, what would developer experience look like? How would you guide such models?

Demis Hassabis: That’s a very interesting question. The lack of continual learning is a key bottleneck for current agents to complete full tasks. Today’s agents are useful for local sub-tasks—you can string them together for cool things—but they can’t adapt well to your specific environment. That’s why they can’t truly “launch and forget”; they need to learn your particular context. Solving this is essential for achieving general intelligence.

Gary Tan: How far along are we in reasoning? Current chain-of-thought is strong, but it still makes mistakes that smart undergraduates wouldn’t. What specific improvements in reasoning do you foresee?

Demis Hassabis: There’s still a lot of room for innovation in thinking paradigms. What we’re doing is still quite rough and brute-force. Many improvements are possible, like monitoring the reasoning process and intervening mid-thought. I often feel that both our systems and competitors’ systems tend to overthink, falling into loops.

I like watching Gemini play chess. All the leading foundation models are quite poor at chess, which is interesting.

Watching their thought trajectories is valuable because chess is a well-understood domain. I can quickly tell if they’re going off track or if their reasoning is effective. Sometimes they consider a move, realize it’s bad, but can’t find a better one, so they circle back and make that bad move. An exact reasoning system shouldn’t do that.

This huge gap still exists, but fixing it might only require one or two adjustments. That’s why you see the so-called “jagged intelligence”—it can solve IMO gold medal problems but stumble on elementary math when asked differently. It seems to lack something in self-reflection on its own thought process.

True Capabilities of Agents

Gary Tan: Agents are a big topic. Some say it’s hype. I personally think we’re just at the beginning. What’s DeepMind’s honest assessment of agent capabilities? How big is the gap between internal understanding and public hype?

Demis Hassabis: I agree, we’re just starting. To reach AGI, you need a system that can proactively solve problems for you. That’s always been clear to us. Agents are the way, and I think we’re just getting started.

Everyone is exploring how to better integrate agents into workflows. We’ve done a lot of personal experiments, and many here probably have too. How to make agents part of the workflow, not just a toy but truly transformative? We’re still in experimentation. It’s only recently that we’ve started to find particularly valuable scenarios—maybe just in the last two or three months. The technology has just reached a point where it’s no longer a toy demo but genuinely adds value to your time and efficiency.

I often see people launching dozens of agents running for dozens of hours, but I’m not sure if the output justifies the effort.

We haven’t seen anyone use vibe coding to create an AAA game that tops the app store charts. I’ve done some myself, and many here have made decent demos. I can now prototype a “Theme Park” in half an hour, whereas at 17 I spent six months on it.

I have a feeling that if you spend an entire summer, you could create something truly incredible. But it still requires craftsmanship, human soul, and taste—you must bring these into whatever product you build. No kid has yet made a blockbuster game selling over ten million copies, but with current tools, it should be possible. Something’s missing—maybe workflow, maybe tools. I expect in the next 6 to 12 months, we’ll see such results.

Gary Tan: To what extent will that be fully automated? I don’t think it will be fully automatic from the start. The more likely path is that people first achieve 1,000x efficiency, then use those tools to make best-selling apps and games, and only later will more steps be automated.

Demis Hassabis: Exactly, that’s what you should expect first.

Gary Tan: Also, some people are already doing this but aren’t willing to publicly admit how much agent help they’re getting.

Demis Hassabis: Maybe. But I want to talk about creativity. I often cite AlphaGo’s famous move 37. I had been waiting for that moment, and once it happened, I started projects like AlphaFold—we began AlphaFold the day after move 37 in Seoul, ten years ago. I went to Korea to celebrate AlphaGo’s tenth anniversary.

But just making move 37 isn’t enough. It’s cool, very useful. But can this system invent the game of Go itself? If you give it a high-level description—“a game learnable in five minutes, but impossible to master in a lifetime, elegant aesthetically, finished in an afternoon”—and the system returns Go, that’s different. Today’s systems can’t do that. Why?

Gary Tan: Maybe someone here can.

Demis Hassabis: If someone can do it, then the answer isn’t that the system is missing something, but that we’re using it wrong. Maybe that’s the right answer. Perhaps today’s systems already have that capability, just needing a highly talented creator to drive it, infusing the project with soul, and working in close harmony with the tools. If you immerse yourself in these tools day and night and have deep creativity, maybe you can create something beyond imagination.

Open Source and Multimodal Models

Gary Tan: Switching topics to open source. The recent release of Gemma allows very powerful models to run locally. What’s your view? Will AI become something users control themselves, rather than mainly staying in the cloud? Will this change who can build products with these models?

Demis Hassabis: We are strong supporters of open source and open science. We fully open-sourced AlphaFold. Our scientific work is still published in top journals. For Gemma, we aim to create world-leading models at similar scale. So far, Gemma has been downloaded about 40 million times in just two and a half weeks.

I also believe it’s important to have Western tech stacks in open source. Chinese open source models are excellent and currently leading in open source, but we believe Gemma is very competitive at similar scale.

A resource issue remains: no one has spare compute to develop two full-scale frontier models simultaneously. Our current decision is to use edge models for Android, glasses, robots, and so on, and keep those open. Once a model is deployed on a device it’s exposed anyway, so it’s better to open it fully. We’ve unified our open strategy around these nano-scale models, which makes strategic sense.

Gary Tan: Before the talk, I demonstrated how I interact with Gemini via voice—an AI operating system. I was nervous showing it, but it worked. Gemini was built from the start as a multimodal system. I’ve used many models, but the deep integration of voice, tool invocation, and contextual understanding in Gemini is unmatched.

Demis Hassabis: Exactly. One underappreciated advantage of Gemini is that it was built from the ground up as a multimodal system. This makes initial development harder than just text, but we believe the long-term benefits are significant, and we’re already seeing them pay off.

For example, in world models, we built Genie (DeepMind’s generative interactive environment model) on top of Gemini. In robotics, Gemini Robotics will be based on multimodal foundation models, creating a competitive moat. We’re also increasingly using Gemini in Waymo (Alphabet’s autonomous driving company).

Imagine a digital assistant that follows you into the real world, perhaps on your phone or glasses, understanding your physical environment. Our system excels here. We’ll continue investing in this area, and I believe our lead in such problems is substantial.

Gary Tan: As reasoning costs rapidly decline, what becomes possible? Will your team’s focus shift because of this?

Demis Hassabis: I’m not sure reasoning will truly become free—the Jevons paradox (efficiency gains leading to increased total consumption) is real. I think everyone will eventually use all available compute.

Imagine millions of agents collaborating or a small group thinking along multiple paths and integrating results. We’re experimenting with these directions, and all will consume reasoning resources.

In terms of energy, if we solve problems like controlled nuclear fusion, room-temperature superconductors, and optimal batteries, I believe materials science can get us to near-zero energy costs. But the physical manufacturing of chips will remain a bottleneck for decades. So reasoning quotas will still be necessary, and efficiency remains critical.

The Next Scientific Breakthrough

Gary Tan: Fortunately, small models are getting smarter. Many founders in biotech and related fields are here. AlphaFold 3 has moved beyond proteins, extending to a broader range of biomolecules. How far are we from modeling complete cellular systems? Is that a completely different level of difficulty?

Demis Hassabis: Isomorphic Labs is making great progress. AlphaFold is just one step in drug discovery. We’re working on adjacent biochemical research—designing compounds with correct properties—and will have major releases soon.

Our ultimate goal is a full virtual cell—a comprehensive, perturbable cell simulator whose outputs closely match experimental results and have practical use. Such a simulator could skip many search steps, generate large amounts of synthetic data to train other models, and predict real cell behavior.

I estimate about ten years to a complete virtual cell. We’re starting from the nucleus because it’s a relatively self-contained subsystem. The key is whether we can isolate a complexity-appropriate slice that’s self-contained enough to approximate its inputs and outputs, then focus on that subsystem. The nucleus is a good candidate.

Another challenge is data scarcity. I’ve discussed with top scientists in electron microscopy and imaging tech. If we could image live cells without killing them, that would be revolutionary—turning it into a visual problem we know how to solve.

Currently, no technology can image live cells at nanometer resolution without damaging them. Static images at that resolution are very detailed and exciting, but not enough to turn this into a visual reasoning problem.

So, two paths: hardware and data-driven solutions, or building better learnable simulators to model these dynamic systems.

Gary Tan: Beyond biology, in materials science, drug discovery, climate modeling, mathematics—if you had to rank, which scientific field will be most transformed in the next five years?

Demis Hassabis: Every field is exciting—that’s why AI has been my passion for over 30 years. I see AI as the ultimate scientific tool for advancing understanding, discovery, medicine, and our grasp of the universe.

Our initial mission was twofold: first, solve intelligence (build AGI); second, use it to solve all other problems. We later refined this to address concerns about overpromising, but that’s still our core belief. We aim to solve “root node” scientific problems—those that, once broken through, unlock entire new branches of discovery. AlphaFold exemplifies this.

Over three million researchers worldwide now use AlphaFold. I hear from pharma execs that almost every new drug discovery will involve AlphaFold at some stage. We’re proud of this impact, and it’s just the beginning.

I can’t think of a scientific or engineering field where AI can’t help. The areas you mentioned are still waiting for their “AlphaFold moment”—promising, but the big challenges remain. In the next two years, we’ll see progress across materials science, math, and beyond.

Gary Tan: It’s like Prometheus—giving humanity a whole new capability.

Demis Hassabis: Exactly. But, as the myth warns, we must be cautious about how this power is used, where it’s applied, and the risks of misuse.

Lessons from Success

Gary Tan: Many here are trying to start companies applying AI to science. In your view, what distinguishes truly cutting-edge startups from those just layering APIs on foundational models and claiming “AI for Science”?

Demis Hassabis: I think if I were in your shoes, looking at YC projects, I’d consider how to anticipate AI’s trajectory. It’s hard, but I believe integrating AI with other deep tech fields—materials, medicine, atomic sciences—has huge potential. Especially in the foreseeable future, there are no shortcuts in these complex domains. They won’t be overtaken by the next model update. If you want a resilient direction, that’s what I’d recommend.

I’ve always favored deep tech. Real, lasting value comes from hard problems. I’ve been attracted to it since 2010 when we started. Back then, people told me AI was a dead end; academia thought it was a niche that failed in the 90s.

But if you believe in your ideas—why this time is different, your unique background—ideally, you’re an expert in machine learning and applications, or can assemble such a founding team. That’s where enormous impact and value can be created.

Gary Tan: That’s valuable insight. Achieving one thing often seems obvious afterward, but everyone was against you before.

Demis Hassabis: Exactly. So you must pursue what you’re truly passionate about. For me, I’ll keep doing AI no matter what. Since I was young, I knew it was the most impactful thing I could do. It’s proven right so far, though maybe I was early by 50 years.

And it’s also the most interesting thing I can think of. Even if today we’re still in a garage and AI isn’t fully realized, I’d find ways to keep going. Maybe back to academia, but I’d keep pushing forward.

Gary Tan: AlphaFold is a good example of a direction you pursued and bet right on. What makes a scientific field ripe for breakthroughs like AlphaFold? Are there patterns, like certain objective functions?

Demis Hassabis: I should probably write this down someday. From AlphaGo and AlphaFold, I’ve learned that current techniques work best when:

  • The problem has a huge combinatorial search space—bigger is better, too large for brute force or special algorithms to solve. Both Go’s move space and protein conformations far exceed the number of atoms in the universe.

  • The objective function is well-defined, like minimizing free energy in proteins or winning in Go, allowing gradient-based optimization.

  • There’s enough data, or a simulator that can generate large amounts of synthetic data within the distribution.

If these conditions are met, current methods can go far—finding that needle in the haystack. Drug discovery follows the same logic: if a compound can treat a disease without side effects, and physics allows it, the challenge is how to find it efficiently. AlphaFold proved that such systems can search vast spaces effectively.
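These three conditions can be illustrated with a toy search problem: a combinatorial space of 2^60 binary strings, a well-defined objective, and a cheap “simulator” that scores any candidate. Even naive hill climbing—standing in here for the far more sophisticated learned search in systems like AlphaGo or AlphaFold—finds the needle; the objective itself is entirely made up for illustration:

```python
import random

# A large combinatorial space: binary strings of length 60 (2^60 states),
# far too many to enumerate by brute force.
N = 60
random.seed(0)
target = [random.randint(0, 1) for _ in range(N)]  # hidden optimum

def score(candidate):
    """The 'simulator': a well-defined, cheap-to-evaluate objective
    (number of bits matching the hidden optimum)."""
    return sum(c == t for c, t in zip(candidate, target))

def hill_climb(steps=5000):
    x = [random.randint(0, 1) for _ in range(N)]
    best = score(x)
    for _ in range(steps):
        i = random.randrange(N)
        x[i] ^= 1            # propose: flip one bit
        s = score(x)
        if s > best:
            best = s         # keep the improvement
        else:
            x[i] ^= 1        # revert the flip
    return best

result = hill_climb()
# With this fixed seed the hidden optimum (score 60) is recovered,
# despite the 2^60-state search space.
```

The point of the sketch is the structure, not the algorithm: as long as the objective is well-defined and evaluations are cheap, guided search can locate a needle in an astronomically large haystack—exactly the regime Hassabis describes for Go moves and protein conformations.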

Gary Tan: I want to elevate this. We’re talking about humans creating AlphaFold, but also about humans exploring hypothesis spaces with AI. How close are we to AI systems doing genuine scientific reasoning—not just pattern matching in data?

Demis Hassabis: I think it’s very close. We’re building general systems—like AI co-scientists, AlphaEvolve algorithms—that go beyond current models. All top labs are exploring this.

But so far, I haven’t seen a truly major scientific discovery made solely by these systems. I believe it’s coming soon. It might relate to creativity, breaking known boundaries. At that level, it’s no longer pattern matching; it’s about analogy reasoning. These systems currently lack that, or we’re not using them correctly.

A standard I often mention in science is: can they propose a genuinely interesting hypothesis, not just verify one? Verifying hypotheses can be huge—like proving the Riemann Hypothesis or solving a Millennium Prize problem—but maybe we’re only a few years away from systems that do that.

Even harder is proposing a new set of Millennium Prize problems, considered equally profound by top mathematicians, worth a lifetime of study. I think that’s an order of magnitude more difficult. We don’t yet know how to do it, but I believe these systems will eventually. Maybe they just need one or two more things.

A test I call the “Einstein Test”: can you train a system with 1901 knowledge and have it independently derive Einstein’s 1905 results, including special relativity? I think we should run this test repeatedly. Once it’s possible, these systems are not far from inventing truly new things.

Entrepreneurship Advice

Gary Tan: One last question. Many here have deep technical backgrounds and want to build something at your scale. You’re one of the world’s leading AI research organizations. From your front-line experience with AGI research, what’s something you now know but wish you knew at 25?

Demis Hassabis: We’ve touched on part of this. You’ll find that chasing hard problems and chasing simple problems are quite similar—just different kinds of difficulty, approached differently. Life is short; better to focus your energy on things that no one else will do if you don’t.

Also, I believe cross-disciplinary combinations will become more common in the next few years. AI will make crossing fields easier.

Finally, it depends on your AGI timeline. Mine is around 2030. If you start a deep tech project today, it’s usually a ten-year journey. You must include the possibility of AGI emerging along the way. What does that mean? Not necessarily bad, but you need to plan for it. Can your project leverage AGI? How will AGI systems interact with your project?

Going back to the earlier discussion about AlphaFold and general AI systems, I foresee a scenario where systems like Gemini, Claude, or similar general models call upon specialized systems like AlphaFold as tools. I don’t think we’ll put everything into a single monolithic system.
