From the Stone Age to the Renaissance: The Technological Breakthroughs and Product Insights Behind OpenAI Image Generation 2.0
Compiled by Techub News
This is the 19th episode of OpenAI’s official podcast. Host Andrew Mayne, researcher Kenji Hata, and product lead Adele Li had an in-depth conversation about GPT Image 2.0 (also known as ImageGen 2.0). The discussion took place about two weeks after the model’s official launch, by which point over 1.5 billion images were being generated weekly and multiple usage trends were rapidly gaining popularity worldwide. This is not just a product launch review but a candid discussion of the paradigm shift underway in image generation.
From Investor to Product Lead: A Story of Role Transition
Before joining OpenAI, Adele Li’s entire career was in investment. She worked at private equity firms and Redpoint Ventures, focusing on early investments in AI and software. When she first joined OpenAI, her initial responsibility was planning data and computing infrastructure, which was far from image generation. However, over the past six months, she gradually shifted to the product side, taking full charge of ImageGen’s product development.
She admits that the essence of a product manager’s role is “doing what needs to be done,” regardless of what that is. The ImageGen project especially allowed her to mobilize multiple skills—working closely with researchers like Kenji and constantly thinking about market gaps and opportunity windows.
“This is no longer the market from a year ago when ImageGen 1.0 was released,” Adele said. Today, there are multiple competitors in the image generation space, and ChatGPT itself has become a completely different product. Against this backdrop, contemplating the evolving role of ImageGen within the ChatGPT ecosystem is one of her most interesting thoughts.
Kenji Hata also joined OpenAI about two years ago. He initially worked on an audio-related project, then by chance participated in the pre-release work of ImageGen 1.0, and gradually transitioned to full-time research in image generation, culminating in version 2.0.
Data Speaks First: Two Weeks Post-Launch, 1.5 Billion Images Weekly
Within two weeks of GPT Image 2.0’s official launch, the usage of image generation on ChatGPT increased by over 50%, with weekly image outputs surpassing 1.5 billion. Meanwhile, various usage trends spread rapidly worldwide—from Asian users’ enthusiasm for color analysis and sticker styles to American users’ preference for crayon and graffiti styles.
Adele believes this viral spread itself indicates a key point: users almost instantly perceive the leap in model capability. “Visual communication feedback is the most direct,” she said. Users don’t need to read technical reports; just open the model and generate an image—whether it’s good or not, it’s obvious at a glance.
Host Andrew shared the same feeling: the magnitude of this capability jump makes him think that rather than calling it “2.0,” it should be considered a whole new paradigm. So how exactly did this paradigm shift happen?
Three Core Breakthroughs: Text, Multilingual Support, and Realism
Adele and Kenji attribute the leap in ImageGen 2.0’s capabilities to synchronized breakthroughs across several key dimensions.
First is text rendering ability. Early image generation models were almost disastrous at handling text within images—letters distorted, words jumbled, layouts chaotic. Andrew jokingly said that the “OpenAI” text generated by early DALL-E looked like it was written by a chimpanzee. Now, the model can clearly and accurately display large blocks of text within images, even complex infographics.
Kenji quantified this progress with an internal test: ask the model to generate a grid image containing 100 random objects, then count how many are rendered correctly. Correct counts rose from 5-8 objects in the DALL-E 3 era, to about 16 in ImageGen 1.0, to 25-36 in version 1.5, and now to nearly all 100 objects in 2.0. “This isn’t a sudden leap but a steady, continuous growth,” Kenji said.
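Below is a minimal sketch of the kind of object-counting eval Kenji describes, not OpenAI’s actual harness. The `client.images.generate` call follows the public OpenAI Images API; the object pool, prompt wording, grid size, and the manual scoring step are illustrative assumptions.

```python
import base64
import random

from openai import OpenAI

# Hypothetical object pool; the real eval's object list is not public.
OBJECT_POOL = [
    "apple", "bicycle", "cactus", "drum", "envelope", "flashlight",
    "guitar", "hammer", "igloo", "jellyfish", "kettle", "lighthouse",
]


def build_grid_prompt(n: int, seed: int = 0) -> tuple[str, list[str]]:
    """Sample n distinct object names and ask for a labeled grid, one object per cell."""
    picks = random.Random(seed).sample(OBJECT_POOL, k=n)
    side = int(n ** 0.5)
    prompt = (
        f"A clean {side}x{side} grid of simple flat illustrations, exactly one object "
        "per cell, each cell labeled with the object's name: " + ", ".join(picks)
    )
    return prompt, picks


def generate_grid(prompt: str, out_path: str = "grid.png") -> str:
    """Generate the grid image and save it to disk for scoring."""
    client = OpenAI()
    result = client.images.generate(model="gpt-image-1", prompt=prompt, size="1024x1024")
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
    return out_path


if __name__ == "__main__":
    prompt, expected = build_grid_prompt(n=9)
    path = generate_grid(prompt)
    # Scoring: count how many of `expected` appear correctly in the saved image;
    # in this sketch that judgment is left to a human reviewer.
    print(f"Saved {path}; expected objects: {expected}")
```

The test in the podcast scales this idea up to a 100-object grid; the sketch only shows the overall shape of such an eval, with scoring done by hand.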
Second is multilingual support. The team specifically strengthened the model’s understanding and generation capabilities across multiple languages during training. Post-launch feedback from Asian and European users confirmed this was the right direction: users in different language environments can now obtain high-quality localized images.
Third is photorealism. This was one of the most common pain points in user feedback: previous models often produced character images with an “overly glamorous magazine cover” feel, with distorted facial and body proportions and a lack of authenticity. Version 2.0 made significant improvements here, aiming to make images “look more like you.” Kenji recalled his first impression on seeing checkpoint outputs of the new model: compared side by side with ImageGen 1.0, the difference was obvious.
He described a scene of a woman standing by the sea gazing out. “We looked at both images, said nothing. Just… okay, this one wins.”
How to Balance Speed and Quality? The Key Lies in Post-Training
Andrew raised a question many are curious about: the model has become smarter, but the generation speed hasn’t slowed—how is that achieved?
Kenji explained that each version accumulated a lot of engineering learnings. For example, they worked extensively to improve the model’s “token efficiency”—producing higher-quality images with fewer tokens. This is a continuous optimization process across iterations, not a single breakthrough.
Adele added a point about the importance of the post-training phase. She said that during training the team not only aimed to help the model understand world knowledge, such as how science, concepts, and math are represented visually, but also to answer a more subjective question: what is “good-looking”? What is “tasteful”?
These questions have no standard answers but directly determine the upper limit of the model’s output quality. To address this, the team collaborated closely with artists, designers, and marketers, distilling their aesthetic judgments and best practices into the way the model interacts with users.
They also closely monitor social media feedback, folding real-world usage issues into iterative improvements. Kenji said the problems raised in that feedback are either mitigated or thoroughly fixed in subsequent versions.
Behind the Viral Trends: Using AI to Express an “Imperfect” Self
Among the emerging usage trends after launch, one surprised and amused the team: users intentionally generate rough, crude “Microsoft Paint style” images—downgrading celebrity photos or popular images into pixelated doodles.
Adele offered an insightful interpretation: “Making AI generate ‘imperfect’ things actually requires high intelligence.” This isn’t a model failure; quite the opposite, it’s a reflection of the model truly understanding user intent.
She believes this reveals a consumer psychology trend: people crave authenticity, imperfection, and nostalgia. Crayon styles, graffiti, retro pixel art—these trending prompts all point to the same theme: users want to use AI to showcase a more genuine, fun side of themselves, not just pursue “perfect output.”
“Expressing oneself through AI is a direction we’re genuinely excited about,” Adele said. This aligns closely with OpenAI’s mission—to enable more people to express that “self” which was previously impossible to articulate.
From Entertainment to Productivity: Education, Design, and Cross-Industry Adoption
Another significant shift in ImageGen 2.0 is its move from entertainment-focused scenarios toward genuine productivity use cases.
In education, the team runs a dedicated internal beta channel for educators, covering everything from elementary school to graduate-level teaching. Kenji shared a memorable case: a biology professor fed in graduate-level textbook content, generated highly accurate diagrams, and confirmed the content was entirely correct.
Adele believes transforming complex concepts into easily understandable visual content is one of the model’s strongest capabilities. She especially mentioned “personalized learning”—teachers can use ImageGen to generate customized learning materials for students with different languages and preferences. This is an active area of exploration for her and her team: how to deeply integrate ImageGen into ChatGPT’s learning scenarios, making concept teaching naturally accompanied by visual presentation.
In workplace scenarios, Adele revealed an interesting internal statistic: over 50% of slides in OpenAI’s internal demos already use images generated by ImageGen. “Visual communication is penetrating much faster than we expected.”
She also listed various professional groups already using ImageGen: real estate agents generating property showcase images and virtual renovation effects, YouTube creators making video thumbnails and promotional materials, artists connecting with fans, writers quickly generating social media images…
Andrew also shared his personal experience: he uploaded his book cover to the model, which generated promotional images suitable for different social platforms—correct proportions and style on the first try. “It feels like magic.”
360-Degree Panoramas, Sprite Sheets, and Codex Collaboration: Surprising Emergent Abilities
Beyond expected capability improvements, version 2.0 also brought some “emergent abilities” that the team didn’t fully anticipate.
One example is 360-degree panoramas. The team found that once the model supported arbitrary aspect ratios, users began spontaneously creating ultra-wide panoramas and even 360-degree immersive images. They turned this into a product feature, letting users generate and interactively browse 360-degree panoramas directly in ChatGPT’s web and mobile interfaces. Andrew immediately used it to generate a “dogs playing poker” scene from the dog’s perspective and look around inside it.
Sprite sheets also became an unexpectedly popular use case. Game developers and indie creators use ImageGen to generate multi-pose sprite images of characters, which, combined with Codex’s code generation ability, can build a complete game with custom characters from scratch. Andrew described witnessing this process: saying “I want a crow” in Codex, then watching the system automatically call ImageGen tools to generate the crow’s sprite images, and then Codex integrating them into game code. “It’s magic.”
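For illustration, here is a hedged sketch of just the sprite-sheet step in the workflow Andrew describes, not OpenAI’s actual Codex tool-calling pipeline. The `images.generate` call is the public OpenAI Images API; the prompt wording, pose layout, and output file name are assumptions made for this example.

```python
import base64

from openai import OpenAI


def generate_sprite_sheet(character: str, out_path: str = "sprites.png") -> str:
    """Request a multi-pose sprite sheet on a plain background and save it to disk."""
    client = OpenAI()
    prompt = (
        f"Pixel-art sprite sheet of a {character}: four walking poses and two idle "
        "poses in a single row, consistent proportions and palette across poses, "
        "plain white background, no text"
    )
    # A landscape size leaves room for a single row of poses; the game code (in the
    # podcast's example, whatever Codex scaffolds) would then slice the sheet into
    # equal-width frames.
    result = client.images.generate(model="gpt-image-1", prompt=prompt, size="1536x1024")
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
    return out_path


if __name__ == "__main__":
    print(generate_sprite_sheet("crow"))
```

In the workflow described on the podcast, a coding agent would call a tool like this automatically and then wire the resulting frames into the game; the sketch only covers the image-generation half.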
Multi-image consistency is another significant improvement in 2.0. Kenji mentioned that some users are attempting to create 10-page coherent comic stories, with character images and visual styles highly consistent across images. This capability previously required manual intervention and skill; now, it has become more reliable and seamless.
Next Steps: Creative Agents and Personalized Visual Assistants
Looking ahead, Adele shared a clear vision: Creative Agents.
She described a future where an AI assistant truly understands your working style, aesthetic preferences, and goals, acting as your personal interior designer, architect, wedding planner—all reflected in a single image.
The core of this direction is embedding “personalization” into every aspect of image generation. Adele cited her own “me-me-me eval”—using 100 photos of herself, friends, and family as a test set to see if the model can naturally incorporate personalized elements into generated images. For example, if ChatGPT remembers she has a brother or what her parents like to do, can the model seamlessly embed this info into birthday cards?
Kenji added from a research perspective that the team is still working on optimizing multi-image consistency, overall visual creation experience, and making it easier and faster for users to get desired outputs. “It’s not perfect today, but we know where we’re headed.”
Regarding prompt engineering, both offered their tips. Adele recommends trying the “ImageGen thinking mode”—in Pro or thinking models, ImageGen can connect to the web, analyze files, and call tools, greatly improving quality and composition. She suggests using open-ended prompts in this mode, allowing the model to explore and reason, while anchoring it with a clear aesthetic style. Kenji prefers a minimalist style; he explicitly instructs the model to “keep it clean and simple.”
If DALL-E is the Stone Age of image generation, then ImageGen 2.0 is its Renaissance—an integration of science, art, architecture, knowledge, and aesthetics. In closing this conversation, Adele summarized it with a phrase that perhaps best captures the essence of this model: it is no longer just a “drawing tool,” but a truly emerging visual intelligence that begins to understand the world, people, and beauty.