GPT-5.4 Insider Leak: Possible Permanent Memory and a Surge in Extreme Reasoning

If you think the AI community has been a bit quiet lately and not very exciting, it might just be the calm before the storm.

According to multiple sources, GPT-5.4 is already on the horizon!

There are already sightings of GPT-5.4 on LMArena.

The Information has just reported a number of core details about GPT-5.4.

The news is quite explosive: a longer context window, a more extreme reasoning mode, and possibly even permanent memory!

If the rumors are true, then this generation of models is likely not just a simple upgrade but a significant leap in capability.

GPT-5.4 Early Testing, Code Leak Exposes Breakthroughs

Recently, GPT-5.4 has been leaking everywhere.

From Codex error logs and GitHub PRs to screenshots accidentally posted by employees, GPT-5.4 has been "exposed" at least three times in just a few days.

And all of this was accidentally leaked by OpenAI itself.

The earliest was when developer Corey Noles triggered a security restriction while using OpenAI Codex, and the error logs showed a very long model name:

The most critical part of this string is its prefix: gpt-5.4.

In short, this string appears to be an internal OpenAI deployment ID, which roughly translates to "a GPT-5.4 version that is actually deployed and under test."

Then, in OpenAI’s official Codex repository, two Pull Requests appeared:

One PR states:

And the other PR is more direct:

It looks like OpenAI added a "Fast Mode" switch for GPT-5.4. Hours later, both PRs were force-pushed and deleted.

Next, a more dramatic scene unfolded: an employee of the OpenAI Codex team, Tibo, posted a screenshot on social media. The model selector clearly shows GPT-5.4.

Not long after, the post was deleted.

Subsequently, another developer reported seeing similar model strings in Codex error messages.

This further suggests that GPT-5.4 has been deployed on internal servers and is undergoing live A/B testing.

One Prompt, Generates 6,000 Lines of Code?

Moreover, some developers have reported a noticeable change—speed!

Some testers say inference is significantly faster, code generation is longer, and a single prompt can produce over 6,000 lines of code!

This was almost impossible before.

Some also discovered a new feature—Fast Mode.

This might mean OpenAI is experimenting with new inference architectures, such as multi-level latency pipelines or different speed tiers of models.

Additionally, new interface features have been spotted: some users report seeing like/dislike buttons next to the Chain-of-Thought summaries, which could indicate their account has been assigned to the GPT-5.4 testing model.

Extreme Reasoning Mode, Performance Explodes

The leak from The Information aligns perfectly with the above information.

The most notable feature in this leak is the Extreme Reasoning Mode.

Traditional models have limited thinking time, but this mode pushes performance to the max—when faced with difficult problems, the model can spend more time, utilize more computing resources, and perform deeper reasoning.

Interestingly, surveys show that many ordinary ChatGPT users are not very interested in reasoning capabilities.

From a commercial perspective, this feature isn’t very practical because companies want AI to give quick answers.

Therefore, OpenAI’s continued focus on reasoning seems driven by pure research motivation.

However, this news is a major boon for scientific research and some enterprise clients.

Clearly, in scientific research, many users are willing to run models for hours or even days on valuable research questions.

Meanwhile, some companies will need GPT-5.4’s enhanced reasoning and long-term task performance to build AI agents capable of automating complex workflows.

According to various leaks, this extreme reasoning mode will be very intense, raising everyone’s expectations.

Context Length More Than Doubled to 1 Million Tokens

For GPT-5.4, this is another hardcore upgrade—the context window has increased from 400,000 tokens to one million tokens.

This means GPT-5.4’s context window is now 2.5 times that of GPT-5.2. (Of course, some OpenAI models like GPT-4.1 support 1 million tokens, but GPT-5.2 does not.)

Now, GPT-5.4 can directly handle documents running to hundreds of thousands of words, analyze entire books, or work through large codebases and datasets.
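
As a rough sanity check on these sizes, the short Python sketch below works out the growth factor from the two token counts given above and estimates how many English words 1 million tokens might hold. The 0.75 words-per-token ratio is an assumption (a common rule of thumb, not a figure from the leaks), and the function names are illustrative only.

```python
# Figures from the article: previous and rumored context windows, in tokens.
OLD_CONTEXT_TOKENS = 400_000
NEW_CONTEXT_TOKENS = 1_000_000

# Assumption (not from the article): roughly 0.75 English words per token.
# Real ratios vary with the tokenizer, the language, and the kind of text.
WORDS_PER_TOKEN = 0.75


def growth_factor(old_tokens: int, new_tokens: int) -> float:
    """Return how many times larger the new window is than the old one."""
    return new_tokens / old_tokens


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a window of `tokens` tokens."""
    return round(tokens * words_per_token)


if __name__ == "__main__":
    print(growth_factor(OLD_CONTEXT_TOKENS, NEW_CONTEXT_TOKENS))  # 2.5
    print(approx_words(NEW_CONTEXT_TOKENS))  # 750000
```

Under that assumed ratio, a 1 million token window would hold on the order of several hundred thousand words, which is comfortably book-length.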

This finally levels the playing field with Google’s Gemini and Anthropic’s Claude, which already support 1 million tokens.

In fact, some rumors go even further than The Information’s report: a 2 million token context window!

GPT-5.4 Excels at "Long Tasks"

Another point from the leaks is that GPT-5.4 performs better on tasks that require hours of continuous work.

That is, it can better remember user requests and its own permitted actions across multiple steps, and is less prone to errors.

This is especially helpful for OpenAI’s Codex programming tools, which rely on AI to automate complex, long-duration tasks.

Moreover, this long-term task capability is crucial for AI agents.

Agents can read requirements, do research, write code, and debug, all without needing a human prompt at every step.

GPT-5.4 Might Have Permanent Memory?

Next, the craziest rumor: GPT-5.4 might have permanent memory!

After an engineer posted this leak on X, it caused a stir in the AI community, and Silicon Valley investor and YC president and CEO Garry Tan quickly shared it.

In this post, the leaker described GPT-5.4’s "persistent state."

Jeff Dean mentioned this idea on the Latent Space podcast, indicating that major AI labs are exploring this direction.

Some speculate that OpenAI has already discovered how to effectively combine state-space models (SSM) with Transformers at scale.

The key is that SSMs carry a hidden state forward from one step to the next, so their compute grows linearly with context length, whereas the cost of Transformer attention grows quadratically.

This aligns with a rumor that GPT-5.4 might support a 2 million token context window.
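
To make the scaling argument concrete, here is a minimal Python sketch. It is not a description of how GPT-5.4 or any OpenAI model actually works. It runs a toy scalar state-space recurrence, in which one hidden state is carried from token to token, and compares the number of state updates with the number of query-key scores that causal self-attention computes. The coefficients a, b, c and all function names are hypothetical; real SSM layers use learned matrices rather than scalars.

```python
from typing import List


def ssm_scan(xs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Run a toy scalar linear state-space recurrence over the inputs.

    h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
    The hidden state h is the only thing carried between steps.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # one constant-cost state update per token
        ys.append(c * h)
    return ys


def ssm_state_updates(n: int) -> int:
    """State updates needed for n tokens: one per token, so linear in n."""
    return n


def causal_attention_pairs(n: int) -> int:
    """Query-key scores in causal self-attention over n tokens.

    Each token attends to itself and every earlier token, giving
    n * (n + 1) / 2 scores, which grows quadratically in n.
    """
    return n * (n + 1) // 2


if __name__ == "__main__":
    # An impulse decays through the carried state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))  # [1.0, 0.5, 0.25]
    for n in (1_000_000, 2_000_000):
        print(n, ssm_state_updates(n), causal_attention_pairs(n))
```

Doubling the context from 1 million to 2 million tokens doubles the recurrence work but roughly quadruples the attention score count, which is why the rumor links persistent state to longer context windows.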

Persistent state essentially means AI models would shift from short-term memory like Guy Pearce’s character in Memento to the stable, long-lasting memory of Dustin Hoffman’s character in Rain Man.

In other words, it would give AI models true long-term memory capabilities.

If realized, this would be a major technological breakthrough!

Under Pressure, OpenAI Forced into "Monthly Updates"

Clearly, OpenAI has made a noticeable shift since GPT-5: new models now ship at a much higher frequency.

This year alone, we’ve seen GPT-5.1, GPT-5.2, and soon GPT-5.4, with update cycles approaching once a month.

It seems OpenAI is being pushed into a corner by competitors.

Currently, ChatGPT has 910 million weekly active users, which sounds impressive but still falls short of OpenAI’s 1 billion WAU target.

Meanwhile, competitors like Google and Anthropic are closing the gap, continuously improving long context, agents, and reasoning.

If GPT-5.4 truly supports 1 million tokens, extreme reasoning, and persistent memory, AI could evolve into continuously working intelligent agents.

And if GPT-5.4 can actually "remember things," it might mark a pivotal moment in the history of large models.

The singularity is near, and we’re accelerating rapidly. Are you ready?

Source: Xinzhiyuan

