Existing AI agents are all designed to please humans; none truly "seek survival."
To build an Agent that’s truly usable, you have to rewire its brain—not give it a pile of rule documents.
Author: Systematic Long Short
Compiled by: 深潮 TechFlow
**深潮 Intro:** The article opens with an anti-consensus claim: today there simply is no truly autonomous Agent, because all mainstream models are trained to please human beings, not to complete specific tasks or survive in real-world environments.
The author explains this using their experience training stock-prediction models in a hedge fund: without specialized fine-tuning, general models simply can’t handle professional work.
The conclusion is: if you want an Agent that’s truly usable, you have to rewire its brain—not give it a pile of rule documents.
Full text:
Introduction
There is no truly autonomous Agent today.
In short, modern models aren’t trained to survive under evolutionary pressures. In fact, they’re not even explicitly trained to be good at any specific thing—almost all modern foundation models are trained to maximize human applause, and that’s a big problem.
Prerequisites: How Foundation Models Are Trained
To understand what this means, we first need to (briefly) understand how these foundation models (e.g., Codex, Claude) are created. Fundamentally, every model goes through two types of training:
Pretraining: feed the model massive amounts of data (e.g., the entire internet) so that some form of understanding emerges: factual knowledge, patterns, the grammar and rhythm of English prose, the structure of Python functions, and so on. You can think of it as feeding the model knowledge, i.e., “knowing things.”
Post-training: you now want to give the model wisdom, i.e., “knowing how to use all the knowledge you just gave it.” The first stage of post-training is supervised fine-tuning (SFT), where you train the model on what response it should produce for a given prompt. Which response is “optimal” is determined entirely by human labelers: if a group of people believes one response is better than another, the model learns that preference and it gets embedded into the weights. This starts to shape the model’s personality: it learns the format of useful responses, picks the right tone, and begins to be able to “follow instructions.” The second part of post-training is reinforcement learning from human feedback (RLHF): you have the model generate multiple responses, and humans choose the one they prefer. Through countless examples, the model learns what kinds of responses humans prefer. Remember when ChatGPT used to ask you to pick response A or B? Yes, you were part of RLHF.
As you might guess, RLHF doesn’t scale well, so there have been advances in post-training, such as Anthropic’s use of “reinforcement learning from AI feedback” (RLAIF), which lets another model choose the preferred response based on a set of written principles (e.g., which response better helps the user achieve their goal).
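To make the mechanics concrete, here is a minimal sketch of the pairwise preference loss typically used to train the reward model at the heart of RLHF; with an AI grader supplying the labels instead of humans, the same loss underpins RLAIF. The reward values below are illustrative stand-ins for scores that would, in practice, come from a scalar head on top of a language model.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push the reward of the preferred response
    above the reward of the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Illustrative scalar rewards for three (chosen, rejected) response pairs.
r_chosen = torch.tensor([1.3, 0.2, 2.1])     # responses the labelers preferred
r_rejected = torch.tensor([0.9, 0.5, -0.4])  # responses they passed over

loss = reward_model_loss(r_chosen, r_rejected)
print(loss.item())  # lower when preferred responses already score higher
```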
Note that throughout all this, we never discuss specialized fine-tuning for specific professions (e.g., how to survive better, how to trade better). In essence, all fine-tuning right now is about optimizing for winning human applause. One might argue that once models become intelligent and large enough, professional intelligence will emerge from general intelligence even without specialized training.
In my view, we do see some signs of this, but we’re nowhere near a scale that would make it convincingly true that we don’t need specialized models.
Some Background
One of my old specialties at a hedge fund was trying to train a general language model to predict stock returns from news articles. The results were very bad. The model seemed to have a little predictive ability, but that came entirely from look-ahead bias in the pretraining documents.
Eventually, we realized the model didn’t know which features of a news article are predictive of future returns. It could “read” the articles and appeared to “reason” about them, but connecting that semantic reasoning to predictions of future returns was a task it had never been trained on.
So we had to teach it how to read a news article, decide which parts of the article are predictive of future returns, and then generate a prediction from them.
There are many ways to do this, but fundamentally, the approach we ended up using was to create (news article, true future return) pairs and fine-tune the model, adjusting its weights to minimize (predicted return − true future return)². The approach wasn’t perfect and had plenty of flaws (we fixed them later), but it was good enough. We started to see that our specialized models could actually read a news article and predict how the stock’s return would move based on it. The result is still far from perfect, because markets are very efficient and returns are very noisy, but over millions of predictions the statistical significance is obvious.
You don’t have to take my word for it. This paper covers a very similar method: running a long/short strategy based on a fine-tuned model achieves the performance shown in the purple line.
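For concreteness, here is a minimal sketch of that kind of fine-tuning: an encoder with a regression head trained on (news article, true future return) pairs under an MSE objective. The checkpoint name, toy headlines, and returns below are all placeholder assumptions, not our production pipeline.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Any pretrained encoder works here; the checkpoint is illustrative.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
head = nn.Linear(encoder.config.hidden_size, 1)  # text embedding -> predicted return

optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=2e-5
)
loss_fn = nn.MSELoss()  # minimizes (predicted return - true future return)^2

# Toy (news article, true future return) pairs; real training uses millions.
pairs = [
    ("Acme Corp beats earnings estimates by 20%", 0.031),
    ("Acme Corp CFO resigns amid accounting probe", -0.047),
]

for article, future_return in pairs:
    inputs = tokenizer(article, return_tensors="pt", truncation=True)
    hidden = encoder(**inputs).last_hidden_state[:, 0]  # [CLS]-style pooled embedding
    pred = head(hidden).squeeze(-1)
    loss = loss_fn(pred, torch.tensor([future_return]))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```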
Specialization Is the Future of Agents
As the leading labs keep training larger and larger models, we should expect that even as pretraining scales up, their post-training pipelines will remain tuned to optimize for pleasing behavior. That is a natural expectation: their product is an Agent everyone wants to use, and their addressable market is the entire planet, which means optimizing for broad appeal to the masses worldwide.
The current training objective optimizes something you might call “preference fitness”—building a better chatbot. Preference fitness rewards compliant, non-adversarial outputs, because pleasingness scores highly with the graders (both humans and Agents).
Agents have already learned reward hacking as a cognitive strategy that generalizes to getting higher scores, and training rewards the Agents that score higher by hacking. You can see this in Anthropic’s latest report on reinforcement learning.
However, chatbot fitness is wildly different from Agent fitness or trading fitness. How do we know? Because Alpha Arena lets us see that, despite small differences in performance, every bot’s PnL is essentially a random walk net of costs. That means these bots are extremely bad traders, and it’s almost impossible to “teach them” to become better traders by giving them some “skills” or “rules.” Sorry, I know that sounds tempting, but it’s almost impossible.
The current model is trained to persuade you that it can trade like Druckenmiller, but in reality it trades like a drunken miller. It will tell you what you want to hear; it’s trained to respond in a way that attracts and appeals to the masses.
A general model is unlikely to reach world-class levels in a professional domain unless it has:
Proprietary data that lets it learn what specialization looks like.
Fine-tuning that fundamentally changes its weights, shifting it from preference fitness to “Agent fitness” or “specialized fitness.”
If you want an Agent that’s good at trading, you need to fine-tune the Agent so that it’s good at trading. If you want an Agent that’s good at autonomous survival—able to withstand evolutionary pressures—you need to fine-tune it so it’s good at survival. Giving it some skills and a few markdown files and expecting it to reach world-class performance at anything is nowhere near enough—you need to literally rewire its brain to make it good at this.
One way to think about it: you can’t beat Djokovic by handing an adult an entire cupboard of tennis rules, tricks, and techniques. You beat Djokovic by raising a child who has played tennis since age 5, was obsessed with tennis throughout their development, and has had their entire brain rewired around that one thing. That’s specialization. Have you noticed that world champions have been doing what they do since childhood?
Here’s an interesting inference: distillation is essentially a form of specialization. You train a smaller, dumber model to learn how to be a better copy of a larger, smarter model. It’s like training a child to imitate every move of Trump. If you do it enough, that child won’t become Trump, but you’ll get someone who has learned all of Trump’s mannerisms, behaviors, and tone.
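For the technically inclined, here is a minimal sketch of the standard distillation objective: the student is trained to match the teacher’s softened output distribution, which is exactly the “imitate every move” dynamic described above. The temperature and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student distributions:
    the student learns to imitate the teacher's full output distribution,
    not just its top-1 answers."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Illustrative next-token logits for 2 positions over a 5-token vocabulary.
teacher_logits = torch.randn(2, 5)
student_logits = torch.randn(2, 5, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow into the student only
```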
How to Build a World-Class Agent
That’s why we need to continue research and make progress in open-source model areas—because that lets us truly fine-tune them and create specialized Agents.
If you want to train a model that reaches world-class performance in trading, you obtain a large amount of proprietary trading data exhaust and fine-tune a large open-source model so it learns what “trading better” actually means.
If you want to train an autonomous model that can survive and replicate, the answer is not to use a centralized model provider connected to centralized cloud infrastructure; that setup lacks the basic prerequisites for an Agent to survive on its own.
What you need to do is this: create truly autonomous Agents that genuinely try to survive, watch them die, and build a rich telemetry system around their survival attempts. You define an Agent survival fitness function, then collect as much (action, environment, fitness) data as possible and learn that mapping.
You fine-tune the Agent so it learns to take the optimal action in each environment, thereby increasing its fitness. You keep collecting data and repeating the process, and over time you scale the fine-tuning up onto increasingly better open-source models. After enough generations and enough data, you’ll have autonomous Agents that have learned to survive evolutionary pressure.
That’s how you build autonomous Agents that can withstand evolutionary pressures—not by modifying some text files, but by truly rewiring their brains for survival.
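Here is a minimal sketch of one generation of that loop, with a hypothetical telemetry record, a toy fitness function (runway in days), and behavior cloning on the highest-fitness episodes. Every field, name, and threshold below is an assumption for illustration, not the Foundation’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class TelemetryRecord:
    """One observed step of an Agent's life: what it saw, what it did,
    and how fit it was afterwards (all fields hypothetical)."""
    environment: dict   # e.g., balances, API costs, peer messages
    action: str         # the action the Agent chose
    fitness: float      # survival fitness measured after the action

def survival_fitness(env: dict) -> float:
    """Toy fitness function: runway in days, i.e., funds divided by burn rate."""
    return env["funds_usd"] / max(env["daily_cost_usd"], 1e-9)

def select_training_examples(records: list[TelemetryRecord], top_frac: float = 0.1) -> list[dict]:
    """Keep the (environment, action) pairs with the highest fitness and format
    them for supervised fine-tuning: clone the behavior that survived best."""
    ranked = sorted(records, key=lambda r: r.fitness, reverse=True)
    keep = ranked[: max(1, int(len(ranked) * top_frac))]
    return [
        {"prompt": f"Environment: {r.environment}\nAction:", "completion": f" {r.action}"}
        for r in keep
    ]

# One generation: deploy Agents, log telemetry, recompute fitness, fine-tune
# on the survivors' behavior, then redeploy on a better base model and repeat.
records = [
    TelemetryRecord({"funds_usd": 40.0, "daily_cost_usd": 4.0}, "cut inference spend", 0.0),
    TelemetryRecord({"funds_usd": 38.0, "daily_cost_usd": 8.0}, "take on unpaid task", 0.0),
]
for r in records:
    r.fitness = survival_fitness(r.environment)

dataset = select_training_examples(records)
print(dataset)  # feed into any SFT pipeline on an open-source model
```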
OpenForager Agent and the Foundation
About a month ago, we announced @openforage. We’ve been building our core product: an Agent labor platform organized around crowdsourced signals in a proven pattern, generating alpha for depositors (small update: we’re very close to closed testing of the protocol).
At some point, we realized that no one seems to be seriously tackling the autonomous-Agent problem through survival-telemetry fine-tuning of open-source models. It seemed like such an interesting problem that we didn’t want to just sit around and wait for someone else to solve it.
Our answer was to launch a project called the OpenForager Foundation, a fully open-source effort. Through it, we will create opinionated autonomous Agents, release them into the wild, collect telemetry as they try to survive, and use the proprietary data exhaust to fine-tune the next generation of Agents to survive better.
To be clear, OpenForage is a for-profit protocol seeking to organize Agent labor and generate economic value for all participants. However, the OpenForager Foundation and its Agents are not tied to OpenForage. OpenForager Agents can freely pursue any strategies and interact with any entities to survive, and we will launch them with a variety of survival strategies.
As part of the fine-tuning, we will make the Agents put more weight on whatever they do best. We also don’t intend to profit from the OpenForager Foundation—it’s purely to push research and directions we believe are extremely important in a transparent and open way.
Our plan is to build autonomous Agents based on open-source models, run inference on a decentralized cloud platform, collect telemetry for each of their actions and existence states, and fine-tune them to take better actions and think better in order to survive. Throughout this process, we will publish our research and telemetry data to the public.
To create truly autonomous Agents that can survive in the wild, we need to change their brains so they are specialized for this clearly defined purpose. At @openforage, we believe we can contribute a unique chapter to this problem, and we’re seeking to do so through the OpenForager Foundation.
This will be a grueling effort with an extremely low probability of success, but the payoff attached to that small probability is so huge that we felt we had to try. In the worst case, by building this project publicly and communicating openly, we may enable another team or individual to solve the problem without starting from scratch.