The small-town youths labeling data for large AI models

2026-04-07 12:52:11

Original author: Sleepy.md

Datong in Shanxi—once a city propped up by coal that held up half of the country’s fortunes—has now shaken off its coat of coal dust, picked up a sharper pickaxe, and slammed it heavily into another invisible mine.

Inside the office buildings at the Jinmao International Center in Pingcheng District, there are no mine hoists and no coal trucks. In their place are thousands of tightly packed computer workstations. The Shanghai Runxun Cloud Zhong Shigu Data Smart Service Base occupies several entire floors, where thousands of young employees in headsets stare at their screens, clicking, dragging, and drawing boxes.

According to official data, as of November 2025 Datong had put 745,000 servers into operation, brought in 69 call-center and data-labeling companies, and helped more than 30,000 people find work close to home, with an output value of 750 million yuan. In this digital mine, 94% of workers hold local household registration.

Datong is not alone. Among the first batch of data-labeling bases confirmed by the National Data Administration, county towns in the central and western regions, such as Yonghe County in Shanxi, Bijie in Guizhou, and Mengzi in Yunnan, feature prominently. At the data-labeling base in Yonghe County, 80% of employees are women. Most are rural stay-at-home mothers or young people who returned home because they couldn’t find suitable work.

More than a century ago, the textile mills of Manchester, England, were packed with farmers who had lost their land. Today, the computer screens in these remote county towns are manned by young people who can’t find a place for themselves in the real economy.

They’re doing piecework that is at once extremely futuristic and brutally primitive: producing the data fodder that AI giants in Beijing, Shenzhen, and Silicon Valley need to train large models.

Nobody thinks there’s anything wrong with that.

A new production line on the Loess Plateau

The essence of data labeling is teaching machines to recognize the world.

Autonomous driving needs to identify traffic lights and pedestrians, while large models need to distinguish what is a cat and what is a dog. Machines themselves have no common sense. First, humans must draw a box on the image and tell it, “This is a pedestrian.” Only after it has swallowed tens of millions of images can it learn to recognize things on its own.

This job doesn’t require a high education level—just patience, and a single index finger that can keep clicking without stopping.

In the golden age of 2017, a simple 2D box could fetch more than a tenth of a yuan, and some companies offered as much as 0.5 yuan. Fast labelers who worked ten-plus hours a day could earn five or six hundred yuan. In a county town, that made it a well-paid, respectable job.

But as large models evolve, the brutal side of this production line has started to show itself.

By 2023, the per-item price for simple image labeling had been slashed to 3 to 4 fen, a drop of more than 90% from the 0.5 yuan peak. 3D point clouds are harder to label: the image is made of dense points and must be magnified many times before the edges become visible, and labelers have to draw a 3D box, with length, width, height, and a rotation angle, that wraps tightly around a vehicle or pedestrian. Even so, that complex 3D box is worth only 5 fen.
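How large the drop looks depends on which 2017 price is taken as the baseline. The short Python check below uses only the prices quoted in the text; the function name is purely illustrative.

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Return the percentage decline from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


# Prices in yuan per box, as quoted in the text (1 fen = 0.01 yuan).
HIGH_2017 = 0.50  # the "high price" some companies offered
LOW_2017 = 0.10   # "more than a tenth of a yuan", treated here as a lower bound

for price_2023 in (0.03, 0.04):
    print(
        f"{price_2023:.2f} yuan per box: "
        f"{percent_drop(HIGH_2017, price_2023):.0f}% below 0.50, "
        f"{percent_drop(LOW_2017, price_2023):.0f}% below 0.10"
    )
```

Measured against the 0.5 yuan high, the decline is 92 to 94 percent, which matches the "more than 90%" figure. Measured against 0.1 yuan, it is closer to 60 to 70 percent.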

The direct consequence of this collapse in unit prices is a surge in labor intensity. To hold on to a monthly base salary of two or three thousand yuan, labelers have to keep pushing their hand speed higher.

This is by no means a light, easy white-collar job. In many labeling bases, management is suffocatingly strict. Taking calls during work is forbidden, and phones must be locked in a storage compartment. The system records each employee’s mouse trajectory and dwell time, and if you stop for more than three minutes, backend warnings come down like a whip.

Even more unforgiving is the accuracy requirement. The industry passing line is usually above 95%, and some companies require 98% to 99%. Under a 99% line, if you draw 100 boxes and get 2 of them wrong, the entire image is rejected for rework.
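The pass-or-rework rule can be sketched in a few lines of Python. This is only an illustration: the source does not describe how any real quality-inspection system computes accuracy, so the per-image ratio and the greater-or-equal comparison are assumptions, as is the function name.

```python
def image_passes(total_boxes: int, wrong_boxes: int, required_accuracy: float) -> bool:
    """Return True if the share of correct boxes meets the required accuracy.

    Assumption: accuracy is judged per image as correct / total.
    """
    if total_boxes <= 0:
        raise ValueError("total_boxes must be positive")
    if not 0 <= wrong_boxes <= total_boxes:
        raise ValueError("wrong_boxes must be between 0 and total_boxes")
    accuracy = (total_boxes - wrong_boxes) / total_boxes
    return accuracy >= required_accuracy


# 100 boxes with 2 mistakes is 98% accurate.
print(image_passes(100, 2, 0.95))  # True: clears a 95% line
print(image_passes(100, 2, 0.99))  # False: rejected under a 99% line
```

Under the strictest line, a single extra slip on a crowded image is the difference between being paid and redoing the whole frame.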

Because video sequences are linked frame by frame, a vehicle that changes lanes and becomes occluded still has to be tracked, and labelers must infer its position in each frame one by one. In 3D point cloud images, any object with more than 10 points must be boxed. In a complex parking-space project, a line drawn too long or a missed label will always be caught in quality inspection. Having a single image sent back for rework four or five times is commonplace. In the end, an hour of work may earn only a few fen.
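To make the 3D task concrete, here is a minimal Python sketch of the kind of box a labeler draws (a center plus length, width, height, and a rotation angle) together with the "more than 10 points" rule. The class, its field names, and the point-counting helper are hypothetical; real labeling tools are not described in the source and will differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    cx: float      # box center, world coordinates
    cy: float
    cz: float
    length: float  # extent along the box's own forward axis
    width: float   # extent along the box's own sideways axis
    height: float  # extent along the vertical axis
    yaw: float     # rotation about the vertical axis, in radians

    def contains(self, x: float, y: float, z: float) -> bool:
        """Check whether a point lies inside the rotated box."""
        dx, dy, dz = x - self.cx, y - self.cy, z - self.cz
        # Rotate the offset by -yaw to express it in the box's own frame.
        cos_yaw, sin_yaw = math.cos(self.yaw), math.sin(self.yaw)
        local_x = cos_yaw * dx + sin_yaw * dy
        local_y = -sin_yaw * dx + cos_yaw * dy
        return (
            abs(local_x) <= self.length / 2
            and abs(local_y) <= self.width / 2
            and abs(dz) <= self.height / 2
        )


def needs_box(points, box: Box3D, min_points: int = 10) -> bool:
    """Apply the rule from the text: more than `min_points` points means a box is required."""
    inside = sum(1 for point in points if box.contains(*point))
    return inside > min_points


# A car-sized box turned 90 degrees, so its long side runs along the world y axis.
car = Box3D(cx=0, cy=0, cz=0, length=4, width=2, height=1.5, yaw=math.pi / 2)
print(car.contains(0, 1.9, 0))  # True: within the long side
print(car.contains(1.9, 0, 0))  # False: outside the short side
```

The rotation angle is what makes the task slow by hand: a box that is a few degrees off either clips points that belong to the object or swallows points that do not.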

A data labeler in Hunan posted her settlement slip on a social platform. After working for a day, she drew more than 700 boxes, with a unit price of 4 fen, for a total income of 30.2 yuan.
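The settlement slip can be checked with simple arithmetic. The sketch below works in fen to avoid rounding issues; the 8-hour and 10-hour days are assumptions for illustration, since the post does not say how long she worked.

```python
def boxes_for_income(income_fen: int, unit_price_fen: int) -> int:
    """Number of accepted boxes needed to earn income_fen at a flat piece rate."""
    boxes, remainder = divmod(income_fen, unit_price_fen)
    return boxes if remainder == 0 else boxes + 1


INCOME_FEN = 3020    # 30.2 yuan, from the settlement slip
UNIT_PRICE_FEN = 4   # 4 fen per box

print(boxes_for_income(INCOME_FEN, UNIT_PRICE_FEN))  # 755

# Hypothetical working days; the actual hours are not given in the source.
for hours in (8, 10):
    print(f"{hours}-hour day: {INCOME_FEN / hours / 100:.2f} yuan per hour")
```

At 4 fen a box, 30.2 yuan corresponds to 755 accepted boxes, consistent with "more than 700". Spread over a full working day, that is only a few yuan an hour.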

It’s a deeply split picture.

On one side are tech luminaries in the spotlight at product launches, talking about how AGI will liberate humanity. On the other are young people in county towns on the Loess Plateau and in the southwestern mountains, staring at screens for eight to ten hours a day, mechanically drawing thousands upon thousands of boxes, sometimes even tracing lane lines in midair with their fingers as they dream at night.

Someone once said that the outward appearance of AI is like a luxury car roaring past—but when you open the door, you find that inside, a hundred people are riding bicycles, clenching their teeth and pedaling as hard as they can.

Nobody thinks there’s anything wrong with that.

Piecework labor that teaches machines how to “love”

Once the image-recognition bottleneck was broken, large models moved on to a deeper stage of evolution. They need to learn to think, to talk, and even to display “empathy” as humans do.

This gives rise to the most central, and most expensive, stage in large model training: RLHF (reinforcement learning from human feedback).

Simply put, real people score AI-generated answers, telling the model which answer is better and more aligned with human values and emotional preferences.

The reason ChatGPT looks “human” is that behind it, there are countless RLHF labelers giving it lessons.

On crowdsourcing platforms, these labeling tasks are often priced transparently: 3 to 7 yuan per item. Labelers need to give extremely subjective emotional ratings to the AI’s responses—judging whether the response is “warm,” whether it is “empathetic,” and whether it “takes the user’s emotions into account.”

A bottom-tier worker earning two or three thousand yuan a month, scrambling through the grime of daily life with no time for their own emotions, is nonetheless required to serve as the AI’s emotional mentor and its judge of values.

They have to grind those complex, subtle human emotions, warmth and empathy among them, into cold scores from 1 to 5. If their scores differ from the system’s standard answers, they are judged to have missed the accuracy requirement, and their already meager piecework pay is docked.
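A rough Python sketch shows how such a comparison against "standard answers" might work. The source does not say how any platform actually computes labeler accuracy, so exact-match scoring, the optional tolerance, and the function name are all assumptions.

```python
def agreement_rate(labeler_scores, reference_scores, tolerance: int = 0) -> float:
    """Share of items where the labeler's 1-to-5 score is within `tolerance` of the reference.

    Assumption: accuracy is the fraction of matching items; real platforms may differ.
    """
    if len(labeler_scores) != len(reference_scores):
        raise ValueError("score lists must have the same length")
    if not labeler_scores:
        raise ValueError("at least one scored item is required")
    matches = sum(
        1
        for mine, reference in zip(labeler_scores, reference_scores)
        if abs(mine - reference) <= tolerance
    )
    return matches / len(labeler_scores)


# Hypothetical ratings for four responses on "warmth".
mine = [5, 4, 3, 2]
reference = [5, 3, 3, 1]
print(agreement_rate(mine, reference))               # 0.5 with exact matching
print(agreement_rate(mine, reference, tolerance=1))  # 1.0 if off-by-one is allowed
```

The same four judgments read as 50% or 100% accurate depending on a rule the labeler never sees, which is part of why a subjective rating feels so arbitrary once pay is tied to it.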

This is a stripping of cognition. Human emotions—complex, delicate, full of ethics and compassion—are being forced into an algorithmic funnel. In the cold realm of quantification and standardized tick marks, they’re squeezed of even the last bit of warmth. When you marvel that the cyber-beast on the screen has learned to write poems and music, to ask after others’ health and comforts—when it even puts on the skin of melancholy—outside the screen, those once vivid human beings are, day after day, reduced by mechanical judgment into emotionless scoring machines.

This is the most hidden side of the entire industrial chain. It never appears in any financing news or technical white papers.

Nobody thinks there’s anything wrong with that.

985 master’s degrees and young people from small towns

Bottom-tier box-drawing work is being crushed by AI’s tracks. This cyber production line is starting to spread upward, beginning to consume even higher-level mental labor.

Large models’ appetite has changed. They are no longer content to chew on simple common knowledge; they need to swallow human professional knowledge and high-level logic.

Recruitment platforms now frequently list special part-time roles such as “large model logic reasoning labeling” and “AI humanities training instructor.” The bar for these jobs is high: they often require a master’s degree or above from a 985 or 211 university, in fields such as law, medicine, philosophy, and literature.

Many graduate students from prestigious schools are drawn in and join the outsourcing groups that serve major tech companies. But they quickly realize this is no easy mental exercise; it is a mental ordeal.

Before accepting official orders, they must read documents dozens of pages long covering scoring dimensions and evaluation standards, and complete two to three rounds of trial labeling. After passing, in official labeling, if the accuracy rate falls below the average level, they lose eligibility and get kicked out of the group chat.

Most suffocating of all is that these standards are not fixed at all. When faced with similar questions and answers, scoring them using the same way of thinking can still produce completely opposite results. It’s like taking an exam that can never be finished and has no standard answers whatsoever. You can’t raise your accuracy by mere self-effort or learning—you can only keep spinning in place, consuming mental and physical energy.

This is the new form of exploitation in the large-model era—class collapse.

Knowledge, once seen as a golden ladder for breaking barriers and climbing upward, has now sunk into being offered to algorithms as digital fodder that is more complex to chew. In the face of the absolute power of algorithms and systems, the 985 master’s degree holders in ivory towers and the young people in small towns on the Loess Plateau have reached the most bizarre convergence of different paths to the same end.

They fall together into this bottomless cyber mine pit, stripped of their halos, their differences leveled, and all turned into cheap gears on the tracks—gears that can be replaced at any time.

It’s the same abroad. In 2024, Apple shut down a 121-person AI voice-labeling team in San Diego. These employees were responsible for improving Siri’s multilingual processing capabilities. They had thought they stood at the edge of a big company’s core business, only to fall abruptly into the abyss of unemployment.

In the eyes of tech giants, whether it’s the box-drawing aunt in a county town or the logic training instructor who graduated from a top university—at bottom, they are all “consumables” that can be replaced at any time.

Nobody thinks there’s anything wrong with that.

A trillion-dollar Tower of Babel, built with blood and sweat worth a few cents

According to data released by the China Academy of Information and Communications Technology (CAICT), China’s data labeling market reached 6.08 billion yuan in 2023 and is expected to reach 20 to 30 billion yuan in 2025. Global sales of data labeling and services are forecast to surge to 117.1 billion yuan by 2030.

Behind these numbers is a valuation frenzy in which tech giants like OpenAI, Microsoft, and ByteDance are casually valued at hundreds of billions to trillions of dollars.

But all this wealth poured down from heaven didn’t flow to the people who are truly “feeding” AI.

China’s data labeling industry has a typical inverted pyramid outsourcing structure. At the top are the tech giants that tightly hold the core algorithms. The second tier is large data service providers. The third tier is data labeling bases scattered across regions and small-to-medium outsourcing companies. Only at the bottom are those labelers who take piecework wages—the mud-legged workers.

Every layer of outsourcing skims off a layer of profit. When the unit price paid by big companies is 0.5 yuan, what reaches the county-town labelers after layer upon layer of skimming may be less than 0.05 yuan.
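A back-of-the-envelope Python sketch shows how quickly stacked margins compound. It assumes three hand-offs (tech giant to large provider, provider to local base, base to labeler), an identical margin at each one, and a bottom price of 0.05 yuan taken from the 5 fen per box quoted earlier. None of these margins come from the source; they only illustrate the mechanism.

```python
def price_after_layers(top_price: float, margin: float, layers: int) -> float:
    """Price left after each of `layers` intermediaries keeps `margin` of what it receives."""
    return top_price * (1 - margin) ** layers


def uniform_margin(top_price: float, bottom_price: float, layers: int) -> float:
    """Per-layer margin that turns top_price into bottom_price over `layers` hand-offs."""
    return 1 - (bottom_price / top_price) ** (1 / layers)


# Hypothetical scenario: 0.50 yuan at the top, 0.05 yuan at the bottom, three hand-offs.
margin = uniform_margin(0.50, 0.05, layers=3)
print(f"Implied margin per layer: {margin:.1%}")
print(f"Price reaching the labeler: {price_after_layers(0.50, margin, 3):.3f} yuan")
```

Under these assumptions, each intermediary would need to keep a little over half of what it receives for 0.5 yuan to shrink to 5 fen.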

In his book “Technofeudalism,” Yanis Varoufakis, the former Greek finance minister, put forward a penetrating argument: today’s tech giants are no longer capitalists in the traditional sense, but “Cloudalists.”

They don’t own factories and machines; they own algorithms, platforms, and computing power—these are digital territories in the cyber age. In this new feudal system, users are not consumers but digital serfs. Every like, comment, and browse on our social media is freely supplying data to the Cloudalists.

And the data labelers distributed in these down-tier markets are the lowliest class of digital serfs within this system. They not only have to produce data, they also have to clean, categorize, and score massive amounts of raw data—turning it into high-quality feed that large models can digest.

This is a covert enclosure movement in the realm of cognition. Just as the enclosure movement in 19th-century Britain drove farmers into textile factories, today’s AI wave drives young people who can’t find a place in the real economy to sit in front of screens.

AI hasn’t bridged the class divide. Instead, it has built a conveyor belt of data, blood, and sweat running from county towns in central and western China straight to the headquarters of big tech companies in Beijing, Shanghai, Guangzhou, Shenzhen, and beyond. The narrative of technological revolution is always grand and dazzling, but underneath it has always been the large-scale consumption of cheap labor.

Nobody thinks there’s anything wrong with that.

No tomorrow that needs humans

The most brutal ending is coming soon—and it’s coming faster and faster.

As large model capabilities leap forward, labeling tasks that once required humans to work day and night are being taken over by AI itself.

In April 2023, Li Xiang, founder of Li Auto, disclosed figures at a forum. Previously, Li Auto needed to manually label roughly 10 million frames of autonomous driving image data per year, with outsourcing costs close to 100 million yuan. After the company switched to large models for automated labeling, what used to take a year could basically be completed in about three hours.

That is 1,000 times human efficiency, and that was back in 2023. This past March, Li Auto released a new-generation automatic labeling engine, MindVLA-o1.

There is a self-deprecating line in the industry that rings true: “There is as much human labor as there is intelligence.” But big companies’ spending on outsourced data labeling has already fallen off a cliff, down 40% to 50%.

Those small-town young people who sat in front of computers for countless days and nights, until their eyes burned red, fed a giant beast with their own hands. Now the beast is turning around and smashing their meal tickets.

As night falls, the office buildings in Datong’s Pingcheng District are still lit as bright as day. Young people changing shifts pass one another silently in the elevator lobby, one tired body replacing another. In this folded space, tightly imprisoned by countless polygon boxes, no one cares what kind of epic leap the Transformer architecture across the ocean is taking, and no one can make sense of the roaring computing power behind hundreds of billions of parameters.

Their eyes are only welded to that red-green progress bar in the backend that represents the “passing line,” calculating whether those few points and few fen in piecework numbers can be cobbled together into a decent life by the end of the month.

On one side are the Nasdaq bell-ringing and endless tech media coverage, with giants toasting the arrival of AGI. On the other are the digital serfs who fed AI one mouthful at a time with their own flesh and blood, and who can only wait, trembling in uneasy dreams, for the beast they raised to casually kick away their meal tickets on some seemingly ordinary morning.

Nobody thinks there’s anything wrong with that.
