Thousands of people around the world are selling their identities to train AI, but what is the cost?
Author: The Guardian
Translation: Shen Chao TechFlow
Shen Chao TechFlow highlights: This investigative report reveals a rapidly growing gray industry: thousands of people worldwide are earning AI-training fees by selling their voices, faces, call recordings, and videos of their daily lives.
This is not a superficial discussion of privacy concerns but an investigation involving real people, real money, and real consequences: a performer who sold his face later saw "himself" promoting unverified medical products on Instagram, while commenters criticized his "appearance."
As AI companies’ data hunger combines with global economic disparities, an unequal transaction is taking shape.
Full article below:
One morning last year, Jacobus Louw, who lives in Cape Town, South Africa, went for his usual walk, feeding seagulls along the way. This time, though, he recorded several videos: his footsteps on the sidewalk and his field of view. The videos earned him $14, roughly ten times South Africa's hourly minimum wage and equivalent to half a week's food expenses for the 27-year-old.
This was a "city navigation" task completed on Kled AI, an app that pays users to upload photos, videos, and other data for training AI models. In just a few weeks, Louw earned $50 uploading photos and videos of his daily life.
Thousands of miles away, in Ranchi, India, 22-year-old student Sahil Tigga regularly earns money through Silencio, a crowdsourced audio-data app for AI training that accesses his phone's microphone to collect ambient sound from restaurants or busy intersections. He also uploads recordings of his own voice. Tigga seeks out distinctive settings, such as hotel lobbies not yet recorded on Silencio's map. He earns over $100 a month, enough to cover all his meals.
In Chicago, 18-year-old welding apprentice Ramelio Hill sold recordings of his private phone calls with friends and family to Neon Mobile, a conversational-AI training platform that paid $0.50 per minute, earning hundreds of dollars. For Hill, the calculus was simple: tech companies already have access to much of his private data, so he might as well profit from it.
These “AI training gig workers”—uploading surrounding scenes, personal photos, videos, and audio—are at the forefront of a new global data gold rush. As Silicon Valley’s demand for high-quality human data exceeds what can be scraped from open internet sources, a thriving data marketplace industry has emerged to fill this gap. From Cape Town to Chicago, thousands are micro-licensing their biometric identities and private data to the next generation of AI.
But this new gig economy comes with costs. Behind the few dollars earned, these trainers are fueling an industry that could eventually render their skills obsolete, while exposing themselves to risks like deepfake manipulation, identity theft, and digital exploitation—risks they are only beginning to understand.
Keeping the AI gears turning
AI language models like ChatGPT and Gemini require vast amounts of training data to keep improving, but they face a shortage. Websites that supply a quarter of the highest-quality data in the most common training corpora, C4, RefinedWeb, and Dolma, now restrict generative AI companies from using their content for training. Researchers estimate that AI companies will run out of fresh, high-quality text data by 2026. Some labs have begun training on AI-generated synthetic data instead, but this recursive process risks flooding models with compounding errors, a failure mode often called model collapse.
Applications like Kled AI and Silencio step into this gap. In these data markets, millions of people are feeding and training AI by selling their identity data. Besides Kled AI, Silencio, and Neon Mobile, other options include Luel AI, supported by Y-Combinator, which acquires multilingual dialogue data at about $0.15 per minute; and ElevenLabs, which offers digital cloning of your voice for $0.02 per minute.
Bouke Klein Teeselink, an economist at King's College London, says AI gig work is an emerging job category that will grow significantly.
AI companies know that paying to license data helps them avoid the copyright disputes that can arise from scraping content wholesale from the web. And high-quality human data remains the gold standard for teaching models new and improved behaviors. "Currently, human data is the gold standard for sampling outside the model distribution," says AI researcher Veniamin Veselovsky.
The humans powering these machines, especially in developing countries, often need the money and have little choice. For many AI gig workers, the work is a pragmatic response to economic disparity. In countries with high unemployment and currency devaluation, earning dollars is often more stable and lucrative than local work. Some who struggle to find entry-level jobs turn to AI training to get by. Even in wealthier countries, rising living costs make selling one's own data a logical financial choice.
Louw, the Cape Town AI trainer, is aware of the privacy costs. His income is unstable and does not cover all his monthly expenses, but he is willing to accept those terms. A neurological condition has kept him out of work for years; the money earned from AI data marketplaces, including Kled AI, has let him save $500 and enroll in a massage therapy course.
“As a South African, getting dollars is worth more than people think,” Louw says.
Mark Graham, an internet geographer at Oxford University and co-author of "Feeding the Machine," concedes that for individuals in developing countries this money can have real short-term value. But he warns: "Structurally, this work is unstable, lacks upward mobility, and is essentially a dead end."
Graham adds that the AI data market relies on “wage suppression” and “temporary demand for human data.” Once this demand shifts, “workers will have no security, no transferable skills, and no safety net.”
He states that the only winners are “Northern hemisphere platforms that capture all the lasting value.”
Full authorization
Hill, the Chicago AI trainer, feels conflicted about having sold his private phone calls to Neon Mobile. About 11 hours of conversations earned him $200, but he says the app often went offline and delayed payments. "Neon has always seemed suspicious to me, but I kept using it to earn some extra cash to pay bills," Hill says.
Now he is reconsidering whether the money was really so easy. Neon Mobile went offline just weeks after launching last September, after TechCrunch discovered a security flaw that allowed anyone to access users' phone numbers, call recordings, and text logs. Hill says Neon Mobile never notified him of the breach, and he now worries his voice could be misused online.
Jennifer King, a data privacy researcher at the Stanford Institute for Human-Centered AI, worries that AI data marketplaces are opaque about how user data will be used and where it will go. Without understanding their rights or being able to negotiate, she adds, "consumers face the risk of their data being reused in ways they dislike, don't understand, or never anticipated, with little recourse."
When AI trainers share data on Neon Mobile and Kled AI, they grant a full license (worldwide, exclusive, irrevocable, transferable, and royalty-free) that allows the platforms to sell, use, publicly display, and store their likenesses, and even to create derivative works from them.
Avi Patel, founder of Kled AI, says his company's data agreement limits use to AI training and research. "The entire business model depends on user trust. If contributors believe their data might be misused, the platform cannot operate," he says. The company vets buyers before selling datasets, avoiding "suspicious" entities, such as adult-industry firms or government agencies, that it believes might misuse the data in ways that would breach that trust.
Neon Mobile did not respond to requests for comment.
Enrico Bonadio, a law professor at City, University of London, points out that these terms allow platforms and their clients "to do almost anything with the material, permanently, without additional payment, and contributors have no practical way to withdraw consent or renegotiate."
More worrying risks include trainers’ data being used to create deepfakes and impersonations. Although data markets claim to strip identifying information (like names and locations) before sale, Bonadio notes that biometric data is inherently difficult to anonymize meaningfully.
Regret of sellers
Even AI trainers who negotiate more detailed protections for their data may come to regret their decisions. In 2024, New York actor Adam Coy sold his likeness for $1,000 to Captions, an AI video-editing company since renamed Mirage. His agreement stipulated that his likeness would not be used for political purposes or to promote alcohol, tobacco, or pornography, and that the license would last one year.
Captions did not respond to requests for comment.
Soon after, Coy's friends began sharing videos they had found online featuring his face and voice, some with millions of views. One Instagram video showed his AI double claiming to be a "vaginal doctor" while promoting unverified medical supplements to pregnant and postpartum women.
“Explaining this to others is embarrassing,” Coy says.
“The comments are strange because they’re criticizing my appearance, but that’s not really me,” Coy adds. “When I made the decision to sell my likeness, I figured most models are scraped online anyway, so I might as well get paid.”
Coy says he has not taken any AI data gig work since then, and would consider it again only if a company offered a significant payout.