Meta's monitoring of employee computers for AI training leaks data; company urgently suspends project to investigate

Meta launched the "Model Capability Initiative" in April 2026, recording mouse movements, clicks, and keystrokes on U.S. employees' computers to train its own AI models. The project came to light after an internal security incident: an employee filed a high-priority security report stating that the leaked data included complete prompts and transcripts, private conversations, personnel performance data, and internal data sensitivity ratings. Meta announced that it is suspending the project to investigate, stressing that there is currently no evidence of improper data access by employees.

In April this year, Meta deployed a program on employees' computers in the U.S. Called the "Model Capability Initiative" (MCI), the project aims to collect mouse movements, clicks, and keyboard input to train Meta's own AI models.

The project later came to light when an employee discovered a data leak and filed a SEV (a high-priority security event report).

This project is more serious than "employee monitoring"

MCI raises two layers of issues. The first is the project itself: recording employees' computer activity for use as AI training data already pushes the boundaries of privacy. Reuters reported as early as May 2026 that the information collected by MCI exceeded the originally disclosed scope and that some data was stored unencrypted.

The second layer concerns what was actually leaked. According to the SEV report, the leak went beyond ordinary operational logs to include complete prompts and transcripts, private conversations, personnel and performance data, and internal DSS data sensitivity ratings (levels 1 to 4). All of this data was visible to every Meta employee.

In simple terms, employee operation logs that should have been handled only by the system, along with private conversations and performance ratings, were leaked and became accessible inside the company without any access restrictions. This is not just a "scope creep" issue but a failure of data governance from design to execution.

Meta issued a statement after the incident came to light, saying it had designed the project carefully and added privacy protections, and that "currently there is no evidence that any employee improperly accessed the data," but that it would suspend the project to investigate.

This is the next battleground for AI training data issues

The capability of AI models largely depends on the quality and diversity of training data.

Over the past few years, tech companies' data strategies have gone through several phases. The first was crawling publicly available internet data. The second was purchasing or licensing specific datasets. The third was using the interaction data that users generate while using products, which is what OpenAI's ChatGPT, Google's various services, and others are doing. Now a fourth source has emerged: employees' own work behavior.

The logic behind MCI is not hard to understand. The daily activities of Meta's engineers, product managers, and designers on their computers represent high-quality, high-density human behavior data: what they are thinking, how they search, how they solve problems, how they communicate with colleagues. Such data has significant value for training AI assistants that can genuinely assist in work tasks.

The problem is that the ethical boundaries of this approach are extremely blurry. Is the data about employees' actions at work considered company assets, given the employment relationship?

Do employees truly have the "right to refuse," or is there de facto coercion? When data collection includes not only work efficiency metrics but also private conversations and performance ratings, maintaining clear boundaries becomes even more difficult.

From OpenAI being accused of scraping YouTube subtitles, to Adobe causing panic with revised terms of service that allowed "training AI on creator works," to Meta now using employees' keystrokes as training data, the AI training data issue has expanded from "public data copyright disputes" to a deeper ethical debate over "private behavioral data."
