GitHub announces that starting April 24, it will default to using Copilot user data to train AI models

MarsBitNews · 2026-03-26T01:52:11+00:00

Starting April 24, 2026, GitHub will update its repository policies to train AI models on Copilot user interaction data by default, prompting developer debate over private repositories and data ownership. The change is intended to improve code suggestion accuracy; it signals GitHub's transition toward a closed-loop AI training ecosystem and reflects the industry's growing emphasis on private data.

GitHub recently announced that starting April 24, 2026, it will update its code repository policy to use user interaction data for training its AI models. The data collection applies to users of Copilot Free, Pro, and Pro+ and covers model inputs and outputs, code snippets, contextual information, repository structure, and chat interaction records.

GitHub’s Chief Product Officer Mario Rodriguez stated that incorporating interaction data is meant to improve the accuracy and safety of code suggestions. He also said that earlier testing with internal Microsoft data significantly increased suggestion acceptance rates. Notably, the policy adopts a “pre-set opt-in” mechanism: data sharing is enabled by default, and affected users must manually disable the relevant options in their privacy settings to opt out. This has sparked widespread discussion in the developer community about the definition of private repositories and data rights.

Currently, Copilot Business and Enterprise users, who are bound by contractual terms, and educational users are not affected by this change. GitHub emphasized that the move is in line with practices common among major companies such as Anthropic, JetBrains, and Microsoft. However, including private repository code in training datasets challenges traditional notions of “privacy,” even though GitHub says its goal is to optimize development workflows.

From an industry perspective, as high-quality public code data becomes scarce, leading AI companies are accelerating efforts to mine private interaction data and other “deep data” in search of performance gains. This policy shift signals GitHub’s move from an open-source hosting platform toward a closed-loop AI training ecosystem. It also indicates that AI developer tools are entering a new stage of competition over data compliance and model evolution.

