GitHub recently announced that, starting April 24, 2026, it will update its code repository policy and begin using user interaction data to train its AI models. The data collection applies to Copilot Free, Pro, and Pro+ users. It includes model inputs and outputs, code snippets, context information, repository structure, and chat interaction records.

Mario Rodriguez, GitHub's Chief Product Officer, said that using interaction data is intended to improve the accuracy and security of the model's code suggestions. He added that preliminary tests using Microsoft's internal data significantly increased the acceptance rate of suggestions. Notably, the policy uses a "default opt-in" mechanism: affected users are included automatically and must manually disable the relevant option in their privacy settings to opt out. This has sparked widespread debate in the developer community about what counts as a private repository and who owns the data.


Copilot Business, Copilot Enterprise, and education users are currently unaffected by the change because of their contract terms. In its statement, GitHub emphasized that the move is in line with industry practices followed by major companies such as Anthropic, JetBrains, and Microsoft. However, even if GitHub's stated goal is to optimize the development workflow, adding private repository code to the training set challenges the traditional boundaries of what "private" means.