
Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects — Supplementary Material

Indexed by NIAID Data Ecosystem on 2026-03-11
Download link:
https://zenodo.org/record/1148069
Description:
Background: Stack Overflow (SO) is the largest Q&A website for software developers, providing a large number of copyable code snippets. Using those snippets raises various maintenance and legal issues. SO’s license (CC BY-SA 3.0) requires attribution, i.e., referencing the original question or answer, and requires derived work to adopt a compatible license. While there is a heated debate on SO’s license model for code snippets and the required attribution, little is known about the extent to which snippets are copied from SO without proper attribution.

Aim: Our main goal was to analyze how often code from SO posts is used in public GitHub projects but not attributed as required by the license. Further, we wanted to investigate whether developers are aware of SO’s license and its implications, and to what degree they adhere to the attribution requirements defined in SO’s terms of service.

Method: We present results of a large-scale empirical study analyzing the usage and attribution of non-trivial Java code snippets from SO answers in public GitHub projects. We followed three different approaches to triangulate an estimate for the ratio of unattributed usages and conducted two online surveys with software developers to complement our results.

Results: For the different sets of projects that we analyzed, the proportion of projects containing files with a reference to SO varied between 3.3% and 11.9%. We found that at most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0. Moreover, we estimate that at most a quarter of the copied code snippets from SO are attributed as required, i.e., using a link in a source code comment. About half of the surveyed developers admitted copying code from SO without attribution. Furthermore, about two-thirds of them were not aware of the license of SO code snippets and its implications.
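To make "a link in a source code comment" concrete, the following is a minimal sketch of how such references could be detected in Java source text. It is not the study's actual tooling: the function name `find_so_references`, the URL patterns, and the naive comment matcher are all assumptions made for illustration, and the matcher will misread `//` inside string literals as a comment.

```python
import re

# Assumed URL shapes for SO questions and answers (hypothetical pattern;
# the study's real matching rules may differ).
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/(?:questions|q|a)/\d+[^\s*]*",
    re.IGNORECASE,
)

# Naive matcher for Java line comments (// ...) and block comments (/* ... */).
# It does not understand string literals, so it can produce false positives.
JAVA_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def find_so_references(java_source: str) -> list[str]:
    """Return Stack Overflow links that appear inside comments of java_source."""
    links = []
    for comment in JAVA_COMMENT.findall(java_source):
        links.extend(SO_LINK.findall(comment))
    return links


if __name__ == "__main__":
    sample = (
        "class A {\n"
        "  // Source: https://stackoverflow.com/a/123456 (CC BY-SA 3.0)\n"
        "  int f() { return 1; } /* no link here */\n"
        "}\n"
    )
    print(find_so_references(sample))
```

A file for which this returns an empty list contains no comment-level reference of the assumed form. That says nothing about whether the file contains copied code, which is the harder question the study's three approaches address.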
Created:
2020-01-24