
The Debsources Dataset: Two Decades of Free and Open Source Software

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/records/61089
Description:
The Debsources Dataset contains source code and related metadata spanning two decades of Free and Open Source Software (FOSS) history, seen through the lens of the Debian distribution. It covers more than 3 billion lines of source code, together with metadata about that code: size metrics (lines of code, disk usage), developer-defined symbols (ctags), file-level checksums (SHA1, SHA256, TLSH), file media types (MIME), release information (which version of which package, containing which source code files, was released when), and license information (GPL, BSD, etc.). The dataset is distributed as a set of tarballs containing deduplicated, unique source code files organized by their SHA1 checksums (the source code), plus a portable PostgreSQL database dump (the metadata). It is described in full in the paper "The Debsources Dataset: Two Decades of Free and Open Source Software", published in the journal Empirical Software Engineering with DOI 10.1007/s10664-016-9461-5. A preprint of the paper is available at https://upsilon.cc/~zack/research/publications/debsources-ese-2016.pdf
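As a rough illustration of what "organized by their SHA1 checksums" can mean in practice, the sketch below computes the SHA1 and SHA256 checksums of a file's contents and derives a content-addressed path from the SHA1. The two-level sharded layout (`da/39/<sha1>`) and the helper names `file_checksums` and `sharded_path` are assumptions made for this example only; the actual directory layout inside the tarballs is not stated here and should be checked against the dataset itself.

```python
import hashlib
from pathlib import PurePosixPath


def file_checksums(data: bytes) -> dict:
    """Return the SHA1 and SHA256 hex digests of a file's contents."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def sharded_path(sha1_hex: str, depth: int = 2) -> PurePosixPath:
    """Build a content-addressed path from a SHA1 hex digest.

    HYPOTHETICAL layout: the first `depth` byte pairs of the digest become
    nested directories, and the full digest is the file name. The real
    dataset may use a different scheme.
    """
    shards = [sha1_hex[2 * i : 2 * i + 2] for i in range(depth)]
    return PurePosixPath(*shards, sha1_hex)


if __name__ == "__main__":
    sums = file_checksums(b"")
    print(sums["sha1"])
    print(sharded_path(sums["sha1"]))
```

Because identical contents always hash to the same digest, storing files under a path derived from the checksum deduplicates them automatically: two packages shipping the same file resolve to a single stored copy.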
Created:
2020-01-24