The Debsources Dataset: Two Decades of Free and Open Source Software
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/records/61089
Description:
This is the Debsources Dataset: source code and related metadata spanning two decades of Free and Open Source Software (FOSS) history, seen through the lens of the Debian distribution.
The dataset spans more than 3 billion lines of source code as well as metadata about them, such as: size metrics (lines of code, disk usage), developer-defined symbols (ctags), file-level checksums (SHA1, SHA256, TLSH), file media types (MIME), release information (which version of which package containing which source code files has been released when), and license information (GPL, BSD, etc.).
The Debsources Dataset comes as a set of tarballs containing deduplicated unique source code files organized by their SHA1 checksums (the source code), plus a portable PostgreSQL database dump (the metadata).
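Because the source files are keyed by SHA1 and the metadata also records SHA256 checksums, a local file can be matched against the dataset by hashing its contents. The following is a minimal sketch of that step, using only Python's standard hashlib module. The function name file_checksums and its chunk_size parameter are illustrative and not part of the dataset. The exact directory layout inside the tarballs and the database schema are not given in this description, so the sketch stops at computing the digests.

```python
import hashlib
from pathlib import Path


def file_checksums(path, chunk_size=1 << 20):
    """Return the SHA1 and SHA256 hex digests of the file at `path`.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not something shipped with the dataset.
    """
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    return {"sha1": sha1.hexdigest(), "sha256": sha256.hexdigest()}
```

The resulting hex digests could then be compared with the checksum values stored in the metadata database, or used to locate a file among the tarballs once their layout has been inspected.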
The Debsources Dataset is described in full in the paper The Debsources Dataset: Two Decades of Free and Open Source Software, published in the journal Empirical Software Engineering with DOI 10.1007/s10664-016-9461-5. A preprint of the paper is available at https://upsilon.cc/~zack/research/publications/debsources-ese-2016.pdf
Created:
2020-01-24



