Code Contribution and Credit in Science

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/KPYVI1
Description:
This dataset contains all data used and created as part of the "Code Contribution and Credit in Science" article [TODO: link/doi to paper]. There are six files in this dataset:

1. rs-graph-v1-prod.db
2. rs-graph-v1-redacted.db
3. annotated-dev-author-em-resolved.csv
4. train-set.parquet
5. test-set.parquet
6. dev-author-em-misclassifications.csv

rs-graph-v1-redacted.db

The rs-graph-v1-redacted.db file is a SQLite database that contains article-repository pairs. For each article, it includes basic bibliometric and author information. For each repository, it includes only basic repository metadata. For details on how to load and access the data in this database, please review: https://github.com/evamaxfield/rs-graph

rs-graph-v1-prod.db

The rs-graph-v1-prod.db file is a SQLite database that contains the same basic data as rs-graph-v1-redacted.db and additionally includes the contributor information for each repository, each contributor's details, and our predicted linkages between article authors and repository developers. Access to this file is restricted because it creates linked personally identifiable information. For details on how to load and access the data in this database, please review: https://github.com/evamaxfield/rs-graph

annotated-dev-author-em-resolved.csv

The annotated-dev-author-em-resolved.csv file stores the annotations created by our team, which were used to train our author-developer-account entity matching model. As with rs-graph-v1-prod.db, access to this data is restricted because it creates linked personally identifiable information. While the training data is kept private and available by request, we make the trained predictive model available at: https://github.com/evamaxfield/sci-soft-models

train-set.parquet and test-set.parquet

The train-set.parquet and test-set.parquet files are the exact splits used for model training.

dev-author-em-misclassifications.csv

The dev-author-em-misclassifications.csv file contains the model's misclassifications on the test set.
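For a first look at the files described above, the following is a minimal sketch, not the authors' loading code; the rs-graph repository linked above remains the reference for the actual schema. It assumes the files have been downloaded locally, makes no assumption about table or column names (it only reads SQLite's built-in catalog), and the helper names `list_tables` and `load_splits` are hypothetical. The parquet helper further assumes pandas and a parquet engine such as pyarrow are installed.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file.

    Hypothetical helper: it discovers tables from SQLite's own catalog
    rather than assuming any schema from the rs-graph project.
    """
    # Open read-only so the downloaded database is never modified.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names come from the catalog itself; quote them for safety.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


def load_splits(data_dir):
    """Load the train and test splits as pandas DataFrames.

    Hypothetical helper: assumes both parquet files sit in data_dir.
    """
    import pandas as pd  # imported lazily; requires pandas plus pyarrow or fastparquet

    data_dir = Path(data_dir)
    train = pd.read_parquet(data_dir / "train-set.parquet")
    test = pd.read_parquet(data_dir / "test-set.parquet")
    return train, test
```

Calling `list_tables("rs-graph-v1-redacted.db")` from the download directory gives a quick inventory of the tables and their sizes before consulting the rs-graph documentation for how the tables relate to one another.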
Created:
2025-06-20