
Replication package for the paper: "Datasets, Bias, Licenses, and Terms of Use: A Large and Longitudinal Study on the Documentation of Hugging Face Machine Learning Models"

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15187256
Description:
This replication package contains datasets related to the paper "Datasets, Bias, Licenses, and Terms of Use: A Large and Longitudinal Study on the Documentation of Hugging Face Machine Learning Models".

It contains the new data used for the journal version of the manuscript:
  All data for the second snapshot (downloaded in September 2024), used to answer RQ1, RQ2, and RQ3
  Data from both snapshots related to terms of use (RQ4)

Scripts and data from the first snapshot (April 2023), from the ICPC 2024 paper "How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study", are available at https://zenodo.org/records/10058142

Root directory

Dataset
  Dataset/Dataset_HF_model_list.csv: list of HF models analyzed, with the following columns: id, downloads, likes, tags, pipeline_tag, pipeline_category, License, license_model_permissivity
  Dataset/Dataset_GitHub_prj_list_Transformers.txt: list of GitHub projects using the transformers library
  Dataset/Dataset_GitHub_prj_list_Diffusers.txt: list of GitHub projects using the diffusers library
  Dataset/Dataset_GitHub_prj_frompretrained_Transformers.txt: list of GitHub projects using "from_pretrained" from the transformers library
  Dataset/Dataset_GitHub_prj_frompretrained_Diffusers.txt: list of GitHub projects using "from_pretrained" from the diffusers library
  Dataset/Dataset_GitHub_prj_model_used_Transformers.csv: usage pairs (project, model) for the transformers library
  Dataset/Dataset_GitHub_prj_model_used_Diffusers.csv: usage pairs (project, model) for the diffusers library
  Dataset/Dataset_IntersectedModels.csv: models shared between the first and second snapshots, for each category
  Dataset/modelsReadme: model cards of the models in the sample
  Dataset/projects_with_5_or_more_stars.csv: projects with numStars greater than 5
  Dataset/projects_stars_summary.csv: total number of projects with numStars

RQ1
  RQ1/RQ1_dataset_list_HF.txt: list of HF datasets
  RQ1/RQ1_datasetTags.txt: list of models declaring the dataset tag
  RQ1/RQ1_modelDataset.csv: list of models declaring the dataset tag, with their respective datasets
  RQ1/RQ1_datasetSample.csv: sample set of models used for the manual analysis of datasets

RQ2
  RQ2/RQ2_bias_classification_sheet.csv: results of the manual labeling

RQ3
  RQ3/RQ3_License_Models.csv: list of model licenses, categorized by permissiveness, with the respective number of occurrences
  RQ3/RQ3_License_prjTransformers.csv: list of transformers project licenses, categorized by permissiveness, with the respective number of occurrences
  RQ3/RQ3_License_prjDiffusers.csv: list of diffusers project licenses, categorized by permissiveness, with the respective number of occurrences
  RQ3/RQ3_prj_model_license_permissivity_Transformers_Diffusers.csv: full list of projects that reuse the models, with their respective licenses and permissiveness, for the Transformers and Diffusers libraries
  RQ3/RQ3_prj_model_license_permissivity_Transformers_Diffusers_Starmajor5.csv: full list of projects that reuse the models, with their respective licenses and permissiveness, for the Transformers and Diffusers libraries, restricted to projects with numStar > 5
  RQ3/RQ3_Contingency_Matrix_permissivity_Transformers_Diffusers.csv: usage contingency table between projects' licenses (columns) and models' licenses (rows) for the Transformers and Diffusers libraries, in terms of permissiveness
  RQ3/RQ3_Contingency_Matrix_licenses_Transformers_Diffusers.csv: usage contingency table between projects' licenses (columns) and models' licenses (rows) for the Transformers and Diffusers libraries, in terms of licenses
  RQ3/RQ3_Contingency_Matrix_permissivity_Transformers_Diffusers_Starmajor5.csv: usage contingency table between projects' licenses (columns) and models' licenses (rows) for the Transformers and Diffusers libraries, in terms of permissiveness, for projects with numStar > 5
  RQ3/RQ3_Contingency_Matrix_licenses_Transformers_Diffusers_Starmajor5.csv: usage contingency table between projects' licenses (columns) and models' licenses (rows) for the Transformers and Diffusers libraries, in terms of licenses, for projects with numStar > 5

RQ4
  RQ4/RQ4_Terms_of_Use_Snapshot1.csv: results of the manual labeling related to terms of use for the first snapshot
  RQ4/RQ4_Terms_of_Use_Snapshot2.csv: results of the manual labeling related to terms of use for the second snapshot
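
The RQ3 contingency tables cross models' licenses (rows) with projects' licenses (columns). As a rough illustration of that layout, the following minimal Python sketch builds such a table from (model license, project license) pairs. It is not the authors' script: the function name contingency_table and the sample pairs are hypothetical, invented for this example, and the package's real CSV column names are not assumed here.

```python
from collections import Counter


def contingency_table(pairs):
    """Count (model_license, project_license) pairs into a rows-by-columns table.

    Rows are model licenses and columns are project licenses, both sorted
    alphabetically, mirroring the row/column roles described for the RQ3 files.
    """
    counts = Counter(pairs)
    rows = sorted({model for model, _ in counts})
    cols = sorted({project for _, project in counts})
    # A Counter returns 0 for combinations that never occur.
    table = [[counts[(model, project)] for project in cols] for model in rows]
    return rows, cols, table


# Hypothetical usage pairs, invented for illustration only (not real data).
pairs = [
    ("apache-2.0", "mit"),
    ("apache-2.0", "mit"),
    ("apache-2.0", "gpl-3.0"),
    ("openrail", "mit"),
]

rows, cols, table = contingency_table(pairs)
for model_license, row in zip(rows, table):
    print(model_license, dict(zip(cols, row)))
```

The same function works on permissiveness categories instead of license names, since it only counts pairs of labels.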
Created:
2025-04-10