PeaTMOSS
Source: arXiv · Updated: 2024-02-01 · Indexed: 2024-06-21
Download link:
https://transfer.rcac.purdue.edu/file-manager?origin_id=ff978999-16c2-4b50-ac7a-947ffdc3eb1d&origin_path=%2F
Description:
The PeaTMOSS dataset was created by Purdue University and other research institutions. It contains metadata for 281,638 pre-trained models (PTMs), along with 28,575 open-source software repositories that use these models. Beyond the model metadata itself, the dataset records how the models are used in downstream applications. Key model metadata, such as training datasets, parameters, and evaluation metrics, is extracted automatically, which makes the dataset more comprehensive. The dataset provides a foundation for research on the PTM supply chain, supporting the study of PTM development trends, common deficiencies in model documentation, and inconsistencies in software licensing.
Provider:
Purdue University
Created:
2024-02-01



