
KGTORRENT

Source: arXiv (2021-03-19); updated 2024-06-21
Download link:
https://doi.org/10.5281/zenodo.4468522
Description:
KGTORRENT is a large-scale dataset created at the University of Bari, Italy. It contains 248,761 Python Jupyter notebooks collected from the Kaggle platform. Besides the notebooks themselves, it ships with rich metadata drawn from Meta Kaggle, a daily-updated dataset describing the Kaggle community and its activity. Building KGTORRENT involved an in-depth analysis and reverse engineering of the Meta Kaggle data in order to design a relational database that stores the Kaggle metadata. The dataset is intended mainly for research on how data scientists use Jupyter notebooks and for identifying their potential shortcomings, so as to guide the design of future Jupyter Notebook extensions.
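A usage study of the kind described above could begin by tallying how notebooks mix code and prose. The sketch assumes the notebooks are stored as standard `.ipynb` files, i.e. JSON in the nbformat 4 layout with a top-level `cells` list, extracted into some local folder. The folder and the helper names `count_cell_types` and `summarize_directory` are hypothetical illustrations, not part of KGTORRENT's own tooling.

```python
import json
from collections import Counter
from pathlib import Path


def count_cell_types(notebook: dict) -> Counter:
    """Tally cell types ("code", "markdown", "raw") in an nbformat-4 notebook dict.

    Cells lacking a "cell_type" key are counted as "unknown".
    """
    return Counter(cell.get("cell_type", "unknown") for cell in notebook.get("cells", []))


def summarize_directory(root: Path) -> Counter:
    """Aggregate cell-type counts over every .ipynb file below `root`.

    `root` is a hypothetical local folder holding the extracted notebooks;
    files that are not valid UTF-8 JSON are skipped instead of aborting the scan.
    """
    totals: Counter = Counter()
    for path in root.rglob("*.ipynb"):
        try:
            with path.open(encoding="utf-8") as fh:
                totals.update(count_cell_types(json.load(fh)))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue
    return totals


if __name__ == "__main__":
    # Tiny in-memory notebook in nbformat-4 layout, for illustration only.
    sample = {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# EDA"]},
            {"cell_type": "code", "metadata": {}, "source": ["import pandas as pd"],
             "outputs": [], "execution_count": None},
            {"cell_type": "code", "metadata": {}, "source": ["df.head()"],
             "outputs": [], "execution_count": None},
        ],
    }
    print(count_cell_types(sample))  # Counter({'code': 2, 'markdown': 1})
```

Reading the files with the standard `json` module keeps the sketch dependency-free; for validation against the notebook schema, the `nbformat` package could be used instead.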
Provided by:
University of Bari
Created:
2021-03-19