
GitSED: GitHub Socially Enhanced Dataset

Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-27.
Download link:
https://zenodo.org/record/5021329
Description:
Software Engineering has evolved as a field to study not only the many ways software is created but also how it evolves, becomes successful, is effective and efficient in its objectives, satisfies its quality attributes, and much more. Nonetheless, many open issues remain in its conception, development, and maintenance phases. In particular, understanding how developers collaborate may help in all of these phases, but it is also challenging. Luckily, we may now explore a novel angle to deal with this challenge: studying the social aspects of software development over social networks. With GitHub becoming the main representative of online collaborative software development tools, there are approaches to assess the follow-network, stargazer-network, and contributors-network. Moreover, having such networks built from real software projects supports relevant applications, such as detection of key developers, recommendation of collaboration among developers, detection of developer communities, and analyses of collaboration patterns in agile development.

GitSED is a dataset based on GitHub that is curated (cleaned and reduced), augmented with external data, and enriched with social information on developers’ interactions. The original data is extracted from GHTorrent (an offline repository of data collected through the GitHub REST API). Our final dataset contains data up to June 2019. It comprises:

- 8,556,778 repositories
- 32,411,674 developers
- 6 programming languages (Assembly, JavaScript, Pascal, Python, Ruby, Visual Basic)
- 13 collaboration metrics

There are two previous versions of GitSED, which were originally built for the following conference papers:

- v2 (May 2017): Gabriel P. Oliveira, Natércia A. Batista, Michele A. Brandão, and Mirella M. Moro. Tie Strength in GitHub Heterogeneous Networks. In Proceedings of the 24th Brazilian Symposium on Multimedia and the Web (WebMedia'18), 2018.
- v1 (Sep 2015): Natércia A. Batista, Michele A. Brandão, Gabriela B. Alves, Ana Paula Couto da Silva, and Mirella M. Moro. Collaboration strength metrics and analyses on GitHub. In Proceedings of the International Conference on Web Intelligence (WI'17), 2017.

Created:
2023-06-28