
Replication Package of the Paper "How do Papers Make into Machine Learning Frameworks: A Preliminary Study on TensorFlow"

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14203683
Resource description:
This replication package contains the datasets and scripts related to the paper "How do Papers Make into Machine Learning Frameworks: A Preliminary Study on TensorFlow".

Contributor_Classification.csv: contains the assignment of each contributor to a specific classification. The file contains the following columns:
- Date: the date of each comment
- Type: the type of the pull request event (Close, Commit, DESCR, Merge, PC, RC)
- ID: the specific ID of the comment
- Body: the body of the comment analyzed
- Url: the link to each comment
- NumberPR: the number of the PR
- Name_Contributor: the name of the contributor of each comment
- Contributor_classification: the classification (academic, bot, ML expert, software engineer, unknown) assigned, after manual analysis, to the contributor of each comment

Contributors_ManualAnalysis.csv: contains the manual analysis performed by two authors to assign a classification to each contributor. The file contains the following columns:
- Contributors: the name of the contributor of each comment
- Link GitHub: the link to the GitHub page of each contributor
- # PR: the number of PRs in which the contributor is involved
- Annotator1: the manual classification by the first annotator
- Annotator2: the manual classification by the second annotator
- Final Classification: the final label (academic, bot, ML expert, software engineer, unknown) assigned to each contributor
- Organization: the organization, if any (Google, Hugging Face, Microsoft, OpenAI)

ManualAnalysis.csv: contains the manual analysis of Comment Type, Nature of Comment, and Artifact. The file contains the following columns:
- URL: the link to each comment
- NumberPR: the number of the pull request
- CommentType1: the first annotator's classification of the comment type (Conventional review, Initial implementation, Management, ML review, Other)
- CommentType2: the second annotator's classification of the comment type (same labels as CommentType1)
- NatureComment1: the first annotator's classification of the nature of the comment (Approval, Bug fix, Buid error, Clarification, Code, Code convention and spacing, Code review, Comment, Enhancement request, Explanation, Feedback, Introducing alternative implementation, Pinging, Plan for merging into TF, Question, References and referrals, Request a review, Request documentation improvement, Request test, Request verification, Review, Review assignment)
- NatureComment2: the second annotator's classification of the nature of the comment (same labels as NatureComment1)
- Artifact1: the first annotator's classification of the artifact (Article, Code, Commit, Issue/bug, Link, Review, Other)
- Artifact2: the second annotator's classification of the artifact (same labels as Artifact1)
- FINALCommentType: the final classification of the comment type after the resolution of conflicts
- FINALNatureComment: the final classification of the nature of the comment after the resolution of conflicts
- FINALArtifact: the final classification of the artifact after the resolution of conflicts

Summary_PR.csv: contains details about the composition of each PR. The file contains the following columns:
- #PullRequest: the number of each pull request analyzed
- #events: the number of events analyzed for each PR
- #comments: the number of comments for each PR
- #Commit: the number of Commit events for each PR
- PC: the number of PC events for each PR
- RC: the number of RC events for each PR

Total_PR_Comments.csv: contains information about all comments analyzed. The columns are:
- Date: the date of each comment
- Type: the type of the pull request event (Close, Commit, DESCR, Merge, PC, RC)
- ID: SHAn of the comment
- NumberPR: the PR number
- Name_Contributor: the name of the contributor of each comment
- Body: the body of the comment analyzed
- Url: the link to each comment

The replication package also contains a directory, results, which holds the quantitative results. In each of its files, every label column gives the percentage of occurrences of that label in the specific pull request, and the Mean_Value row contains the mean of each label column. The directory contains:

Artifacts.csv: the results for the artifact. The columns are:
- #PullRequest: the number of the pull request analyzed
- Article, Code, Commit, Issue reference, External link, Review, Other: the percentage of occurrences of each artifact in the specific pull request

CommentType.csv: the results for the comment type. The columns are:
- #PullRequest: the number of the pull request analyzed
- Conventional review, Initial implementation, Management, ML review, Other: the percentage of occurrences of each comment type in the specific pull request

Contributor.csv: the results for the contributors. The columns are:
- #PullRequest: the number of the pull request analyzed
- academic, bot, ML expert, software engineer, unknown: the percentage of occurrences of each contributor class in the specific pull request
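The column descriptions above suggest two simple computations: how often the two annotators agree, and the per-pull-request percentage of each final label (the shape of the files in results). The following is a minimal sketch, not the authors' scripts. It assumes the CSV headers match the column names listed above exactly, and it runs on a few invented rows (the URLs, PR numbers, and labels in SAMPLE are hypothetical) so that it works without the downloaded files.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows shaped like ManualAnalysis.csv; values are invented
# for illustration and are NOT taken from the real dataset.
SAMPLE = """URL,NumberPR,CommentType1,CommentType2,FINALCommentType
https://example.org/c1,101,ML review,ML review,ML review
https://example.org/c2,101,Management,Other,Management
https://example.org/c3,101,ML review,ML review,ML review
https://example.org/c4,102,Conventional review,Conventional review,Conventional review
"""


def load_rows(handle):
    """Read a CSV file object into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle))


def agreement_rate(rows, col_a, col_b):
    """Fraction of rows in which both annotators chose the same label."""
    if not rows:
        return 0.0
    same = sum(1 for r in rows if r[col_a].strip() == r[col_b].strip())
    return same / len(rows)


def percentages_per_pr(rows, pr_col, label_col):
    """Map each PR number to {label: percentage of that PR's rows}."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[pr_col]][r[label_col].strip()] += 1
    result = {}
    for pr, counter in counts.items():
        total = sum(counter.values())
        result[pr] = {label: 100.0 * n / total for label, n in counter.items()}
    return result


rows = load_rows(io.StringIO(SAMPLE))
print(agreement_rate(rows, "CommentType1", "CommentType2"))
print(percentages_per_pr(rows, "NumberPR", "FINALCommentType"))
```

To run it on the real data, replace `io.StringIO(SAMPLE)` with `open("ManualAnalysis.csv", newline="", encoding="utf-8")`, adjusting the column names if the actual headers differ. Raw agreement does not correct for chance; a statistic such as Cohen's kappa would be the usual next step.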
Created:
2025-01-17