Replication Package of the Paper "How do Papers Make into Machine Learning Frameworks: A Preliminary Study on TensorFlow"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14203683
Description:
This replication package contains the datasets and scripts for the paper "How do Papers Make into Machine Learning Frameworks: A Preliminary Study on TensorFlow".
Contributor_Classification.csv: contains the assignment of each contributor to a specific classification. The file contains the following columns:
Date: contains the date of each comment
Type: describes the type of a pull request (one of Close, Commit, DESCR, Merge, PC, RC)
ID: specific ID of the comment
Body: contains the body of the comment analyzed
Url: link to each comment
NumberPR: number of the PR
Name_Contributor: contains the name of a contributor for each comment
Contributor_classification: contains the assignment of a specific classification (academic, bot, ML expert, software engineer, unknown), obtained after manual analysis, for each contributor of a comment
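As a minimal sketch of how this file might be consumed, the snippet below counts comments per contributor class using only the Python standard library. The column names are taken from the list above; the helper name count_by_classification and the sample rows are hypothetical, and the real file is assumed to be comma-delimited with a header row.

```python
import csv
import io
from collections import Counter


def count_by_classification(csv_file):
    """Count rows (comments) per value of Contributor_classification."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Contributor_classification"] for row in reader)


# Hypothetical sample rows standing in for Contributor_Classification.csv.
sample = io.StringIO(
    "NumberPR,Name_Contributor,Contributor_classification\n"
    "1,alice,academic\n"
    "1,ci-bot,bot\n"
    "2,alice,academic\n"
)
counts = count_by_classification(sample)
print(counts)
```

With the real file, the in-memory sample would be replaced by something like open("Contributor_Classification.csv", newline="", encoding="utf-8"); the encoding is an assumption.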
Contributors_ManualAnalysis.csv: contains the manual analysis performed by two authors to assign a classification for each contributor. The file contains the following columns:
Contributors: contains the name of the contributor for each comment
Link GitHub: contains the link to the GitHub page for each contributor
# PR: contains the number of PRs in which a specific contributor is involved
Annotator1: manual classification of the first annotator
Annotator2: manual classification of the second annotator
Final Classification: contains the final label (academic, bot, ML expert, software engineer, unknown) assigned to each contributor
Organization: contains the organization, if any (Google, Hugging Face, Microsoft, OpenAI)
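Since the file keeps both annotators' labels next to the final one, a reader may want to see how often the two annotators agreed. The sketch below computes the raw agreement rate and lists the disagreeing rows. It assumes the column names listed above; the function name raw_agreement and the sample data are hypothetical, and this is not necessarily the agreement measure used in the paper.

```python
import csv
import io


def raw_agreement(csv_file, col_a="Annotator1", col_b="Annotator2"):
    """Return (fraction of rows where both labels match, disagreeing rows)."""
    rows = list(csv.DictReader(csv_file))
    disagreements = [r for r in rows if r[col_a].strip() != r[col_b].strip()]
    agreement = 1 - len(disagreements) / len(rows) if rows else 0.0
    return agreement, disagreements


# Hypothetical sample standing in for Contributors_ManualAnalysis.csv.
sample = io.StringIO(
    "Contributors,Annotator1,Annotator2,Final Classification\n"
    "alice,academic,academic,academic\n"
    "bob,ML expert,software engineer,software engineer\n"
    "ci-bot,bot,bot,bot\n"
    "carol,unknown,unknown,unknown\n"
)
agreement, disagreements = raw_agreement(sample)
print(agreement, [r["Contributors"] for r in disagreements])
```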
ManualAnalysis.csv: contains the manual analysis performed regarding Comment Type, Nature of Comment and Artifact. The .csv contains the following columns:
URL: contains the link to each comment
NumberPR: number of the pull request
CommentType1: contains the first annotator's classification of the comment type (Conventional review, Initial implementation, Management, ML review, Other)
CommentType2: contains the second annotator's classification of the comment type (Conventional review, Initial implementation, Management, ML review, Other)
NatureComment1: contains the first annotator's classification of the nature of the comment (Approval, Bug fix, Build error, Clarification, Code, Code convention and spacing, Code review, Comment, Enhancement request, Explanation, Feedback, Introducing alternative implementation, Pinging, Plan for merging into TF, Question, References and referrals, Request a review, Request documentation improvement, Request test, Request verification, Review, Review assignment)
NatureComment2: contains the second annotator's classification of the nature of the comment (Approval, Bug fix, Build error, Clarification, Code, Code convention and spacing, Code review, Comment, Enhancement request, Explanation, Feedback, Introducing alternative implementation, Pinging, Plan for merging into TF, Question, References and referrals, Request a review, Request documentation improvement, Request test, Request verification, Review, Review assignment)
Artifact1: contains the classification of the first annotator with respect to the artifact (Article, Code, Commit, Issue/bug, Link, Review, Other)
Artifact2: contains the classification of the second annotator with respect to the artifact (Article, Code, Commit, Issue/bug, Link, Review, Other)
FINALCommentType: contains the final classification of the comment type after the resolution of the conflicts
FINALNatureComment: contains the final classification of the nature of the comment after resolution of conflicts
FINALArtifact: contains the final classification of the artifact after the resolution of conflicts
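The paired annotator columns above make it possible to see which labels were most often confused before conflicts were resolved. The following is a rough sketch under the assumption that the columns are named exactly as listed; the function name conflict_pairs and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Annotator column pairs, as named in the column list above.
DIMENSIONS = {
    "CommentType": ("CommentType1", "CommentType2"),
    "NatureComment": ("NatureComment1", "NatureComment2"),
    "Artifact": ("Artifact1", "Artifact2"),
}


def conflict_pairs(csv_file):
    """For each dimension, count (label1, label2) pairs where annotators differ."""
    conflicts = {name: Counter() for name in DIMENSIONS}
    for row in csv.DictReader(csv_file):
        for name, (first, second) in DIMENSIONS.items():
            if row[first] != row[second]:
                conflicts[name][(row[first], row[second])] += 1
    return conflicts


# Hypothetical sample standing in for ManualAnalysis.csv.
sample = io.StringIO(
    "URL,NumberPR,CommentType1,CommentType2,NatureComment1,NatureComment2,"
    "Artifact1,Artifact2\n"
    "u1,1,ML review,ML review,Question,Clarification,Code,Code\n"
    "u2,1,Management,Other,Pinging,Pinging,Other,Other\n"
    "u3,2,Management,Other,Question,Question,Link,Article\n"
)
conflicts = conflict_pairs(sample)
print(conflicts["CommentType"].most_common())
```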
Summary_PR.csv: contains the details about the composition of each PR. The file contains the following columns:
#PullRequest: contains the number of all pull requests analyzed
#events: contains the number of all the events analyzed for each PR
#comments: contains the number of all comments for each PR
#Commit: contains the number of commits for each PR
PC: contains the number of PC for each PR
RC: contains the number of RC for each PR
Total_PR_Comments.csv: contains information about all comments analyzed. The columns are:
Date: contains the date of each comment
Type: describes the type of a pull request (if it is Close, Commit, DESCR, Merge, PC, RC)
ID: SHA of the comment
NumberPR: PR number
Name_Contributor: contains the name of a contributor for each comment
Body: contains the body of the comment analyzed
Url: link to each comment
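A per-PR breakdown by Type can be derived from this file with a simple grouping, sketched below. Whether the resulting counts correspond exactly to the columns of Summary_PR.csv is an assumption, not something the description states; the function name events_per_pr, the date format, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def events_per_pr(csv_file):
    """Map each NumberPR to a Counter of its Type values."""
    per_pr = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        per_pr[row["NumberPR"]][row["Type"]] += 1
    return per_pr


# Hypothetical sample standing in for Total_PR_Comments.csv.
sample = io.StringIO(
    "Date,Type,ID,NumberPR,Name_Contributor,Body,Url\n"
    "2020-01-01,DESCR,a1,10,alice,Initial description,u1\n"
    "2020-01-02,PC,a2,10,bob,Looks good,u2\n"
    "2020-01-03,PC,a3,10,alice,Thanks,u3\n"
    "2020-01-02,Commit,b1,11,carol,Fix test,u4\n"
)
summary = events_per_pr(sample)
for pr, type_counts in summary.items():
    print(pr, sum(type_counts.values()), dict(type_counts))
```

Note that csv.DictReader yields every field as a string, so PR numbers are keyed as "10" rather than 10 unless converted.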
The replication package also contains a results directory with the quantitative results. The directory contains:
Artifacts.csv: This file contains the results for the artifact. The columns are:
#PullRequest: number of the pull request analyzed
Article: contains the percentage of occurrences of the Article artifact in the given pull request
Code: contains the percentage of occurrences of the Code artifact in the given pull request
Commit: contains the percentage of occurrences of the Commit artifact in the given pull request
Issue reference: contains the percentage of occurrences of the Issue reference artifact in the given pull request
External link: contains the percentage of occurrences of the External link artifact in the given pull request
Review: contains the percentage of occurrences of the Review artifact in the given pull request
Other: contains the percentage of occurrences of the Other artifact in the given pull request
Also, the Mean_Value row contains the mean value of all occurrences for each column (Article, Code, Commit, Issue reference, External link, Review, Other)
CommentType.csv: this file contains the results for the comment type. The columns are:
#PullRequest: number of the pull request analyzed
Conventional review: contains the percentage of occurrences of Conventional review comments in the given pull request
Initial implementation: contains the percentage of occurrences of Initial implementation comments in the given pull request
Management: contains the percentage of occurrences of Management comments in the given pull request
ML review: contains the percentage of occurrences of ML review comments in the given pull request
Other: contains the percentage of occurrences of Other comments in the given pull request
Also, the Mean_Value row contains the mean value of all occurrences for each column (Conventional review, Initial implementation, Management, ML review, Other)
Contributor.csv: this file contains the results for the contributors. The columns are:
#PullRequest: number of the pull request analyzed
academic: contains the percentage of occurrences of academic contributors in the given pull request
bot: contains the percentage of occurrences of bot contributors in the given pull request
ML expert: contains the percentage of occurrences of ML expert contributors in the given pull request
software engineer: contains the percentage of occurrences of software engineer contributors in the given pull request
unknown: contains the percentage of occurrences of unknown contributors in the given pull request
Also, the Mean_Value row contains the mean value of all occurrences for each column (academic, bot, ML expert, software engineer, unknown)
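The three result files share one layout: one row per pull request with a percentage per label, plus a Mean_Value row. The sketch below shows one plausible way to build such a table from (PR number, label) pairs. It assumes percentages on a 0 to 100 scale and that Mean_Value is the unweighted mean of the per-PR percentages; neither is confirmed by the description. The function name percentages_per_pr and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def percentages_per_pr(labelled_rows, labels):
    """Build {pr_number: {label: percentage}} from (pr_number, label) pairs.

    A "Mean_Value" entry holds the unweighted mean of each column
    across pull requests (an assumed definition).
    """
    counts = defaultdict(Counter)
    for pr, label in labelled_rows:
        counts[pr][label] += 1

    table = {}
    for pr, counter in counts.items():
        total = sum(counter.values())
        table[pr] = {lab: 100.0 * counter[lab] / total for lab in labels}

    pr_rows = list(table.values())
    table["Mean_Value"] = {
        lab: sum(row[lab] for row in pr_rows) / len(pr_rows) if pr_rows else 0.0
        for lab in labels
    }
    return table


# Hypothetical labelled comments: (PR number, final comment type).
rows = [
    (1, "ML review"), (1, "Management"), (1, "ML review"), (1, "ML review"),
    (2, "Management"), (2, "Other"),
]
labels = ["ML review", "Management", "Other"]
table = percentages_per_pr(rows, labels)
for key, row in table.items():
    print(key, row)
```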
Created:
2025-01-17



