GitHub Profiles (users/organisations) and Repositories (research/non-research) of Potsdam Researchers and Research Organisations: An annotated dataset with howfairis and software quality variables.
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12607762
Resource description:
This dataset accompanies the paper "Software FAIRness, Documentation and Development Practices in Potsdam Researchers' GitHub Repositories". It includes three CSV files with data on the GitHub profiles of users and organisations and on their repositories, which are annotated as research or non-research and then described by FAIRness and other software quality variables. The data were collected with the SWORDS-template-UP (v1.0.0) methods (collect_users, collect_repositories, collect_variables), an extended version of SWORDS-template adapted to our needs and detailed in the paper.
GitHub (research) user/organisation profiles (github_profiles.csv)
| Column name | Description |
| --- | --- |
| user_id | GitHub username |
| html_url | URL of the GitHub profile |
| type | Type of profile (user or organization) |
| organisation | Acronym or name of the organization |
GitHub repositories (github_repositories.csv)
This file contains the repositories scraped from the GitHub profiles of research users and organizations.
| Column name | Description |
| --- | --- |
| html_url | URL of the repository |
| description | GitHub project description |
| project | Specifies if the project is research or non-research |
| language | Programming language used in the project |
| organisation | Acronym or name of the university, institution, or research organization |
| research_group | Acronym or name of the research group the repository belongs to |
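As a minimal sketch, the repository file could be read and split by annotation with Python's standard library alone. The sample rows below are invented for illustration, and the label strings `research` and `non-research` are an assumption about how the `project` column is encoded; the actual CSV should be checked before relying on them.

```python
import csv
import io
from collections import Counter

# Invented sample rows that mimic the documented columns of
# github_repositories.csv; a real file would be passed in via open().
SAMPLE = """html_url,description,project,language,organisation,research_group
https://github.com/example/a,Simulation code,research,Python,ORG_A,GroupA
https://github.com/example/b,Personal website,non-research,HTML,ORG_A,
https://github.com/example/c,Analysis scripts,research,R,ORG_B,GroupB
"""


def load_rows(handle):
    """Return the CSV rows as a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def research_only(rows, label="research"):
    """Keep rows whose `project` value equals the assumed research label."""
    return [r for r in rows if r["project"].strip().lower() == label]


rows = load_rows(io.StringIO(SAMPLE))
research = research_only(rows)
by_language = Counter(r["language"] for r in research)
print(len(research), dict(by_language))
```

For the real data, `io.StringIO(SAMPLE)` would be replaced by `open("github_repositories.csv", newline="", encoding="utf-8")`.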
Research repositories filtered and annotated (github_research_repositories_filtered_annotated.csv)
This file contains filtered and annotated information about research repositories.
| Column name | Description | Collection method |
| --- | --- | --- |
| html_url | Repository URL | |
| howfairis_repository | Indicates if the repository is publicly accessible (True/False) | Script howfairis_variable.py, a wrapper around the howfairis PyPI library, which checks the five FAIR software recommendations |
| howfairis_license | Indicates if the repository has a license (True/False) | Script howfairis_variable.py (see above) |
| howfairis_registry | Indicates if the repository is registered in a community registry (True/False) | Script howfairis_variable.py (see above) |
| howfairis_citation | Indicates if the repository has a .cff citation file (True/False) | Script howfairis_variable.py (see above) |
| howfairis_checklist | Indicates if the repository carries an OpenSSF Best Practices badge (True/False) | Script howfairis_variable.py (see above) |
| fair_score | Score based on the howfairis variables (0-5) | |
| dlr_soft_class | Application class assigned to the repository under the DLR software engineering guidelines | Manual. Annotated following the DLR software engineering guidelines. The guidelines give no specific metrics for sorting GitHub repositories into application classes, which were needed for the comparative analysis. |
| installation_instruction | Presence of installation instructions (True/False) | Manual. Checked for installation instructions in the README or in the project wiki pages. |
| project_information | Presence of basic project information in the README (True/False) | Manual. Checked whether the README gives basic information about the project. |
| usage_guide | Presence of a usage guide (True/False) | Manual. Checked for a usage guide in the README or in the project wiki pages. For command-line tools, checked whether a help command explains how to use the tool. |
| test_folder | Presence of a folder named test/tests in the root directory (True/False) | Script test_folder.py: checks for a folder named test or tests in the root directory of the repository. |
| requirements_explicit | Explicit requirements for Python, R, and C++ repositories (True/False) | Script requirement_explicit.py: checks for the files requirements.txt, DESCRIPTION, and CMakeLists.txt in the root directory. |
| continuous_integration | Indicates if the repository uses continuous integration (True/False) | Script continious_integration.py: checks for the .github folder (GitHub Actions) and likewise for other continuous integration tools (Travis CI, CircleCI, Jenkins, Azure Pipelines). |
| ci_tool | Name of the continuous integration tool used | Script continious_integration.py (see above) |
| add_lint_rule | Indicates if additional linting rules are present (True/False) | Script add_ci_rules.py: scans the YAML files in the .github/workflows directory for Python, R, and C++ linters. |
| add_test_rule | Indicates if additional testing rules are present (True/False) | Script add_ci_rules.py: scans the YAML files in the .github/workflows directory for Python, R, and C++ testing libraries. |
| comment_at_start | Indicates the level of comments at the start of the program (most, more, some, less) | Script comment_at_start.py: checks for brief comments at the start of the source code files in the repository. |
| language | Programming language used in the repository | |
| type | Specifies if the profile is a user or organization | GitHub organisation or user profiles |
| organisation | Name of the university, company, research institute, or research organization | Organisation name (from where the user was found) |
| research_group | Name or acronym of the research group | |
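The relationship between `fair_score` and the five howfairis columns can be sanity-checked with a short script. The sketch below assumes that the score is simply the count of howfairis checks that are True and that booleans are stored as the text `True`/`False`; neither is confirmed by the description, and the sample rows are invented.

```python
import csv
import io

HOWFAIRIS_COLUMNS = [
    "howfairis_repository",
    "howfairis_license",
    "howfairis_registry",
    "howfairis_citation",
    "howfairis_checklist",
]


def as_bool(value):
    """Interpret a CSV cell as a boolean; assumes 'True'/'False' text."""
    return str(value).strip().lower() == "true"


def recompute_fair_score(row):
    """Count the howfairis checks that are True (assumed scoring rule)."""
    return sum(as_bool(row[col]) for col in HOWFAIRIS_COLUMNS)


# Invented rows using the documented column names.
SAMPLE = """html_url,howfairis_repository,howfairis_license,howfairis_registry,howfairis_citation,howfairis_checklist,fair_score
https://github.com/example/a,True,True,False,False,False,2
https://github.com/example/b,True,True,True,True,False,4
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
scores = [recompute_fair_score(r) for r in rows]
# Rows where the stored score disagrees with the recomputed one.
mismatches = [r["html_url"] for r in rows if recompute_fair_score(r) != int(r["fair_score"])]
print(scores, mismatches)
```

Any URL listed in `mismatches` on the real file would suggest that the assumed scoring rule, or the assumed boolean encoding, does not match the dataset.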
Data for publication - https://github.com/Software-Engineering-Group-UP/potsdam-research-repos
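The script-based checks described in the table above (test_folder, requirements_explicit, continuous_integration) can be approximated on a local clone of a repository. The following is an illustrative sketch only, not the code from the linked repository: the function names are hypothetical, and the marker paths for tools other than GitHub Actions are assumptions.

```python
import tempfile
from pathlib import Path

# File names listed in the dataset description for Python, R, and C++.
REQUIREMENT_FILES = ("requirements.txt", "DESCRIPTION", "CMakeLists.txt")

# ".github" follows the description; the other markers are assumptions.
CI_MARKERS = {
    "GitHub Actions": ".github",
    "Travis CI": ".travis.yml",
    "CircleCI": ".circleci",
    "Jenkins": "Jenkinsfile",
    "Azure Pipelines": "azure-pipelines.yml",
}


def has_test_folder(repo_root):
    """True if the root directory contains a folder named test or tests."""
    root = Path(repo_root)
    return any((root / name).is_dir() for name in ("test", "tests"))


def has_explicit_requirements(repo_root):
    """True if any of the listed requirement files is in the root directory."""
    root = Path(repo_root)
    return any((root / name).is_file() for name in REQUIREMENT_FILES)


def detect_ci_tools(repo_root):
    """Return the names of CI tools whose marker path exists in the root."""
    root = Path(repo_root)
    return [tool for tool, marker in CI_MARKERS.items() if (root / marker).exists()]


# Build a throwaway directory that stands in for a cloned repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "tests").mkdir()
    (repo / "requirements.txt").write_text("numpy\n")
    (repo / ".github" / "workflows").mkdir(parents=True)
    result = (
        has_test_folder(repo),
        has_explicit_requirements(repo),
        detect_ci_tools(repo),
    )

print(result)
```

Working on a local clone keeps the sketch free of network access; the dataset's own scripts may instead query repositories remotely.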
Created:
2024-08-07



