
Dataset metadata of known Dataverse installations, August 2025

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://doi.org/10.7910/DVN/RMAGSH
Description:
This dataset contains the metadata of the datasets published in 118 Dataverse installations, information about the metadata blocks of 118 installations, and the lists of pre-defined licenses or dataset terms that depositors can apply to datasets in the 100 installations that were running versions of the Dataverse software that include the "multiple-license" feature.

The data is useful for improving understandings about how certain Dataverse features and metadata fields are used and for learning about the quality of dataset and file-level metadata within and across Dataverse installations.

How the metadata was downloaded

The dataset metadata and metadata block JSON files were downloaded from each installation between August 25 and September 2, 2025 using a Python script that uses the Dataverse API.

How the files are organized

├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation)_2025.08.25-2025.09.02.csv
│   ├── contributor(citation)_2025.08.25-2025.09.02.csv
│   ├── data_source(citation)_2025.08.25-2025.09.02.csv
│   ├── ...
│   └── topic_classification(citation)_2025.08.25-2025.09.02.csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2025.08.26_07.14.00.zip
│   │   ├── dataset_pids_Abacus_2025.08.26_07.14.00.csv
│   │   ├── Dataverse_JSON_metadata_2025.08.26_07.14.00
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
│   │   │   └── ...
│   │   └── metadatablocks_v5.9
│   │       ├── astrophysics_v5.9.json
│   │       ├── biomedical_v5.9.json
│   │       ├── citation_v5.9.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2025.08.25_15.45.25.zip
│   ├── ...
│   └── Yale_Dataverse_2025.08.25_11.51.29.zip
├── dataverse_installations_summary_2025.09.02.csv
├── dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv
├── license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv
└── metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv

This dataset contains two directories and four CSV files not in a directory.

One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the "Citation" metadata block and "Geospatial" metadata block of datasets in the 118 Dataverse installations. For example, author(citation)_2025.08.25-2025.09.02.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in 118 installations, with a column for each of the four child fields: author name, affiliation, identifier type, and identifier.

The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 118 zip files, one zip file for each of the 118 Dataverse installations whose sites were functioning when I attempted to collect their metadata and that have at least one published dataset. Each zip file contains:

1. A CSV file listing information about the datasets published in the installation, including a column to indicate if the Python script was able to download the Dataverse JSON metadata for each dataset.
2. A directory with JSON files that have information about the installation's metadata fields, such as the field names and how they're organized.
3. A directory of JSON files that contain the metadata of the installation's published, non-deaccessioned dataset versions in the Dataverse JSON metadata schema.
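
To illustrate how the four author columns relate to the per-dataset JSON files, here is a minimal sketch that pulls the author child fields out of one parsed Dataverse JSON document. It is not the script used to build this dataset. The child field names (authorName, authorAffiliation, authorIdentifierScheme, authorIdentifier) follow the standard Citation metadata block; the function names are hypothetical, and the wrapper keys probed in find_metadata_blocks ("data", "datasetVersion", "latestVersion") are assumptions, since the description does not say which top-level layout the downloaded files use.

```python
import json

# Child fields of the compound "author" field in the Citation block.
AUTHOR_CHILD_FIELDS = (
    "authorName",
    "authorAffiliation",
    "authorIdentifierScheme",
    "authorIdentifier",
)


def find_metadata_blocks(doc):
    """Locate the "metadataBlocks" dict under a few wrapper layouts.

    Assumption: the blocks may sit at the top level or under "data",
    "datasetVersion" or "latestVersion". Adjust if the files differ.
    """
    wrappers = [doc]
    if isinstance(doc.get("data"), dict):
        wrappers.append(doc["data"])
    for wrapper in list(wrappers):
        for key in ("datasetVersion", "latestVersion"):
            if isinstance(wrapper.get(key), dict):
                wrappers.append(wrapper[key])
    for wrapper in wrappers:
        if isinstance(wrapper.get("metadataBlocks"), dict):
            return wrapper["metadataBlocks"]
    return {}


def author_rows(doc):
    """Return one dict per author, with an empty string for missing child fields."""
    fields = find_metadata_blocks(doc).get("citation", {}).get("fields", [])
    rows = []
    for field in fields:
        if field.get("typeName") != "author":
            continue
        for author in field.get("value", []):
            rows.append({
                child: author.get(child, {}).get("value", "")
                for child in AUTHOR_CHILD_FIELDS
            })
    return rows


def author_rows_from_file(path):
    """Convenience wrapper: parse a JSON file from disk, then extract authors."""
    with open(path, encoding="utf-8") as handle:
        return author_rows(json.load(handle))
```

Calling author_rows_from_file on one of the extracted "(latest_version).json" files should give rows that roughly mirror the columns of the author CSV, though the CSV's actual column headers may differ.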
The dataverse_installations_summary_2025.09.02.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata included and not included in this dataset.

The dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv file contains the dataset PIDs of published datasets in 118 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all "dataset_pids_....csv" files in each of the 118 zip files in the dataverse_json_metadata_from_each_known_dataverse_installation directory.

The license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv file contains information about the licenses and data use agreements that some installations let depositors choose when creating datasets. When I collected this data, 100 of the available 118 installations were running versions of the Dataverse software that allow depositors to choose a "predefined license or data use agreement" from a dropdown menu in the dataset deposit form. For more information about this Dataverse feature, see https://guides.dataverse.org/en/6.7/user/dataset-management.html#choosing-a-license.

The metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv file contains the metadata block names, field names, child field names (if the field is a compound field), display names, descriptions/tooltip text, watermarks, and controlled vocabulary values of fields in the 118 Dataverse installations' metadata blocks. This file is useful for learning...
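
Since the combined PID file is described as a union of the per-installation "dataset_pids_....csv" files, the following rough sketch shows one way such a union could be rebuilt from the zip files. It is an illustration under assumptions, not the author's script: the function names are hypothetical, the CSVs are assumed to be UTF-8 encoded, and each CSV is found only by its "dataset_pids_" filename prefix, wherever it sits inside the archive. No column names are assumed; rows are compared as whole records.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath


def read_pid_rows(zip_source):
    """Yield each row (as a dict) of any dataset_pids_*.csv inside one zip.

    zip_source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            base = PurePosixPath(name).name
            if not (base.startswith("dataset_pids_") and base.endswith(".csv")):
                continue
            # utf-8-sig also tolerates a byte-order mark, if one is present.
            with archive.open(name) as raw, io.TextIOWrapper(
                raw, encoding="utf-8-sig", newline=""
            ) as text:
                yield from csv.DictReader(text)


def union_pid_rows(zip_sources):
    """Combine rows from many zips, keeping the first copy of identical rows."""
    seen = set()
    combined = []
    for source in zip_sources:
        for row in read_pid_rows(source):
            # Sort the items so column order does not affect the comparison.
            key = tuple(sorted((str(k), str(v)) for k, v in row.items()))
            if key not in seen:
                seen.add(key)
                combined.append(row)
    return combined
```

For example, passing sorted(Path("dataverse_json_metadata_from_each_known_dataverse_installation").glob("*.zip")) to union_pid_rows would read every installation archive in turn without extracting it to disk.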

Created: 2025-09-29