
Dataset metadata of known Dataverse installations, August 2024

Source: DataONE. Updated 2025-08-09. Indexed 2025-11-08.
Download link:
https://search.dataone.org/view/sha256:b1e496d11d83c9dce8031d9d85e91ec6c2840336ac9351fa574814c38c47346c
Resource description:
This dataset contains the metadata of the datasets published in 101 Dataverse installations, information about the metadata blocks of 106 installations, and the lists of pre-defined licenses or dataset terms that depositors can apply to datasets in the 88 installations that were running versions of the Dataverse software that include the "multiple-license" feature. The data is useful for improving understanding of how certain Dataverse features and metadata fields are used and for learning about the quality of dataset and file-level metadata within and across Dataverse installations.

How the metadata was downloaded

The dataset metadata and metadata block JSON files were downloaded from each installation between August 25 and August 30, 2024 using a "get_dataverse_installations_metadata" function in a collection of Python functions at https://github.com/jggautier/dataverse-scripts/blob/main/dataverse_repository_curation_assistant/dataverse_repository_curation_assistant_functions.py.

To get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL for which I was able to create an account, and another column named "apikey" listing my accounts' API tokens. The Python script expects the CSV file and uses the listed API tokens to get metadata and other information from installations that require API tokens in order to use certain API endpoints.

How the files are organized

├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author_2024.08.25-2024.08.30.csv
│   ├── contributor_2024.08.25-2024.08.30.csv
│   ├── data_source_2024.08.25-2024.08.30.csv
│   ├── ...
│   └── topic_classification_2024.08.25-2024.08.30.csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2024.08.26_15.52.42.zip
│       ├── dataset_pids_Abacus_2024.08.26_15.52.42.csv
│       ├── Dataverse_JSON_metadata_2024.08.26_15.52.42
│          ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
│          ├── ...
│       ├── metadatablocks_v5.9
│          ├── astrophysics_v5.9.json
│          ├── biomedical_v5.9.json
│          ├── citation_v5.9.json
│          ├── ...
│          ├── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2024.08.26_00.02.51.zip
│   ├── ...
│   └── Yale_Dataverse_2024.08.25_03.52.57.zip
└── dataverse_installations_summary_2024.08.30.csv
└── dataset_pids_from_most_known_dataverse_installations_2024.08.csv
└── license_options_for_each_dataverse_installation_2024.08.28_14.42.54.csv
└── metadatablocks_from_most_known_dataverse_installations_2024.08.30.csv

This dataset contains two directories and four CSV files not in a directory.

One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the "Citation" metadata block and "Geospatial" metadata block of datasets in the 101 Dataverse installations. For example, author_2024.08.25-2024.08.30.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in 101 installations, with a column for each of the four child fields: author name, affiliation, identifier type, and identifier.

The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 106 zip files, one for each of the 106 Dataverse installations whose sites were functioning when I attempted to collect their metadata. Each zip file contains a directory with JSON files that have information about the installation's metadata fields, such as the field names and how they're organized.
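As an illustration of the two-column API token file described under "How the metadata was downloaded", the following is a minimal sketch of how such a file could be read into a lookup table. Only the column names "hostname" and "apikey" come from the description; the function name load_api_tokens, the example hostnames, and the token values are hypothetical placeholders, and this is not the code used by the author's script.

```python
import csv
import io


def load_api_tokens(csv_text):
    """Map each installation hostname to its API token.

    Expects the two columns named in the description: "hostname" and "apikey".
    Rows missing either value are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    tokens = {}
    for row in reader:
        hostname = (row.get("hostname") or "").strip()
        apikey = (row.get("apikey") or "").strip()
        if hostname and apikey:
            tokens[hostname] = apikey
    return tokens


# Placeholder values for illustration only; not real installations or tokens.
example_csv = (
    "hostname,apikey\n"
    "https://dataverse.example.org,00000000-0000-0000-0000-000000000000\n"
    "https://data.example.edu,11111111-1111-1111-1111-111111111111\n"
)
tokens = load_api_tokens(example_csv)
print(len(tokens))
```

Installations that are absent from the table would then be queried without a token, which matches the description's point that only some installations require one.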
For installations that had published datasets and whose dataset metadata I was able to download using Dataverse APIs, the zip file also contains:

- A CSV file listing information about the datasets published in the installation, including a column to indicate if the Python script was able to download the Dataverse JSON metadata for each dataset.
- A directory of JSON files that contain the metadata of the installation's published, non-deaccessioned dataset versions in the Dataverse JSON metadata schema.

The dataverse_installations_summary_2024.08.30.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata included and not included in this dataset.

The dataset_pids_from_most_known_dataverse_installations_2024.08.csv file contains the dataset PIDs of published datasets in 101 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all "dataset_pids_....csv" files in each of the 101 zip files in the dataverse_json_metadata_from_each_known_dataverse_installation directory.

The license_options_for_each_dataverse_installation_2024.08.28_14.42.54.csv file contains information about the licenses and...
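To make the layout of an installation zip file concrete, here is a minimal sketch that groups the members of one zip into the three kinds of content described above: the dataset PID CSV, the dataset JSON files, and the metadata block JSON files. It relies only on the name prefixes visible in the file tree ("dataset_pids_", "Dataverse_JSON_metadata_", "metadatablocks_"). The function name summarize_installation_zip is hypothetical, the top-level directory inside the zip is an assumption, and the demo builds a tiny in-memory zip with placeholder contents rather than reading the real files.

```python
import io
import zipfile


def summarize_installation_zip(zip_file):
    """Group zip members by the layout shown in the dataset's file tree."""
    groups = {"dataset_pids_csv": [], "dataset_json": [], "metadatablock_json": []}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            base = parts[-1]
            if not base:  # skip directory entries
                continue
            parents = parts[:-1]
            if base.startswith("dataset_pids_") and base.endswith(".csv"):
                groups["dataset_pids_csv"].append(name)
            elif base.endswith(".json"):
                if any(p.startswith("metadatablocks_") for p in parents):
                    groups["metadatablock_json"].append(name)
                elif any(p.startswith("Dataverse_JSON_metadata_") for p in parents):
                    groups["dataset_json"].append(name)
    return groups


# Build a small in-memory zip that mimics the tree; file contents are placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    root = "Abacus_2024.08.26_15.52.42/"  # assumed top-level directory name
    archive.writestr(root + "dataset_pids_Abacus_2024.08.26_15.52.42.csv", "placeholder\n")
    archive.writestr(
        root + "Dataverse_JSON_metadata_2024.08.26_15.52.42/"
        "hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json",
        "{}",
    )
    archive.writestr(root + "metadatablocks_v5.9/citation_v5.9.json", "{}")
buffer.seek(0)

summary = summarize_installation_zip(buffer)
for kind, names in summary.items():
    print(kind, len(names))
```

With a real download, the same function could be passed the path to one of the zip files, for example Abacus_2024.08.26_15.52.42.zip. Zip files for installations without downloadable dataset metadata would be expected to yield only metadata block JSON files.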
Created:
2025-10-29