
cometadata/datacite-titles-descriptions-related-identifiers

Source: Hugging Face. Updated 2026-03-11. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/cometadata/datacite-titles-descriptions-related-identifiers
Resource description:
---
license: cc0-1.0
task_categories:
- text-classification
- feature-extraction
language:
- multilingual
tags:
- datacite
- scholarly-metadata
- datasets
- research-data
size_categories:
- 10M<n<100M
---

# DataCite Dataset Titles, Descriptions, and Related Identifiers

Structured extraction of titles, descriptions, and related identifiers for all records with `resourceTypeGeneral: Dataset` in the DataCite metadata corpus.

## Source

Parsed from the **2026-03 DataCite Monthly Data File**.

## Contents

- **61,096,014 records** (one row per DOI)
- Filtered to `resourceTypeGeneral: Dataset` only
- Single Parquet file with nested columns

## Schema

| Column | Type | Description |
|--------|------|-------------|
| `doi` | `string` | The DOI identifier |
| `provider_id` | `string` | DataCite provider ID |
| `client_id` | `string` | DataCite client ID |
| `titles` | `list<struct>` | Array of `{title, titleType, lang}` |
| `descriptions` | `list<struct>` | Array of `{description, descriptionType, lang}` |
| `relatedIdentifiers` | `list<struct>` | Array of `{relatedIdentifier, relationType, relatedIdentifierType, resourceTypeGeneral}` |

## Usage

```python
import pyarrow.parquet as pq

table = pq.read_table("data/datasets_output.parquet")
print(table.schema)
print(f"{table.num_rows:,} records")
```

### Streaming with HuggingFace Datasets

```python
from datasets import load_dataset

ds = load_dataset("cometadata/datacite-titles-descriptions-related-identifiers", streaming=True)
for record in ds["train"]:
    print(record["doi"], record["titles"])
    break
```

## License

CC0 1.0 Universal - the metadata is from DataCite's open metadata corpus.
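The nested `titles` and `relatedIdentifiers` columns described in the schema usually need a little unpacking before use. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `main_title` and `related_by_type` are hypothetical, the sample record and its DOIs are made up for illustration, and it assumes each record arrives as a plain Python dict (as when streaming) and that a main title is the entry with no `titleType` set.

```python
from typing import Any, Dict, List, Optional


def main_title(record: Dict[str, Any]) -> Optional[str]:
    """Pick the first title with no titleType; fall back to the first title.

    Assumption: an empty or missing titleType marks the main title.
    """
    titles = record.get("titles") or []
    for entry in titles:
        if not entry.get("titleType"):
            return entry.get("title")
    return titles[0].get("title") if titles else None


def related_by_type(
    record: Dict[str, Any],
    id_type: str = "DOI",
    relation_type: Optional[str] = None,
) -> List[str]:
    """Collect related identifiers of one identifier type.

    If relation_type is given, keep only entries with that relationType.
    """
    matches = []
    for entry in record.get("relatedIdentifiers") or []:
        if entry.get("relatedIdentifierType") != id_type:
            continue
        if relation_type is not None and entry.get("relationType") != relation_type:
            continue
        identifier = entry.get("relatedIdentifier")
        if identifier is not None:
            matches.append(identifier)
    return matches


# Made-up record that mirrors the documented schema (not real DataCite data).
sample = {
    "doi": "10.1234/example",
    "provider_id": "example-provider",
    "client_id": "example-provider.client",
    "titles": [
        {"title": "Un jeu de données", "titleType": "TranslatedTitle", "lang": "fr"},
        {"title": "An example dataset", "titleType": None, "lang": "en"},
    ],
    "descriptions": [
        {"description": "Illustrative only.", "descriptionType": "Abstract", "lang": "en"},
    ],
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.5678/paper",
            "relationType": "IsSupplementTo",
            "relatedIdentifierType": "DOI",
            "resourceTypeGeneral": "JournalArticle",
        },
        {
            "relatedIdentifier": "https://example.org/code",
            "relationType": "IsDerivedFrom",
            "relatedIdentifierType": "URL",
            "resourceTypeGeneral": "Software",
        },
    ],
}

print(main_title(sample))
print(related_by_type(sample))
```

The same helpers could be applied to each `record` yielded by the streaming loop, since they only rely on the column names listed in the schema table.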
Provider:
cometadata