
cometadata/arxiv-sample-affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air-inference-results-enriched

Source: Hugging Face. Updated 2025-12-03; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/cometadata/arxiv-sample-affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air-inference-results-enriched
Resource description:
---
language:
- en
license: cc0-1.0
pretty_name: affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air arXiv author affiliation inference results
tags:
- text
---

# affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air arXiv author affiliation inference results

Author names and institutional affiliations extracted from arXiv preprints with the [affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air LoRA](https://huggingface.co/cometadata/affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air), enriched with [ROR](https://ror.org/) identifiers.

## Dataset Structure

Each record contains the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `doi` | string | DOI for the preprint |
| `title` | string | Preprint title |
| `arxiv_id` | string | arXiv identifier |
| `arxiv_id_link` | string | Link to arXiv abstract |
| `arxiv_version_id` | string | Version-specific identifier |
| `arxiv_version_id_link` | string | Link to specific version |
| `version` | integer | Version number |
| `file_name` | string | Source file name |
| `dateInformation` | array | Submission, update, and availability dates |
| `arxiv_subjects` | array | Subject categories |
| `arxiv_subject_codes` | array | Subject classification codes |
| `prediction` | array | Extracted author names and affiliations (see below) |

### Prediction Structure

Each entry in the `prediction` array represents an author:

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Author name |
| `affiliations` | array | List of affiliated institutions |

### Affiliation Structure

Each entry in the `affiliations` array contains:

| Field | Type | Description |
|-------|------|-------------|
| `affiliation` | string | Institution name and address as extracted from the source |
| `ror_id` | string | [ROR](https://ror.org/) identifier URL for the institution (if matched) |

## Statistics

- Records: 431,509
- Author entries: 1,901,346
- Affiliation entries: 2,135,431
- Affiliations with ROR: 1,748,507 (81.9%)
- Date range: 2007-04 to 2025-08
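
The nested `prediction` → `affiliations` layout described above can be walked with two loops. The following is a minimal sketch, not part of the original dataset card: the helper names `iter_affiliations` and `ror_coverage` and every value in `sample_record` (names, institutions, DOI, the ROR URL) are hypothetical placeholders. It also assumes that an unmatched affiliation carries a missing, null, or empty `ror_id`, which the card does not specify.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator

# Hypothetical record shaped like the documented schema (most fields omitted).
sample_record: dict[str, Any] = {
    "doi": "10.0000/example",  # placeholder, not a real DOI
    "title": "An example preprint",
    "prediction": [
        {
            "name": "A. Author",
            "affiliations": [
                {
                    "affiliation": "Example University, Example City",
                    "ror_id": "https://ror.org/00example0",  # placeholder
                },
                {"affiliation": "Unmatched Lab", "ror_id": None},
            ],
        },
        {
            "name": "B. Author",
            "affiliations": [
                {
                    "affiliation": "Example University, Example City",
                    "ror_id": "https://ror.org/00example0",  # placeholder
                },
            ],
        },
    ],
}


def iter_affiliations(
    record: dict[str, Any],
) -> Iterator[tuple[str, str, str | None]]:
    """Yield (author name, affiliation text, ROR id or None) for one record."""
    for author in record.get("prediction") or []:
        for aff in author.get("affiliations") or []:
            # Assumption: missing, null, or empty `ror_id` means "no ROR match".
            ror_id = aff.get("ror_id") or None
            yield author.get("name", ""), aff.get("affiliation", ""), ror_id


def ror_coverage(records: Iterable[dict[str, Any]]) -> float:
    """Return the fraction of affiliation entries that carry a ROR id."""
    total = 0
    matched = 0
    for record in records:
        for _name, _text, ror_id in iter_affiliations(record):
            total += 1
            if ror_id is not None:
                matched += 1
    return matched / total if total else 0.0
```

Applied to the full dataset, `ror_coverage` should come out near the reported 1,748,507 / 2,135,431 ≈ 81.9%, provided unmatched entries are encoded as assumed here. Records could likely be obtained with the Hugging Face `datasets` library, though the split name and loading details are not stated in the card.
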
Provider:
cometadata