cometadata/arxiv-sample-affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air-inference-results-enriched
Source: Hugging Face. Updated 2025-12-03; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/cometadata/arxiv-sample-affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air-inference-results-enriched
Resource description:
---
language:
- en
license: cc0-1.0
pretty_name: affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air arXiv author affiliation inference results
tags:
- text
---
# affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air arXiv author affiliation inference results
Author names and institutional affiliations extracted from arXiv preprints with the [affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air LoRA](https://huggingface.co/cometadata/affiliation-parsing-lora-Qwen3-8B-distil-GLM_4.5_Air), enriched with [ROR](https://ror.org/) identifiers.
## Dataset Structure
Each record contains the following fields:
| Field | Type | Description |
|-------|------|-------------|
| `doi` | string | DOI for the preprint |
| `title` | string | Preprint title |
| `arxiv_id` | string | arXiv identifier |
| `arxiv_id_link` | string | Link to arXiv abstract |
| `arxiv_version_id` | string | Version-specific identifier |
| `arxiv_version_id_link` | string | Link to specific version |
| `version` | integer | Version number |
| `file_name` | string | Source file name |
| `dateInformation` | array | Submission, update, and availability dates |
| `arxiv_subjects` | array | Subject categories |
| `arxiv_subject_codes` | array | Subject classification codes |
| `prediction` | array | Extracted author names and affiliations (see below) |
### Prediction Structure
Each entry in the `prediction` array represents an author:
| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Author name |
| `affiliations` | array | List of affiliated institutions |
### Affiliation Structure
Each entry in the `affiliations` array contains:
| Field | Type | Description |
|-------|------|-------------|
| `affiliation` | string | Institution name and address as extracted from the source |
| `ror_id` | string | [ROR](https://ror.org/) identifier URL for the institution (if matched) |
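
Because authors and affiliations are nested two levels deep, it is often convenient to flatten a record into one row per (author, affiliation) pair. The following is a minimal sketch of that idea, not part of the dataset tooling: the function name `iter_affiliation_rows` and the sample record (including its placeholder ROR URL) are hypothetical, and it assumes that an unmatched affiliation carries a missing, null, or empty `ror_id`.

```python
from typing import Any, Iterator, Optional


def iter_affiliation_rows(
    record: dict[str, Any],
) -> Iterator[tuple[str, str, str, Optional[str]]]:
    """Yield (arxiv_id, author name, affiliation string, ror_id or None)."""
    for author in record.get("prediction") or []:
        for aff in author.get("affiliations") or []:
            # Assumption: an unmatched affiliation has a missing, null,
            # or empty ror_id; all three are normalised to None here.
            yield (
                record["arxiv_id"],
                author["name"],
                aff["affiliation"],
                aff.get("ror_id") or None,
            )


# Hypothetical record, invented for illustration; not taken from the dataset.
sample_record = {
    "arxiv_id": "0000.00000",
    "prediction": [
        {
            "name": "A. Author",
            "affiliations": [
                {
                    "affiliation": "Example University, Example City",
                    "ror_id": "https://ror.org/00example",
                },
                {"affiliation": "Example Lab", "ror_id": None},
            ],
        },
        # An author with no extracted affiliations contributes no rows.
        {"name": "B. Author", "affiliations": []},
    ],
}

rows = list(iter_affiliation_rows(sample_record))
for row in rows:
    print(row)
```

Authors with an empty `affiliations` list produce no rows in this sketch; if every author must appear in the flattened output, yield a row with `None` in the affiliation position instead.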
## Statistics
- Records: 431,509
- Author entries: 1,901,346
- Affiliation entries: 2,135,431
- Affiliation entries with a matched ROR identifier: 1,748,507 (81.9%)
- Date range: 2007-04 to 2025-08
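
The ROR coverage percentage follows directly from the totals listed above, and the same totals give rough per-record averages. The short check below uses only the published counts; the variable names are illustrative.

```python
# Totals as published in the Statistics section.
records = 431_509
author_entries = 1_901_346
affiliation_entries = 2_135_431
affiliations_with_ror = 1_748_507

# Share of affiliation entries that carry a ROR identifier.
ror_coverage_pct = 100 * affiliations_with_ror / affiliation_entries

# Derived averages (not stated in the dataset card itself).
authors_per_record = author_entries / records
affiliations_per_author = affiliation_entries / author_entries

print(f"ROR coverage: {ror_coverage_pct:.1f}%")
print(f"Authors per record: {authors_per_record:.2f}")
print(f"Affiliations per author entry: {affiliations_per_author:.2f}")
```

The computed coverage rounds to the 81.9% reported above. The averages work out to roughly 4.4 author entries per record and roughly 1.1 affiliation entries per author entry.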
Provider:
cometadata



