Sazuna/UAT_keywords
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Sazuna/UAT_keywords
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: uat_uri
    list: string
  - name: uat_label
    list: string
  - name: multihot
    list: float64
  - name: uat_uri_extended
    list: string
  - name: uat_label_extended
    list: string
  - name: multihot_extended
    list: float32
  splits:
  - name: train
    num_bytes: 1054742627
    num_examples: 34025
  download_size: 31954412
  dataset_size: 1054742627
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: afl-3.0
task_categories:
- text-classification
language:
- en
tags:
- astronomy
- keywords
- scientific literature
- multi-label
size_categories:
- 10K<n<100K
---
# UAT Keywords
## Task description
Multi-label classification of astrophysics literature for keyword discovery.
## Corpus constitution
Papers were collected by querying keywords through the SciX ADS API. To be accepted, a paper must have a keyword that carries both the UAT index and the UAT label.
## UAT version used to generate this dataset
[v6.0.0](https://github.com/astrothesaurus/UAT/releases/tag/v6.0.0)
## Columns description
- ```text```: raw text of the title and abstract.
- ```uat_uri```: sorted URIs of the author-assigned UAT concepts.
- ```uat_label```: labels corresponding to the author-assigned UAT concepts.
- ```multihot```: multi-hot vector over the 2411 UAT concepts, generated from the ```uat_uri``` column and the label2idx dictionary (see the attached file 'label_mapping.json').
- ```uat_uri_extended```: URIs of additional UAT concepts detected by string matching.
- ```uat_label_extended```: labels of additional UAT concepts detected by string matching.
- ```multihot_extended```: multi-hot vector over the 2411 UAT concepts, generated from the ```uat_uri``` and ```uat_uri_extended``` columns combined and the label2idx dictionary (see the attached file 'label_mapping.json').
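
The relationship between the URI columns and the two multi-hot columns can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the helper names `to_multihot` and `from_multihot`, the toy mapping, and the `example/uat/...` keys are all invented here, and the sketch assumes the dictionary in 'label_mapping.json' maps each concept identifier to an integer index, which this card does not spell out.

```python
from typing import Dict, List


def to_multihot(uris: List[str], key2idx: Dict[str, int], size: int) -> List[float]:
    """Return a vector of length `size` with 1.0 at the index of each given key."""
    vec = [0.0] * size
    for uri in uris:
        vec[key2idx[uri]] = 1.0
    return vec


def from_multihot(vec: List[float], idx2key: Dict[int, str]) -> List[str]:
    """Recover the sorted keys whose positions are set in a multi-hot vector."""
    return sorted(idx2key[i] for i, value in enumerate(vec) if value > 0.5)


# Toy stand-in for the mapping file (assumption: key -> index).
# The real vectors have 2411 positions; 4 are used here for readability.
toy_key2idx = {
    "example/uat/1": 0,
    "example/uat/2": 1,
    "example/uat/3": 2,
    "example/uat/4": 3,
}
toy_idx2key = {idx: key for key, idx in toy_key2idx.items()}

# Toy row: one author-assigned concept, one concept found by string matching.
uat_uri = ["example/uat/2"]
uat_uri_extended = ["example/uat/4"]

multihot = to_multihot(uat_uri, toy_key2idx, size=4)
multihot_extended = to_multihot(uat_uri + uat_uri_extended, toy_key2idx, size=4)
recovered = from_multihot(multihot_extended, toy_idx2key)
```

On the real data, the same idea would presumably apply with the mapping loaded from 'label_mapping.json' and `size=2411`; whether that file is keyed by URI or by label should be checked before use.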
## Related work
A dataset for the same task, built with an earlier UAT version but with documents from more diverse sources:\
[https://huggingface.co/datasets/adsabs/SciX_UAT_keywords](https://huggingface.co/datasets/adsabs/SciX_UAT_keywords)\
A model trained on that dataset:\
[https://huggingface.co/adsabs/KAILAS](https://huggingface.co/adsabs/KAILAS)
Provider:
Sazuna



