
Sazuna/UAT_keywords

Hugging Face · Updated 2026-03-26 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Sazuna/UAT_keywords
Description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: uat_uri
    list: string
  - name: uat_label
    list: string
  - name: multihot
    list: float64
  - name: uat_uri_extended
    list: string
  - name: uat_label_extended
    list: string
  - name: multihot_extended
    list: float32
  splits:
  - name: train
    num_bytes: 1054742627
    num_examples: 34025
  download_size: 31954412
  dataset_size: 1054742627
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: afl-3.0
task_categories:
- text-classification
language:
- en
tags:
- astronomy
- keywords
- scientific literature
- multi-label
size_categories:
- 10K<n<100K
---

# UAT Keywords

## Task description

Multi-label classification of astrophysics literature for keyword discovery, using the Unified Astronomy Thesaurus (UAT) as the label set.

## Corpus constitution

Papers were retrieved by querying keywords through the SciX (ADS) API. To be included, a paper's keywords must provide both the UAT index and the corresponding UAT label.

## UAT version used to generate this dataset

[v6.0.0](https://github.com/astrothesaurus/UAT/releases/tag/v6.0.0)

## Column descriptions

- `text`: raw text of the title followed by the abstract.
- `uat_uri`: sorted URIs of the UATs assigned by the authors.
- `uat_label`: labels corresponding to the author-assigned URIs in `uat_uri`.
- `multihot`: multi-hot vectors over the 2411 UATs, generated from the `uat_uri` column and the `label2idx` dictionary (see the attached file `label_mapping.json`).
- `uat_uri_extended`: additional UAT URIs detected by string matching.
- `uat_label_extended`: additional UAT labels detected by string matching.
- `multihot_extended`: multi-hot vectors over the 2411 UATs, generated from the `uat_uri` and `uat_uri_extended` columns combined and the `label2idx` dictionary (see the attached file `label_mapping.json`).
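The `multihot` and `multihot_extended` columns can, in principle, be rebuilt from the URI columns. The following is a minimal sketch under stated assumptions: `label2idx` (loaded from `label_mapping.json`, whose exact layout is not documented here) is assumed to be a flat dictionary mapping each UAT URI to an integer index in `[0, 2411)`, and the helper names `to_multihot`, `to_multihot_extended`, `from_multihot` and the toy URIs are hypothetical. The card stores `multihot` as float64 and `multihot_extended` as float32; the sketch uses plain Python floats.

```python
from typing import Dict, Iterable, List, Sequence

NUM_UATS = 2411  # vector length stated in the dataset card


def to_multihot(uris: Iterable[str], label2idx: Dict[str, int],
                num_labels: int = NUM_UATS) -> List[float]:
    """Return a vector with 1.0 at the index of each URI and 0.0 elsewhere.

    Assumption: label2idx maps UAT URIs to integer indices in [0, num_labels).
    URIs absent from the mapping are skipped (a hypothetical policy).
    """
    vec = [0.0] * num_labels
    for uri in uris:
        idx = label2idx.get(uri)
        if idx is not None:
            vec[idx] = 1.0
    return vec


def to_multihot_extended(uris: Iterable[str], uris_extended: Iterable[str],
                         label2idx: Dict[str, int],
                         num_labels: int = NUM_UATS) -> List[float]:
    """Multi-hot vector over author-assigned plus string-matched URIs.

    A URI present in both lists simply sets the same position twice.
    """
    return to_multihot(list(uris) + list(uris_extended), label2idx, num_labels)


def from_multihot(vec: Sequence[float], label2idx: Dict[str, int]) -> List[str]:
    """Invert a multi-hot vector back to a sorted list of URIs."""
    idx2uri = {i: uri for uri, i in label2idx.items()}
    # Keep only active positions that have a URI in the (assumed) mapping.
    return sorted(idx2uri[i] for i, v in enumerate(vec) if v > 0 and i in idx2uri)
```

For example, with a toy mapping `{"uri-a": 0, "uri-b": 3, "uri-c": 5}` and `num_labels=6`, `to_multihot_extended(["uri-a"], ["uri-c"], mapping, 6)` sets positions 0 and 5, and `from_multihot` recovers `["uri-a", "uri-c"]`. Comparing such reconstructions against the stored columns is one way to check an assumed reading of `label_mapping.json`.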
## Related works

- A dataset for the same task, built with an earlier UAT version but with documents from more diverse sources: [https://huggingface.co/datasets/adsabs/SciX_UAT_keywords](https://huggingface.co/datasets/adsabs/SciX_UAT_keywords)
- A model trained on that dataset: [https://huggingface.co/adsabs/KAILAS](https://huggingface.co/adsabs/KAILAS)
Provided by:
Sazuna