SBB/BK-Training-Dataset
Source: Hugging Face | Updated 2025-09-02 | Indexed 2025-09-13
Download link:
https://hf-mirror.com/datasets/SBB/BK-Training-Dataset
Description:
This is a multilingual training set for automatic subject indexing, containing titles and their corresponding subjects (classes) from the Basisklassifikation (BK) system. It includes more than 6 million titles, primarily in German, English, French, Italian, Russian, Latin, Spanish, Arabic, Polish, and Turkish. The dataset is structured in TSV format and is intended for use with the Annif tool for automatic subject indexing. It also includes a BK vocabulary file for model training. The dataset was created by the Mensch.Maschine.Kultur research project at the Berlin State Library and is licensed under CC BY 4.0.
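As a rough illustration of working with a title-to-subject TSV file, the sketch below parses lines into (title, subjects) pairs. The exact column layout of this dataset is not stated here, so the code assumes one title per line, followed by a tab and one or more subject URIs in angle brackets. The function name `parse_title_subject_tsv`, the sample titles, and the `example.org` URIs are all hypothetical placeholders, not taken from the dataset.

```python
import csv
import io
import re

# Assumption: subjects appear as URIs wrapped in angle brackets, e.g. <http://...>.
URI_PATTERN = re.compile(r"<([^>]+)>")


def parse_title_subject_tsv(lines):
    """Yield (title, [subject URIs]) pairs from tab-separated lines.

    Lines without a tab-separated subject column, or without any
    bracketed URI, are skipped.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # no subject column on this line
        title = row[0].strip()
        # Join remaining columns so subjects split by tabs or spaces both work.
        subjects = URI_PATTERN.findall(" ".join(row[1:]))
        if title and subjects:
            yield title, subjects


# Hypothetical sample data; the real file contents may look different.
SAMPLE = (
    "Example title one\t<http://example.org/subject/1>\n"
    "Example title two\t<http://example.org/subject/2> <http://example.org/subject/3>\n"
    "Line without subjects\n"
)

records = list(parse_title_subject_tsv(io.StringIO(SAMPLE)))
for title, subjects in records:
    print(title, subjects)
```

For a file with millions of lines, passing an open file handle instead of `io.StringIO` keeps memory use low, since the function yields one record at a time.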
Provider:
SBB



