Czech Text Document Corpus v 2.0
Source: arXiv | Updated: 2018-01-31 | Indexed: 2024-06-21
Download link:
http://ctdc.kiv.zcu.cz/
Resource description:
Czech Text Document Corpus v 2.0 is a text dataset created by the University of West Bohemia. It contains 11,955 real newspaper articles provided by the Czech News Agency. The dataset is intended to support research on automatic classification of Czech-language documents and is particularly suited to evaluating multi-label classification methods, since each article is usually associated with several labels. In addition to the classification labels, the corpus is automatically annotated at the morphological level. It is used mainly to evaluate and compare document classification techniques that address practical problems in information organization and storage.
Provider:
University of West Bohemia
Created:
2017-10-06



