LSHTC

Source: arXiv · Updated 2015-03-30 · Indexed 2024-06-21
Download link:
http://lshtc.iit.demokritos.gr/
Resource overview:
The LSHTC datasets are a collection of large-scale text classification datasets released through the LSHTC challenge series. They are intended to evaluate how classification systems perform on large-scale tasks, where the number of categories can reach hundreds of thousands. The data is drawn from DBpedia and ODP (Open Directory Project) and includes both content vectors and description vectors. During curation, the datasets were split into training, validation, and test sets, and each instance was mapped to a unique number. The datasets are mainly used for large-scale text classification, particularly for problems involving complex category relationships and data sparsity.
Provider:
LSHTC
Created:
2015-03-30