
marcelsun/wos_hierarchical_multi_label_text_classification

Source: Hugging Face. Updated: 2024-12-03. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/marcelsun/wos_hierarchical_multi_label_text_classification
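Resource overview:
The WOS hierarchical multi-label text classification dataset was introduced by du Toit and Dunaiski and is built from Web of Science titles and abstracts. It uses sampling and filtering to produce datasets whose class distributions are well balanced at selected hierarchy levels. It comes in three variants, WOS_JT, WOS_CT, and WOS_JTF, labeled with journal-based classifications, citation-based classifications, and a filtered combination of journal- and citation-based classifications, respectively. The datasets target hierarchical, multi-label text classification and are suited to research in natural language processing and large language models.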

The WOS Hierarchical Multi-Label Text Classification dataset, introduced by du Toit and Dunaiski in 2024, is created from Web of Science title and abstract data, categorized into a hierarchical, multi-label class structure. The dataset includes three variants: WOS_JT (43,366 samples, using journal-based classifications only), WOS_CT (65,200 samples, using citation-based classifications only), and WOS_JTF (42,926 samples, using a filtered set based on journal and citation classifications). The dataset files include *.json (storing titles and abstracts with their associated class labels), depth2label.pt (storing the depth of classification hierarchy and associated class lists), path_list.pt (storing hierarchical relationships between classes), slot.pt and value2slot.pt (storing parent-child relationships between classes), and value_dict.pt (storing class IDs and their string representations).
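To make the roles of the hierarchy files more concrete, here is a minimal Python sketch of how parent-child relationships can be turned into a depth-to-classes mapping and a root-to-leaf label path. The on-disk layout of path_list.pt, depth2label.pt, and value_dict.pt is not documented here, so everything below is an assumption: the (parent, child) pair format, the -1 root marker, the function names, and the toy class names are all hypothetical and chosen only for illustration. The .pt extension suggests PyTorch serialization, but the sketch deliberately works on in-memory toy data rather than assuming a loader.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: (parent_id, child_id) pairs, with -1 as an
# assumed marker for the virtual root. The real path_list.pt may differ.
toy_path_list = [(-1, 0), (-1, 1), (0, 2), (0, 3), (1, 4)]

# Hypothetical class names, standing in for the kind of ID-to-string
# mapping that value_dict.pt is described as holding.
toy_value_dict = {
    0: "Physical Sciences",
    1: "Life Sciences",
    2: "Optics",
    3: "Acoustics",
    4: "Genetics",
}


def build_depth2label(path_list, root=-1):
    """Group class IDs by depth; children of the root are at depth 0."""
    children = defaultdict(list)
    for parent, child in path_list:
        children[parent].append(child)

    depth2label = {}
    frontier = children[root]
    depth = 0
    while frontier:
        depth2label[depth] = sorted(frontier)
        # Move one level down: collect the children of every node in the frontier.
        frontier = [c for p in frontier for c in children[p]]
        depth += 1
    return depth2label


def label_path(label, path_list, root=-1):
    """Return the class IDs from the top-level ancestor down to `label`."""
    parent_of = {child: parent for parent, child in path_list}
    chain = [label]
    while parent_of.get(chain[-1], root) != root:
        chain.append(parent_of[chain[-1]])
    return chain[::-1]


if __name__ == "__main__":
    print(build_depth2label(toy_path_list))
    print([toy_value_dict[i] for i in label_path(3, toy_path_list)])
```

With real data, the toy structures would be replaced by whatever the dataset files actually contain, after inspecting their format.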
Provider:
marcelsun