
xwjzds/ag_news_lemma_train

Source: Hugging Face. Updated: 2023-09-16. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/xwjzds/ag_news_lemma_train
Resource description:
# Dataset Card for Dataset Name

### Dataset Summary

This is a lemmatized version of the AG News data.

### Languages

English

### Citation Information

```bibtex
@inproceedings{xu-etal-2023-vontss,
    title = "v{ONTSS}: v{MF} based semi-supervised neural topic modeling with optimal transport",
    author = "Xu, Weijie and Jiang, Xiaoyu and Sengamedu Hanumantha Rao, Srinivasan and Iannacci, Francis and Zhao, Jinjin",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.271",
    doi = "10.18653/v1/2023.findings-acl.271",
    pages = "4433--4457",
    abstract = "Recently, Neural Topic Models (NTM), inspired by variational autoencoders, have attracted a lot of research interest; however, these methods have limited applications in the real world due to the challenge of incorporating human knowledge. This work presents a semi-supervised neural topic modeling method, vONTSS, which uses von Mises-Fisher (vMF) based variational autoencoders and optimal transport. When a few keywords per topic are provided, vONTSS in the semi-supervised setting generates potential topics and optimizes topic-keyword quality and topic classification. Experiments show that vONTSS outperforms existing semi-supervised topic modeling methods in classification accuracy and diversity. vONTSS also supports unsupervised topic modeling. Quantitative and qualitative experiments show that vONTSS in the unsupervised setting outperforms recent NTMs on multiple aspects: vONTSS discovers highly clustered and coherent topics on benchmark datasets. It is also much faster than the state-of-the-art weakly supervised text classification method while achieving similar classification performance. We further prove the equivalence of optimal transport loss and cross-entropy loss at the global minimum.",
}
```
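
The card describes the data as lemmatized but does not say which tool or settings were used. As a rough illustration of what lemmatization does to a news headline, here is a minimal sketch built on a hand-written lookup table. The table `LEMMA_TABLE`, the function `lemmatize`, and the sample sentence are all hypothetical; lowercasing and dropping punctuation are simplifications made for this example, not claims about how the dataset was actually prepared.

```python
import re

# Hypothetical lookup table for illustration only. A real lemmatizer relies
# on a full lexicon and part-of-speech information; the tool used to build
# this dataset is not stated in the card.
LEMMA_TABLE = {
    "stocks": "stock",
    "fell": "fall",
    "companies": "company",
    "reported": "report",
    "was": "be",
    "were": "be",
}


def lemmatize(text: str) -> str:
    """Lowercase the text, keep alphabetic tokens, and map each to its lemma.

    Tokens missing from LEMMA_TABLE are passed through unchanged.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(LEMMA_TABLE.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(lemmatize("Stocks fell as companies reported earnings."))
    # prints: stock fall as company report earnings
```

Mapping inflected forms to one base form shrinks the vocabulary, which is typically why text is lemmatized before topic modeling.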
Provider:
xwjzds
