lcalvobartolome/rosie_mind_topics
Source: Hugging Face | Updated: 2025-10-06 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/lcalvobartolome/rosie_mind_topics
Description:
ROSIE-MIND-Topics is a bilingual (English-Spanish) dataset of 875,230 passages annotated with topic modeling information. The topic information comes from a 30-topic PLTM model trained on this same data. The dataset serves as the input to the MIND pipeline, which detects multilingual and cultural discrepancies in question-answer pairs. Each record includes the passage and its corresponding full document, preprocessing outputs (lemmas, translations), and topic model features (topic distribution and dominant topic).
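As a rough illustration of how the two topic model features relate, the sketch below takes the dominant topic to be the index of the largest entry in a record's 30-entry topic distribution. This is an assumption about the usual convention, not a statement of how the dataset was built. The field names `passage` and `topic_distribution` and the sample values are hypothetical; check the dataset card for the actual column names.

```python
from typing import Sequence

NUM_TOPICS = 30  # number of topics stated in the description


def dominant_topic(topic_distribution: Sequence[float]) -> int:
    """Return the index of the highest-weight topic.

    Assumes the distribution has one weight per topic (30 in total).
    """
    if len(topic_distribution) != NUM_TOPICS:
        raise ValueError(
            f"expected {NUM_TOPICS} topic weights, got {len(topic_distribution)}"
        )
    # argmax over topic indices; ties resolve to the lowest index
    return max(range(NUM_TOPICS), key=lambda k: topic_distribution[k])


# Hypothetical record: the field names and values are illustrative only.
weights = [0.01] * NUM_TOPICS
weights[7] = 0.71  # 29 * 0.01 + 0.71 = 1.0
record = {
    "passage": "Example passage text.",
    "topic_distribution": weights,
}

print(dominant_topic(record["topic_distribution"]))
```

If the dataset already stores a dominant topic per record, a check like this could be used to confirm that it agrees with the stored distribution.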
Provider:
lcalvobartolome



