Multilingual Similarity Dataset for News Article Frame
Source: arXiv. Updated: 2024-05-22. Indexed: 2024-06-21.
Download link:
https://zenodo.org/records/10611923
Description:
The Multilingual Similarity Dataset for News Article Frame was developed by a research team at the University of Massachusetts Amherst. It contains 26,555 pairs of news articles in 10 languages, and each pair is annotated along eight key aspects of news content. The dataset was built with a human-in-the-loop framework intended to produce high-quality data at scale. Its main applications are cross-lingual document matching, news clustering, and multilingual information retrieval. It targets frame analysis in news reporting, with the goal of supporting a deeper understanding of the media ecosystem and of the differing perspectives in global news coverage.
Provider:
University of Massachusetts Amherst
Created:
2024-05-22



