BanFakeNews

OpenDataLab, updated 2026-04-05, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/BanFakeNews

Description:

Fake news spreads quickly across domains such as politics and finance, and the harm it can cause has drawn the research community's attention to automatic fake news detection through linguistic analysis. However, such approaches have been developed primarily for English, leaving low-resource languages largely unaddressed, even though the risks posed by false and manipulative news are not limited by language. In this work, we present an annotated dataset of approximately 50K news articles that can be used to build automatic fake news detection systems for a low-resource language such as Bengali. We also provide an analysis of the dataset and develop benchmark systems for identifying Bengali fake news using state-of-the-art NLP techniques. To build these systems, we explore both traditional linguistic features and neural network-based methods. We expect this dataset to be a valuable resource for building technologies that prevent the spread of fake news and for advancing research on low-resource languages.

Provider:

OpenDataLab

Created:

2022-08-16

AI-compiled summary

Dataset introduction