QCRI/PropXplain

Source: Hugging Face. Updated 2026-04-15. Indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/QCRI/PropXplain
Description:
PropXplain is a multilingual dataset for explainable propaganda detection in Arabic and English text. It pairs propaganda classification labels with natural language explanations, supporting the development of interpretable propaganda detection systems.

Each sample includes an input text (the original content), a binary label (propagandistic or non-propagandistic), and an explanation (a natural language explanation of the classification decision). The explanations were generated using LLMs and validated through quality assessment for informativeness, clarity, plausibility, and faithfulness. The dataset supports propaganda classification, explanation generation, and multilingual NLP tasks.

The Arabic portion contains approximately 21K items (paragraphs and tweets) from 300 news agencies and Twitter data, covering topics such as politics, human rights, and the Israeli-Palestinian conflict. The English portion contains approximately 6K items (article sentences) from 42 news sources across the political spectrum, covering politics, war coverage, and trending topics from late 2023 to early 2024.

The dataset was created by Qatar Computing Research Institute (QCRI) and is licensed under MIT.
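The three-part sample structure described above (input text, binary label, explanation) can be sketched as a small record type with a sanity check. This is only an illustration: the class name, the field names, the `language` field, the exact label strings, and the example record are assumptions made here, not the dataset's actual schema, which should be checked against the dataset card before use.

```python
from dataclasses import dataclass

# Label strings follow the wording of the description; the real dataset
# may encode them differently (assumption).
LABELS = ("propagandistic", "non-propagandistic")
LANGUAGES = ("ar", "en")  # hypothetical language codes for the two portions


@dataclass(frozen=True)
class PropXplainSample:
    """Hypothetical in-memory view of one sample; field names are assumed."""
    text: str         # input text (original content)
    label: str        # binary label
    explanation: str  # natural language explanation of the decision
    language: str     # "ar" or "en" (not confirmed as a real column)


def validate(sample):
    """Return a list of problems; an empty list means the record looks well formed."""
    problems = []
    if not sample.text.strip():
        problems.append("empty text")
    if sample.label not in LABELS:
        problems.append(f"unknown label: {sample.label!r}")
    if not sample.explanation.strip():
        problems.append("empty explanation")
    if sample.language not in LANGUAGES:
        problems.append(f"unknown language: {sample.language!r}")
    return problems


def label_counts(samples):
    """Count samples per known label, ignoring records with unknown labels."""
    counts = {label: 0 for label in LABELS}
    for sample in samples:
        if sample.label in counts:
            counts[sample.label] += 1
    return counts


if __name__ == "__main__":
    # Made-up record for illustration only; it is not taken from the dataset.
    example = PropXplainSample(
        text="An invented sentence standing in for an article sentence.",
        label="propagandistic",
        explanation="An invented explanation of why the label was assigned.",
        language="en",
    )
    print(validate(example))
    print(label_counts([example]))
```

A check of this kind is useful before training a classifier or an explanation generator, since a record with a missing explanation or an unexpected label value would otherwise fail silently downstream.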
Provider:
QCRI