H-Prop and H-Prop-News Propaganda Datasets in Hindi
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5828239
Description:
The H-Prop dataset contains 28,630 articles created by translating a portion of the Proppy corpus into Hindi. Each article is labeled as either “propagandistic” (positive class) or “non-propagandistic” (negative class). The labels are retained from the Proppy corpus, where they were assigned indirectly through a technique known as distant supervision.
The H-Prop-News dataset contains 5,500 Hindi news articles collected from 30+ prominent Hindi news websites. Each article is labeled as either “propagandistic” (positive class) or “non-propagandistic” (negative class). The labeling was done by human annotators, and the observed inter-annotator agreement, measured with Cohen’s kappa, is 0.81.
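For readers unfamiliar with the agreement measure mentioned above, the following is a minimal sketch of how Cohen's kappa is computed for two annotators. The function name `cohen_kappa` and the toy label lists are illustrative assumptions; they are not the dataset's annotations and do not reproduce the reported 0.81.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each class, the product of the two annotators'
    # marginal proportions, summed over classes.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical class; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (1 = propagandistic, 0 = non-propagandistic).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # about 0.58 for this toy data
```

Here the annotators agree on 8 of 10 items (0.80), chance agreement is 0.52, and kappa is (0.80 - 0.52) / (1 - 0.52).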
## Data format
We provide the H-Prop dataset as three TSV files, one each for the training, testing, and validation partitions. The H-Prop-News dataset is provided as CSV files covering the same training, testing, and validation partitions.
Each line in the H-Prop files represents one article, with the following information:
1. article_text: the text of the article, translated from the Proppy corpus.
2. propaganda_label: the article's label, retained from the Proppy corpus.
Each line in the H-Prop-News files represents one article, with the following information:
1. news_website: Name of the news source website
2. article_url: the direct URL for the published article in its source website
3. news_headline: news headline
4. article_text: the text of the article, retrieved with the ParseHub tool
5. propaganda_label: label for articles
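The two layouts above can be read with Python's standard `csv` module. The sketch below is a minimal example under stated assumptions: it assumes each file starts with a header row carrying the field names listed above, and the sample rows, label values (`1`/`0`), website name, and URL are invented for illustration. The actual label encoding should be checked in the downloaded files.

```python
import csv
import io
from collections import Counter


def read_articles(handle, delimiter):
    """Read one partition into a list of dicts keyed by column name.

    Assumes the first line is a header row (an assumption, not confirmed
    by the dataset description).
    """
    return list(csv.DictReader(handle, delimiter=delimiter))


def label_counts(rows, label_column="propaganda_label"):
    """Count how many articles carry each label value."""
    return Counter(row[label_column] for row in rows)


# Hypothetical in-memory samples standing in for the real files.
hprop_sample = (
    "article_text\tpropaganda_label\n"
    "पहला लेख\t1\n"
    "दूसरा लेख\t0\n"
)
news_sample = (
    "news_website,article_url,news_headline,article_text,propaganda_label\n"
    "Example Site,https://example.com/a1,शीर्षक,लेख का पाठ,0\n"
)

hprop_rows = read_articles(io.StringIO(hprop_sample), delimiter="\t")
news_rows = read_articles(io.StringIO(news_sample), delimiter=",")

print(label_counts(hprop_rows))
print(news_rows[0]["news_headline"])
```

With the real data, the `io.StringIO` objects would be replaced by files opened with `open(path, encoding="utf-8", newline="")`, where `path` points to one of the downloaded partitions.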
## About
The H-Prop dataset was translated using IBM Watson Language Translator.
## Credit
Please cite the dataset as:
[HProp-News] Deptii Chaudhari, Ambika Pawar, and Alberto Barrón-Cedeño. 2022. H-Prop and H-Prop-News: Computational Propaganda Datasets in Hindi. doi: 10.5281/zenodo.5828240
## Authors
Deptii Chaudhari;
Ambika Pawar;
Alberto Barrón-Cedeño
Created:
2022-01-08



