BhashaHMPV Dataset: Multilingual HMPV News and Fact-Check Articles Dataset for Indian Regional Languages

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15194115
Description:
For the collection of Google News articles on HMPV in the Indian context, we scraped the articles using Splinter, a Python-based browser automation framework. In the script, we queried terms and phrases such as “HMPV” and “hmpv india”. The results included articles from the Google News website in a variety of languages, and the websites’ domains and languages were noted. We automated collection across languages by changing the language filter in the Google News URL. In some cases, all articles were scraped and those unrelated to HMPV were filtered out in the pre-processing stage. All the samples collected were then combined into one CSV file.

We retrieved articles in eleven languages supported by Google News, namely: Bengali, English, Gujarati, Hindi, Marathi, Malayalam, Punjabi, Tamil, Telugu, Urdu, and Kannada. We also performed stemming for each language, and the stemmed outputs were added as separate columns in the respective language-specific sheets of the final CSV file.

The following information was extracted along with the news articles:
1) language of the Google News article
2) title of the Google News article
3) source of the Google News article (if available)
4) link of the Google News article
5) content of the Google News article
6) domain of the article

For the collection of Google Fact-Check articles, we used the Google Fact-Check API, authenticated with an API key, to fetch the articles. In the Python script, we queried terms and phrases such as “HMPV” and “hmpv india”. We also performed stemming for each language, and the stemmed outputs were added as separate columns in the respective language-specific sheets of the final CSV file.

The following information was extracted for each fact-check article:
1) claim-text of the Google fact-check article
2) claimant of the Google fact-check article
3) claim-date of the Google fact-check article
4) review-publisher of the Google fact-check article
5) review-title of the Google fact-check article
6) review-url of the Google fact-check article
7) review-date of the Google fact-check article
8) textual-rating of the Google fact-check article
9) extracted-content of the Google fact-check article
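The two collection steps described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' script. The language codes, the `hl`/`gl`/`ceid` URL parameters, and the helper names `news_search_url`, `fetch_claims`, and `flatten_claims` are assumptions introduced here. The endpoint and response field names follow the public Google Fact Check Tools API (`claims:search`). The extracted-content column is omitted because it would require fetching each review page separately.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Google News language codes for the languages listed above.
LANGUAGES = {
    "Bengali": "bn", "English": "en", "Gujarati": "gu", "Hindi": "hi",
    "Kannada": "kn", "Malayalam": "ml", "Marathi": "mr", "Punjabi": "pa",
    "Tamil": "ta", "Telugu": "te", "Urdu": "ur",
}

FACT_CHECK_ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def news_search_url(query, lang_code, country="IN"):
    """Build a Google News search URL with the language filter set (assumed URL scheme)."""
    params = {"q": query, "hl": lang_code, "gl": country,
              "ceid": f"{country}:{lang_code}"}
    return "https://news.google.com/search?" + urlencode(params)


def fetch_claims(query, api_key, lang_code=None):
    """Query the Fact Check Tools API and return the parsed JSON response."""
    params = {"query": query, "key": api_key}
    if lang_code:
        params["languageCode"] = lang_code
    with urlopen(FACT_CHECK_ENDPOINT + "?" + urlencode(params)) as resp:
        return json.load(resp)


def flatten_claims(response):
    """Turn an API response into one row per (claim, review) pair."""
    rows = []
    for claim in response.get("claims", []):
        for review in claim.get("claimReview", []):
            rows.append({
                "claim-text": claim.get("text"),
                "claimant": claim.get("claimant"),
                "claim-date": claim.get("claimDate"),
                "review-publisher": review.get("publisher", {}).get("name"),
                "review-title": review.get("title"),
                "review-url": review.get("url"),
                "review-date": review.get("reviewDate"),
                "textual-rating": review.get("textualRating"),
            })
    return rows
```

A script along these lines would loop over `LANGUAGES`, call `news_search_url("hmpv india", code)` to get the per-language page to scrape, and pass the result of `fetch_claims` to `flatten_claims` before writing the rows to CSV.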
Created:
2025-04-11