
English-Hindi Code-Mixed Corpus

Source: arXiv. Updated: 2018-05-30. Indexed: 2024-06-21.
Download link:
https://github.com/sahilswami96/StanceDetectionCodeMixed
Summary:
This study presents the first English-Hindi code-mixed corpus on India's 2016 Demonetisation policy, consisting of 3545 tweets. The corpus was created by the Language Technology Research Center to understand public stance toward the Demonetisation policy by analyzing opinions expressed on social media. It covers three stances (supportive, opposing, and neutral), and every tweet is annotated for both stance and language. The tweets were collected via the Twitter Scraper API and annotated by native speakers proficient in both English and Hindi. The corpus is suitable for developing and evaluating stance detection and language identification techniques, and it supports closer study of public opinion and language usage patterns.
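The annotation scheme described above (a stance label per tweet plus language annotation) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the class name `AnnotatedTweet`, the label strings `favor`, `against`, and `none`, the per-token tag values `en`, `hi`, and `other`, and the idea that language is tagged token by token. The corpus's actual file format and label names are not stated here and may differ, and the sample tweets are invented, not taken from the corpus.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label strings; the real corpus may use different names.
STANCES = ("favor", "against", "none")


@dataclass
class AnnotatedTweet:
    tokens: list[str]
    lang_tags: list[str]  # assumed: one tag per token, e.g. "en", "hi", "other"
    stance: str

    def __post_init__(self) -> None:
        # Guard against misaligned annotations and unknown stance labels.
        if len(self.tokens) != len(self.lang_tags):
            raise ValueError("tokens and lang_tags must have the same length")
        if self.stance not in STANCES:
            raise ValueError(f"unknown stance label: {self.stance!r}")


def stance_distribution(tweets: list[AnnotatedTweet]) -> Counter:
    """Count how many tweets carry each stance label."""
    return Counter(t.stance for t in tweets)


def hindi_ratio(tweet: AnnotatedTweet) -> float:
    """Share of Hindi tokens among the tokens tagged as English or Hindi."""
    en = tweet.lang_tags.count("en")
    hi = tweet.lang_tags.count("hi")
    total = en + hi
    return hi / total if total else 0.0


# Invented examples, for illustration only.
sample = [
    AnnotatedTweet(
        tokens=["notebandi", "was", "a", "good", "step"],
        lang_tags=["hi", "en", "en", "en", "en"],
        stance="favor",
    ),
    AnnotatedTweet(
        tokens=["ATM", "ki", "line", "bahut", "lambi", "hai"],
        lang_tags=["en", "hi", "en", "hi", "hi", "hi"],
        stance="against",
    ),
]

if __name__ == "__main__":
    print(stance_distribution(sample))
    print([round(hindi_ratio(t), 2) for t in sample])
```

A per-tweet record like this keeps the two annotation layers together, so a stance classifier and a language identifier can be evaluated on the same examples.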
Provider:
Language Technology Research Center, International Institute of Information Technology, Hyderabad
Created:
2018-05-30