Replication Data for: Hindi-English code-mixed Twitter dataset
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://doi.org/10.7910/DVN/BIUUW4
Description:
This directory contains a large-scale Hindi-English code-mixed corpus collected from Twitter between 2010 and 2022. To anonymize the dataset, we have removed identifying information and anonymized the tweet author IDs. Additionally, for each text we have calculated the code-mixing index (CMI) and identified its language (Hindi, English, or Hindi-English code-mixed).
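The description does not say how the CMI was computed. As a rough illustration only, the sketch below assumes the commonly used utterance-level definition, CMI = 100 * (1 - max_i(w_i) / (n - u)), where w_i is the number of tokens in language i, n is the total token count, and u is the number of language-independent tokens. The function name, the `hi`/`en` tag names, and the per-token tag input are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def code_mixing_index(tags, languages=("hi", "en")):
    """Utterance-level CMI from per-token language tags.

    Assumption: tags outside `languages` (e.g. "other" for hashtags,
    URLs, or named entities) count as language-independent tokens.
    """
    # Count only tokens that carry one of the tracked language tags.
    counts = Counter(tag for tag in tags if tag in languages)
    language_tokens = sum(counts.values())  # this is n - u
    if language_tokens == 0:
        # No language-tagged tokens: CMI is defined as 0.
        return 0.0
    dominant = max(counts.values())  # tokens in the most frequent language
    return 100.0 * (1.0 - dominant / language_tokens)


# Hypothetical tag sequences, not drawn from the corpus.
monolingual = ["hi", "hi", "hi", "other"]
mixed = ["hi", "en", "hi", "en"]
print(code_mixing_index(monolingual))  # 0.0
print(code_mixing_index(mixed))        # 50.0
```

Under this definition a monolingual text scores 0 and an evenly mixed two-language text scores 50, so a higher value indicates heavier mixing. The values shipped with the dataset may follow a different variant.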
Created:
2024-02-06



