Replication Data for: Hindi-English code-mixed Twitter dataset
Source: DataCite Commons; updated 2025-05-12; indexed 2025-05-17
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/BIUUW4
Description:
This directory contains a large-scale Hindi-English code-mixed corpus collected from Twitter between 2010 and 2022. To anonymize the dataset, we removed identifiers and anonymized the tweet author IDs. Additionally, we computed the code-mixing index (CMI) of each text and labeled its language as Hindi, English, or Hindi-English code-mixed.
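The description does not say which CMI formulation was used. As a minimal sketch, assuming the widely used utterance-level CMI of Das and Gambäck, CMI = 100 × (1 − max(w_i) / (n − u)) when n > u and 0 otherwise, where n is the number of tokens, u the number of language-independent tokens, and max(w_i) the token count of the dominant language. The tag set ("hi", "en", "univ") and the labeling rule in `label_text` are hypothetical illustrations, not the dataset's documented method.

```python
from collections import Counter


def code_mixing_index(tags):
    """Utterance-level CMI, assuming the Das & Gambäck formulation.

    tags: one language tag per token. The tag set is hypothetical:
    "hi" (Hindi), "en" (English), "univ" (language-independent,
    e.g. punctuation, hashtags, emoji).
    """
    n = len(tags)
    # Count tokens per language, ignoring language-independent tokens.
    lang_counts = Counter(t for t in tags if t != "univ")
    u = n - sum(lang_counts.values())
    if n == u:
        # No language-specific tokens: CMI is defined as 0.
        return 0.0
    return 100.0 * (1 - max(lang_counts.values()) / (n - u))


def label_text(tags):
    """Hypothetical labeling rule: any mixing (CMI > 0) means code-mixed;
    otherwise the text is labeled with the single language present."""
    if code_mixing_index(tags) > 0:
        return "Hindi-English"
    langs = {t for t in tags if t != "univ"}
    if langs == {"hi"}:
        return "Hindi"
    if langs == {"en"}:
        return "English"
    return "unknown"


# Usage: 5 tokens, 1 language-independent, 3 Hindi, 1 English.
tags = ["hi", "hi", "en", "univ", "hi"]
print(code_mixing_index(tags))  # 25.0
print(label_text(tags))         # Hindi-English
```

A monolingual text always scores 0, so under this formulation a CMI threshold of zero separates monolingual from code-mixed texts; the actual threshold used for this dataset is not stated.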
Provider:
Harvard Dataverse
Created:
2023-04-17



