LingoIITGN/MMT
Source: Hugging Face. Updated 2025-08-04; indexed 2025-08-09.
Download link:
https://hf-mirror.com/datasets/LingoIITGN/MMT
Description:
MMT (A Multilingual and Multi-Topic Indian Social Media Dataset) is a large-scale language identification dataset derived from 1.7 million tweets collected from Indian Twitter/X. Each tweet is annotated with both coarse-grained and fine-grained language labels. The dataset supports research on multilingual and code-mixed text in noisy, real-world social media settings.
Provider:
LingoIITGN



