CASE 2022
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/emerging-welfare/case-2022-multilingual-event
Resource overview:
This dataset extends the test data for multilingual protest event detection, adding three new languages: Mandarin, Turkish, and Urdu. The new document-level data was randomly sampled and annotated by collaborators, following the same annotation manual as CASE 2021. The test data contains 3,870 instances in English, 267 in Hindi, 300 in Mandarin, 670 in Portuguese, 399 in Spanish, 300 in Turkish, and 299 in Urdu. The task is document classification.
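As a quick sanity check on the scale, the per-language counts above can be tallied in a few lines of Python. This is only an illustrative sketch: the names `TEST_SIZES` and `NEW_LANGUAGES` are hypothetical and not part of the dataset or its repository, and the numbers are copied from the description rather than read from the released files.

```python
# Test-set sizes per language, copied from the description above
# (not loaded from the dataset files).
TEST_SIZES = {
    "English": 3870,
    "Hindi": 267,
    "Mandarin": 300,
    "Portuguese": 670,
    "Spanish": 399,
    "Turkish": 300,
    "Urdu": 299,
}

# Languages the description lists as newly added in CASE 2022.
NEW_LANGUAGES = {"Mandarin", "Turkish", "Urdu"}

# Overall size, and the part contributed by the new languages.
total = sum(TEST_SIZES.values())
new_total = sum(n for lang, n in TEST_SIZES.items() if lang in NEW_LANGUAGES)

# Print languages from largest to smallest with their share of the total.
for lang, n in sorted(TEST_SIZES.items(), key=lambda kv: -kv[1]):
    tag = " (new)" if lang in NEW_LANGUAGES else ""
    print(f"{lang:<11}{n:>6}  {n / total:6.1%}{tag}")
print(f"{'Total':<11}{total:>6}")
```

The seven counts sum to 6,105 instances, of which 899 come from the three new languages. English alone makes up well over half of the test data, which is worth keeping in mind when comparing per-language scores with an overall average.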



