KIND-Dataset/KIND
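Dataset Overview
The KIND dataset is a new dialect dataset. It originated from a datathon competition in which participants tried to answer as many prompts as possible in their own dialect within a fixed time limit while keeping errors to a minimum.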
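Data Fields
- dialect_code: A label indicating the specific dialect the text belongs to.
- sentenceOriginID: An identifier that references either the MSA sentence that was translated (1000000-2000000) or an entry in the constructed question dataset (2000000-3000000).
- textString: The submitted sentence.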
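The ID ranges above can be used to tell which kind of prompt a submission answers. The following is a minimal sketch, not part of the dataset's tooling: the function name `classify_origin`, the label strings, and the sample records are hypothetical. The card does not say which range owns the shared boundary value 2000000, so the sketch assumes half-open ranges and assigns 2000000 to the question range.

```python
from typing import Literal

# Range boundaries as stated in the field description.
# ASSUMPTION: ranges are treated as half-open, so 2000000 counts as a
# question ID; the card itself does not specify the boundary.
MSA_START = 1_000_000
QUESTION_START = 2_000_000
QUESTION_END = 3_000_000

Origin = Literal["msa_translation", "question_prompt", "unknown"]


def classify_origin(sentence_origin_id: int) -> Origin:
    """Map a sentenceOriginID to the kind of prompt it refers to."""
    if MSA_START <= sentence_origin_id < QUESTION_START:
        return "msa_translation"
    if QUESTION_START <= sentence_origin_id < QUESTION_END:
        return "question_prompt"
    return "unknown"


# Hypothetical records for illustration only; not taken from the dataset.
records = [
    {"dialect_code": "example-dialect", "sentenceOriginID": 1000042, "textString": "..."},
    {"dialect_code": "example-dialect", "sentenceOriginID": 2500000, "textString": "..."},
]

for record in records:
    print(record["sentenceOriginID"], classify_origin(record["sentenceOriginID"]))
```

Returning "unknown" for out-of-range IDs, rather than raising an error, keeps the helper usable when filtering rows in bulk.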
Citation Information
@inproceedings{yamani-etal-2024-kind,
    title = "The {KIND} Dataset: A Social Collaboration Approach for Nuanced Dialect Data Collection",
    author = "Yamani, Asma and Alziyady, Raghad and AlYami, Reem and Albelali, Salma and Albelali, Leina and Almulhim, Jawharah and Alsulami, Amjad and Alfarraj, Motaz and Al-Zaidy, Rabeah",
    editor = "Falk, Neele and Papi, Sara and Zhang, Mike",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.eacl-srw.3",
    pages = "32--43",
}



