MuLVE
arXiv | Indexed: 2025-09-30
Download link:
https://live.european-language-grid.eu/catalogue/corpus/9487
Description:
This dataset contains vocabulary flashcards paired with real user responses, each labeled as correct or incorrect. It covers four languages: German, English, Spanish and French. The dataset is provided in several variants that differ in preprocessing (normalization) and deduplication. Its columns are card ID, question, answer, user response, label and language. The full dataset has over 12 million data points and has been downsampled to 1 million for training and validation. The intended task is vocabulary evaluation, that is, judging whether a user's response to a flashcard is correct.
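To make the record layout and the downsampling step concrete, here is a minimal Python sketch. It is not the dataset's own tooling. The field names (`card_id`, `user_answer`, and so on) are assumed spellings of the columns listed above, the rows are made-up toy data, and uniform sampling without replacement is only one plausible way to reduce the data; the description does not say how the 1 million rows were selected. The exact-match check is a naive illustrative baseline for the vocabulary evaluation task, not a method from the dataset authors.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One flashcard response; field names are assumed, not official."""
    card_id: str
    question: str
    answer: str
    user_answer: str
    label: bool      # True if the user response was annotated as correct
    language: str


def downsample(records, k, seed=0):
    """Draw k records uniformly without replacement (all of them if k is too large)."""
    if k >= len(records):
        return list(records)
    # A seeded generator makes the subset reproducible across runs.
    return random.Random(seed).sample(records, k)


def exact_match_baseline(record):
    """Naive baseline: the response is correct if it equals the reference answer,
    ignoring surrounding whitespace and letter case."""
    return record.user_answer.strip().casefold() == record.answer.strip().casefold()


# Made-up toy rows, for illustration only.
toy = [
    Record("c1", "dog", "Hund", "hund ", True, "German"),
    Record("c2", "cat", "gato", "gata", False, "Spanish"),
    Record("c3", "house", "maison", "maison", True, "French"),
]

subset = downsample(toy, 2, seed=42)
predictions = [exact_match_baseline(r) for r in toy]
```

On these toy rows the baseline happens to agree with every label. Real user responses include typos, synonyms and article variations, which is presumably why annotated labels are useful for this task.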
Provider:
European Language Grid



