
MuLVE

Source: arXiv (indexed 2025-09-30)
Download link:
https://live.european-language-grid.eu/catalogue/corpus/9487
Description:
This dataset comprises vocabulary flashcards and real user responses to them, each annotated as correct or incorrect, covering four languages: German, English, Spanish and French. It is provided in several variants that differ in preprocessing (standardization and preprocessing) and in deduplication (with duplicates and deduplicated). Its columns are card ID, question, answer, user response, label and language. The full dataset has over 12 million data points and has been downsampled to 1 million for training and validation. The task it targets is vocabulary evaluation.
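To make the column layout concrete, here is a minimal Python sketch of one record together with a seeded downsampling step and a naive exact-match check against the labels. The field names, the language codes, the toy rows and both helper functions are assumptions made for illustration; they are not taken from the dataset files or from any official loader.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One row; field names are hypothetical stand-ins for the listed columns."""
    card_id: int
    question: str
    answer: str
    user_answer: str
    label: bool      # True if the user response was annotated as correct
    language: str


def downsample(rows, n, seed=0):
    """Return a reproducible random subset of at most n rows."""
    if n >= len(rows):
        return list(rows)
    return random.Random(seed).sample(rows, n)


def exact_match_agreement(rows):
    """Share of rows where a case-insensitive exact match agrees with the label."""
    if not rows:
        return 0.0
    hits = sum(
        (r.user_answer.strip().lower() == r.answer.strip().lower()) == r.label
        for r in rows
    )
    return hits / len(rows)


# Toy rows invented for this sketch; they are not real dataset entries.
rows = [
    Response(1, "Hund", "dog", "dog", True, "de"),
    Response(1, "Hund", "dog", "Dog ", True, "de"),
    Response(2, "gato", "cat", "car", False, "es"),
    Response(3, "maison", "house", "home", True, "fr"),
]

subset = downsample(rows, 2, seed=42)
agreement = exact_match_agreement(rows)
```

In the toy data, the last row is labelled correct although the strings differ, which is the kind of case where a plain string comparison disagrees with the annotation.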
Provider:
European Language Grid