MuLVE (Multi-Language Vocabulary Evaluation)

Name: MuLVE (Multi-Language Vocabulary Evaluation)
Creator: OpenDataLab
Published: 2026-05-24 10:30:10
License: No description available

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/MuLVE

Resource overview:

The Multi-Language Vocabulary Evaluation Data Set (MuLVE) consists of vocabulary flashcards and real-life user answers, each labeled as correct or incorrect. The data comes from user learning data of the Phase6 vocabulary trainer. The dataset contains vocabulary questions in German with English, Spanish, and French as target languages, and it is available in four variants that differ in preprocessing and deduplication.

The data is split into four tab-separated files, one per variant, for each of the training and test sets. The files include the following columns:

- cardId: numeric flashcard ID
- question: vocabulary flashcard question
- answer: vocabulary flashcard answer
- userAnswer: answer entered by the user
- label: true if the user's answer is correct, otherwise false
- language: target language (English, French, or Spanish)

The processed dataset variants do not include the userAnswer column; they include the following additional columns instead:

- question_norm: normalized question
- answer_norm: normalized answer
- userAnswer_norm: normalized user answer
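As a rough illustration of the tab-separated layout described above, the following sketch reads rows with the raw-variant columns using Python's standard csv module. It rests on several assumptions that the description does not confirm: that each file starts with a header row, that labels are written as text such as "True" or "False", and that fields are not quoted. The helper names (read_mulve, parse_label) and the sample rows are hypothetical and made up for illustration; they are not taken from the dataset.

```python
import csv
import io

# Column names as listed in the dataset description (raw variant).
RAW_COLUMNS = ["cardId", "question", "answer", "userAnswer", "label", "language"]


def parse_label(value):
    """Map a textual label to bool. The accepted spellings are an assumption."""
    text = value.strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"unrecognized label: {value!r}")


def read_mulve(handle):
    """Read tab-separated rows from an open text handle.

    Assumes the first line is a header row naming the columns.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [c for c in RAW_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["label"] = parse_label(row["label"])
        rows.append(row)
    return rows


# Made-up sample rows for illustration only; not taken from the dataset.
SAMPLE = (
    "cardId\tquestion\tanswer\tuserAnswer\tlabel\tlanguage\n"
    "1\tder Hund\tthe dog\tthe dog\tTrue\tEnglish\n"
    "2\tdie Katze\tle chat\tle chien\tFalse\tFrench\n"
)

rows = read_mulve(io.StringIO(SAMPLE))
# Share of user answers marked correct in the sample.
accuracy = sum(r["label"] for r in rows) / len(rows)
print(len(rows), accuracy)
```

To read an actual file, the same function could be called on a handle opened with open(path, encoding="utf-8", newline=""), where path is whichever file was downloaded. For the processed variants, the expected column list would need to swap userAnswer for the three _norm columns.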

Provider:

OpenDataLab

Created:

2022-05-23

Collection summary

Dataset introduction