
MuLVE (Multi-Language Vocabulary Evaluation)

Source: OpenDataLab | Updated: 2026-05-24 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/MuLVE
Resource description:
Multi-Language Vocabulary Evaluation Data Set (MuLVE) is a dataset of vocabulary flashcards paired with real user answers, each labeled as correct or incorrect. The data comes from user learning data collected by the Phase6 vocabulary trainer. The dataset contains vocabulary questions in German with English, Spanish, and French as target languages, and it is available in four variants that differ in preprocessing and deduplication.

The data is distributed as tab-separated files: four files, one per variant, for each of the training and test sets. The files include the following columns:

- cardId: numeric flashcard ID
- question: vocabulary flashcard question
- answer: vocabulary flashcard answer
- userAnswer: the answer entered by the user
- label: true if the user's answer is correct, otherwise false
- language: target language (English, French, or Spanish)

The preprocessed variants do not include the userAnswer column; they include the following additional columns instead:

- question_norm: normalized question
- answer_norm: normalized answer
- userAnswer_norm: normalized user answer
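The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader. The sample rows are made up for illustration and are not taken from MuLVE, and the header spelling and the textual encoding of the label ("True"/"False") are assumptions based on the column list; the real files may differ.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; they are NOT taken from MuLVE.
# Column names follow the description; the real header and label
# encoding are assumptions and may differ in the actual files.
SAMPLE_TSV = (
    "cardId\tquestion\tanswer\tuserAnswer\tlabel\tlanguage\n"
    "1\tder Hund\tthe dog\tthe dog\tTrue\tEnglish\n"
    "2\tdie Katze\tle chat\tla chat\tFalse\tFrench\n"
    "3\tdas Haus\tla casa\tla casa\tTrue\tSpanish\n"
)


def parse_label(raw: str) -> bool:
    """Map a textual label to a bool (assumes 'true'/'false' or '1'/'0')."""
    value = raw.strip().lower()
    if value in {"true", "1"}:
        return True
    if value in {"false", "0"}:
        return False
    raise ValueError(f"unexpected label: {raw!r}")


def read_rows(handle):
    """Read tab-separated rows into dicts, converting cardId and label."""
    # QUOTE_NONE keeps quote characters inside answers as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        row["cardId"] = int(row["cardId"])
        row["label"] = parse_label(row["label"])
        rows.append(row)
    return rows


def correct_rate_by_language(rows):
    """Return {language: fraction of rows labeled correct}."""
    total = Counter(row["language"] for row in rows)
    correct = Counter(row["language"] for row in rows if row["label"])
    return {lang: correct[lang] / total[lang] for lang in total}


rows = read_rows(io.StringIO(SAMPLE_TSV))
rates = correct_rate_by_language(rows)
```

To read a downloaded file instead of the in-memory sample, pass a file handle opened with `open(path, encoding="utf-8", newline="")` to `read_rows`, where `path` is whatever name the downloaded file has; UTF-8 is an assumption here. For the preprocessed variants, the userAnswer column would be absent and the `_norm` columns present, so any code that reads `userAnswer` would need to switch to `userAnswer_norm`.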
Provider:
OpenDataLab
Created:
2022-05-23
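Dataset introduction
Background overview
MuLVE is a multi-language vocabulary evaluation dataset built from user learning data from the Phase6 vocabulary trainer. It contains vocabulary questions covering German, English, Spanish, and French, and is provided in four preprocessing variants. The data is organized as tab-separated files that include key fields such as the question, the answer, the user's answer, and a correctness label.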
The summary above was collected and generated by 遇见数据集.