E-learning Recommender System Dataset
Source: DataONE | Updated: 2022-09-23 | Indexed: 2024-06-08
Download link:
https://search.dataone.org/view/sha256:c2e25515fdf6f3fa6ce2d866a936bf5ceb02b07719397971907736e976b1b644
Description:
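The Mandarine Academy Recommender System (MARS) dataset was collected from a real-world open MOOC (Massive Open Online Course), https://mooc.office365-training.com/. It provides both explicit and implicit ratings for the French and English versions of the MOOC.
Compared with classical recommendation datasets such as MovieLens, it is relatively small because of the nature of the available content (educational material). However, it offers insight into real-world ratings and provides a testing ground beyond the commonly used datasets.
All items can be viewed online in both French and English. Every selected user has rated at least one item. No demographic information is included: each user is represented by an ID and, where available, a job.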
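The French and English versions each provide the same set of CSV files:
- users.csv: user IDs and their jobs.
- items.csv: information about the items (resources) in the selected language, with a mix of feature types.
- explicit_ratings.csv and implicit_ratings.csv: explicit ratings (watch time) and implicit ratings (page views of items).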
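## Formatting and Encoding
The dataset files are comma-separated values (CSV) files with a single header row. Fields that contain commas (,) are enclosed in double quotes ("). All files are encoded as UTF-8.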
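Descriptions and other text fields may contain commas, so splitting each line on `,` would misalign the columns. The sketch below uses only the Python standard library and lets the `csv` module handle the double-quoted fields. The helper names `parse_rows` and `read_csv` are hypothetical, and the two-line sample was invented for illustration; it is not taken from the dataset.

```python
import csv
import io

# Hypothetical sample (header + one row) invented for illustration; not dataset content.
# The description contains commas, so it is enclosed in double quotes.
SAMPLE = 'item_id,name,description\n1,"Intro","Open, edit, and share files"\n'


def parse_rows(fileobj):
    """Parse CSV with a single header row into a list of dicts keyed by column name."""
    # csv.DictReader takes the keys from the header row and keeps quoted commas
    # inside a single field.
    return list(csv.DictReader(fileobj))


def read_csv(path):
    """Read one of the dataset's UTF-8 CSV files from disk."""
    # newline="" is what the csv module docs recommend, so that quoted fields
    # containing line breaks are read correctly.
    with open(path, encoding="utf-8", newline="") as f:
        return parse_rows(f)


rows = parse_rows(io.StringIO(SAMPLE))
print(rows[0]["description"])  # Open, edit, and share files
```

For the real files, `read_csv` would be called with the local path the files were downloaded to.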
## User IDs
User IDs are consistent across explicit_ratings.csv, implicit_ratings.csv, and users.csv: the same ID refers to the same user throughout the dataset.
## Item IDs
Item IDs are consistent across explicit_ratings.csv, implicit_ratings.csv, and items.csv: the same ID refers to the same item throughout the dataset.
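Because IDs are shared across files, ratings can be linked to users.csv and items.csv with plain dictionary lookups. The minimal Python sketch below assumes each file has already been loaded as a list of dicts keyed by header name (for example with `csv.DictReader`). The helpers `dangling_ids` and `index_by` and the sample rows are hypothetical. The ID column name in users.csv is not documented, so using `"user_id"` there is an assumption.

```python
def dangling_ids(ratings, reference, key):
    """Return the IDs under `key` that appear in `ratings` but not in `reference`."""
    known = {row[key] for row in reference}
    return {row[key] for row in ratings if row[key] not in known}


def index_by(rows, key):
    """Map each ID to its row, e.g. to look up an item's metadata from a rating."""
    return {row[key]: row for row in rows}


# Hypothetical rows for illustration only. The "user_id" column name in users.csv
# is an assumption; the dataset description does not give that header.
users = [{"user_id": "1"}, {"user_id": "2"}]
items = [{"item_id": "10", "name": "Intro"}]
ratings = [
    {"user_id": "1", "item_id": "10"},
    {"user_id": "3", "item_id": "10"},  # user 3 is deliberately absent from `users`
]

print(dangling_ids(ratings, users, "user_id"))  # {'3'}
print(dangling_ids(ratings, items, "item_id"))  # set()
items_by_id = index_by(items, "item_id")
print(items_by_id[ratings[0]["item_id"]]["name"])  # Intro
```

On the actual files, both `dangling_ids` checks should return empty sets if the consistency guarantee described above holds.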
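## Ratings Data File Structure
All ratings are stored in explicit_ratings.csv and implicit_ratings.csv. Each line after the header row represents one rating of one item by one user. The two files order their ID columns differently (implicit_ratings.csv lists item_id first, explicit_ratings.csv lists user_id first):
- implicit_ratings.csv: `item_id,user_id,created_at` (item ID, user ID, creation time of the rating)
- explicit_ratings.csv: `user_id,item_id,watch_percentage,created_at,rating` (user ID, item ID, watch percentage, creation time of the rating, explicit rating value)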
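## Item Data File Structure
Item information is stored in items.csv. Each line after the header row represents one item, with the following format:
`item_id,language,name,nb_views,description,created_at,Difficulty,Job,Software,Theme,duration,type`
The fields are, in order: item ID, language version, name, number of views, description, creation time, difficulty level, target job, required software, theme, duration, and resource type.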
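Checking the header against the documented column list before use catches a file saved with a different layout early. The minimal Python sketch below loads items.csv into a dict keyed by `item_id`. The function `load_items` and the sample row (placeholder values) are hypothetical. Converting `nb_views` to an integer is an assumption, since the description does not state value formats.

```python
import csv
import io

# Column names exactly as listed in the dataset description (note the
# capitalized Difficulty, Job, Software, Theme).
ITEM_COLUMNS = [
    "item_id", "language", "name", "nb_views", "description", "created_at",
    "Difficulty", "Job", "Software", "Theme", "duration", "type",
]


def load_items(fileobj):
    """Return {item_id: row} for an items.csv file object, after checking the header."""
    reader = csv.DictReader(fileobj)
    missing = set(ITEM_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"items.csv is missing columns: {sorted(missing)}")
    items = {}
    for row in reader:
        # Assumption: nb_views is a plain integer count (blank = unknown). Other
        # fields stay as text because their formats (e.g. duration) are undocumented.
        views = row["nb_views"].strip()
        row["nb_views"] = int(views) if views else None
        items[row["item_id"]] = row
    return items


# Hypothetical one-item sample with placeholder values; not dataset content.
SAMPLE_ITEMS = (
    ",".join(ITEM_COLUMNS)
    + "\n"
    + '1,en,Intro,42,"Open, edit, and share",2021-01-01,level_a,,software_a,theme_a,120,type_a\n'
)

items = load_items(io.StringIO(SAMPLE_ITEMS))
print(items["1"]["nb_views"], items["1"]["description"])  # 42 Open, edit, and share
```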
Created: 2023-11-08
The above content was collected and summarized by 遇见数据集.