E-learning Recommender System Dataset
Source: DataCite Commons. Updated 2025-05-12. Indexed 2025-05-17.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/BMY3UD
Description:
<p>The Mandarine Academy Recommender System (MARS) dataset was collected from a real-world open MOOC (https://mooc.office365-training.com/). It offers both explicit and implicit ratings for the French and English versions of the MOOC. Compared with classical recommendation datasets such as MovieLens, it is rather small because of the educational nature of the available content. However, it offers insight into real-world ratings and provides a testing ground beyond the commonly used datasets.</p>
<p>All items are available online for viewing in both French and English versions. Every selected user has rated at least one item. No demographic information is included. Each user is represented by an id and, if available, a job.</p>
<p>The same set of files is available in .csv format for both French and English. We provide the following files:</p>
<ul>
<li><strong>Users</strong>: contains information about user ids and their jobs.</li>
<li><strong>Items</strong>: contains information about items (resources) in the selected language. The file contains a mix of feature types.</li>
<li><strong>Ratings</strong>: both <span style="text-decoration: underline;">explicit</span> (watch time) and <span style="text-decoration: underline;">implicit</span> (page views of items) ratings.</li>
</ul>
<h3><strong>Formatting and Encoding</strong></h3>
<p>The dataset files are written as&nbsp;<a href="http://en.wikipedia.org/wiki/Comma-separated_values">comma-separated values</a>&nbsp;files with a single header row. Columns that contain commas (,) are escaped using double quotes ("). These files are encoded as UTF-8.</p>
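<p>As a minimal sketch of the formatting rules above, the following reads a CSV with a single header row and double-quoted fields that contain commas, using Python's standard csv module. The sample rows and the reduced column set are made up for illustration and are not taken from the dataset.</p>

```python
import csv
import io

# Hypothetical sample in the described format (not real rows from the dataset).
SAMPLE = (
    "item_id,name,description\n"
    '1,"Intro, part 1","Covers menus, ribbons, and shortcuts"\n'
    "2,Basics,Short overview\n"
)


def read_rows(text_stream):
    """Return the data rows as dicts keyed by the single header row."""
    return list(csv.DictReader(text_stream))


rows = read_rows(io.StringIO(SAMPLE))
print(rows[0]["name"])  # the quoted comma stays inside the field

# For a file on disk, open it as UTF-8 with newline="" as the csv module recommends:
# with open("items.csv", encoding="utf-8", newline="") as f:
#     rows = read_rows(f)
```

<p>Reading by header name rather than by position keeps the code independent of column order.</p>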
<h3><strong>User Ids</strong></h3>
<p>User ids are consistent across&nbsp;explicit_ratings.csv,&nbsp;implicit_ratings.csv,&nbsp;and&nbsp;users.csv (i.e., the same id refers to the same user across the dataset).</p>
<h3><strong>Item Ids</strong></h3>
<p>Item ids are consistent between explicit_ratings.csv, implicit_ratings.csv, and items.csv (i.e., the same id refers to the same item across the dataset).</p>
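<p>Because user and item ids are shared across files, a ratings file can be checked against users.csv and items.csv before joining. The sketch below is one possible check, not part of the dataset's tooling. The helper names (ids, dangling), the users.csv header (user_id,job), the shortened items header, the date format, and all sample values are assumptions made for illustration.</p>

```python
import csv
import io

# Hypothetical miniature files; headers for users/items are assumed or shortened.
USERS = "user_id,job\n10,teacher\n11,\n"
ITEMS = "item_id,name\n100,Intro\n101,Basics\n"
EXPLICIT = (
    "user_id,item_id,watch_percentage,created_at,rating\n"
    "10,100,80,2021-01-01,8\n"
    "11,999,50,2021-01-02,5\n"  # item 999 is deliberately absent from ITEMS
)


def ids(text, column):
    """Collect the distinct values of one column from CSV text."""
    return {row[column] for row in csv.DictReader(io.StringIO(text))}


def dangling(ratings_text, column, known):
    """Ids used in the ratings column that have no match in the reference set."""
    return ids(ratings_text, column) - known


missing_users = dangling(EXPLICIT, "user_id", ids(USERS, "user_id"))
missing_items = dangling(EXPLICIT, "item_id", ids(ITEMS, "item_id"))
print(missing_users, missing_items)
```

<p>On the real files, both sets would be expected to be empty if the consistency described above holds.</p>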
<h3><strong>Ratings Data File Structure</strong></h3>
<p>All ratings are contained in the files explicit_ratings.csv and implicit_ratings.csv. Each line of these files after the header row represents one rating of one item by one user, and has one of the following formats:</p>
<ul>
<li>item_id,user_id,created_at (implicit_ratings.csv)</li>
<li>user_id,item_id,watch_percentage,created_at,rating (explicit_ratings.csv)</li>
</ul>
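<p>The two ratings files list their id columns in a different order (item_id first in implicit_ratings.csv, user_id first in explicit_ratings.csv), so looking columns up by header name is safer than relying on position. The following sketch parses both layouts into one record type. The Rating class, the parse_ratings helper, the numeric types, the date format, and the sample values are assumptions for illustration.</p>

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample rows; values, date format, and rating scale are assumptions.
IMPLICIT_CSV = (
    "item_id,user_id,created_at\n"
    "100,10,2021-01-01\n"
)
EXPLICIT_CSV = (
    "user_id,item_id,watch_percentage,created_at,rating\n"
    "10,100,80,2021-01-02,8\n"
)


@dataclass(frozen=True)
class Rating:
    user_id: str
    item_id: str
    created_at: str
    watch_percentage: Optional[float] = None  # only present in explicit ratings
    rating: Optional[float] = None            # only present in explicit ratings


def _to_float(value):
    """Convert a CSV field to float, treating a missing or empty field as None."""
    return float(value) if value not in (None, "") else None


def parse_ratings(stream):
    """Parse either ratings file; columns are looked up by header name."""
    return [
        Rating(
            user_id=row["user_id"],
            item_id=row["item_id"],
            created_at=row["created_at"],
            watch_percentage=_to_float(row.get("watch_percentage")),
            rating=_to_float(row.get("rating")),
        )
        for row in csv.DictReader(stream)
    ]


implicit = parse_ratings(io.StringIO(IMPLICIT_CSV))
explicit = parse_ratings(io.StringIO(EXPLICIT_CSV))
print(implicit[0], explicit[0])
```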
<h3><strong>Item Data File Structure</strong></h3>
<p>Item information is contained in the file items.csv. Each line of this file after the header row represents one item, and has the following format:</p>
<ul>
<li>item_id,language,name,nb_views,description,created_at,Difficulty,Job,Software,Theme,duration,type</li>
</ul>
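<p>One way to guard against a mismatched file is to compare the header row with the twelve columns listed above before indexing items by id. This is a sketch under stated assumptions: load_items is a hypothetical helper, and the sample row contains invented values, not data from the dataset.</p>

```python
import csv
import io

# Column names exactly as listed in the item file format.
EXPECTED_COLUMNS = [
    "item_id", "language", "name", "nb_views", "description", "created_at",
    "Difficulty", "Job", "Software", "Theme", "duration", "type",
]

# Hypothetical sample row (illustrative values only).
SAMPLE = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    '100,en,"Intro, part 1",42,"Menus, ribbons",2020-01-01,'
    "Beginner,Teacher,Excel,Productivity,120,video\n"
)


def load_items(stream):
    """Index item rows by item_id after checking the header row."""
    reader = csv.DictReader(stream)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return {row["item_id"]: row for row in reader}


items = load_items(io.StringIO(SAMPLE))
print(items["100"]["name"])
```

<p>All fields are kept as strings here, since the source does not state the types or units of columns such as nb_views and duration.</p>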
Provider:
Harvard Dataverse
Created:
2022-09-23



