
Anime Recommendation Database 2020 | Anime Recommendation Dataset | User Behavior Analysis Dataset

Source: www.kaggle.com | Updated 2021-07-13 | Indexed 2025-01-22
Tags: anime recommendation, user behavior analysis
Download link:
https://www.kaggle.com/hernan4444/anime-recommendation-database-2020
Description:
# MyAnimeList Database 2020

> Recommendation data from 320,000 users and 16,000 anime at myanimelist.net

This dataset contains information about 17,562 anime and the preferences of 325,772 different users. In particular, it contains:

- The anime list of each user, including anime that are dropped, completed, planned to watch, currently being watched, and on hold.
- Ratings given by users to the anime that they have watched completely.
- Information about each anime, such as genre, stats, studio, etc.
- HTML files with anime information for data scraping. These files contain information such as reviews, synopsis, information about the staff, anime statistics, genre, etc.

The code used to collect the data is available on GitHub: https://github.com/Hernan4444/MyAnimeList-Database.

### Warning: this dataset includes information about anime for adults (hentai).

## Content

**The data was scraped between February 26th and March 20th.**

* The "html" folder contains 1 zip per anime (17,562 different anime). Each zip contains different HTML pages scraped from [MyAnimeList](https://myanimelist.net/). The scraped pages are:
  1. Main page
  2. Reviews
  3. Recommendations
  4. Stats
  5. Characters & Staff

  Only 2 files were uploaded as examples, to avoid increasing the size of this dataset. All HTML files are available at this link: https://drive.google.com/drive/folders/12ghJk-sWyXXORoLBUpPirK4YdtIaZPV_?usp=sharing

* `animelist.csv` has the list of all anime registered by each user, with the respective score, watching status, and number of episodes watched. This file contains 109 million rows, 17,562 different anime, and 325,772 different users. The file has the following columns:
  1. user_id: non-identifiable, randomly generated user ID.
  2. anime_id: MyAnimeList ID of the anime. (e.g. 1)
  3. score: score from 1 to 10 given by the user, or 0 if the user didn't assign a score. (e.g. 10)
  4. watching_status: state ID of this anime in this user's anime list. (e.g. 2)
  5. watched_episodes: number of episodes watched by the user. (e.g. 24)

* `watching_status.csv` describes every possible status in the `watching_status` column of `animelist.csv`.

* `rating_complete.csv` is a subset of `animelist.csv`. It only considers anime that the user has watched completely (`watching_status==2`) and scored (`score!=0`). This file contains 57 million ratings applied to 16,872 anime by 310,059 users. The file has the following columns:
  1. user_id: non-identifiable, randomly generated user ID.
  2. anime_id: MyAnimeList ID of the anime that this user has rated.
  3. rating: rating that this user has assigned.

* `anime.csv` contains general information about every anime (17,562 different anime), such as genre, stats, studio, etc. The file has the following columns:
  1. MAL_ID: MyAnimeList ID of the anime. (e.g. 1)
  2. Name: full name of the anime. (e.g. Cowboy Bebop)
  3. Score: average score of the anime given by all users in the MyAnimeList database. (e.g. 8.78)
  4. Genres: comma-separated list of genres for this anime. (e.g. Action, Adventure, Comedy, Drama, Sci-Fi, Space)
  5. English name: full name of the anime in English. (e.g. Cowboy Bebop)
  6. Japanese name: full name of the anime in Japanese. (e.g. カウボーイビバップ)
  7. Type: TV, movie, OVA, etc. (e.g. TV)
  8. Episodes: number of episodes. (e.g. 26)
  9. Aired: broadcast date. (e.g. Apr 3, 1998 to Apr 24, 1999)
  10. Premiered: season premiere. (e.g. Spring 1998)
  11. Producers: comma-separated list of producers. (e.g. Bandai Visual)
  12. Licensors: comma-separated list of licensors. (e.g. Funimation, Bandai Entertainment)
  13. Studios: comma-separated list of studios. (e.g. Sunrise)
  14. Source: Manga, Light novel, Book, etc. (e.g. Original)
  15. Duration: duration of the anime per episode. (e.g. 24 min. per ep.)
  16. Rating: age rating. (e.g. R - 17+ (violence & profanity))
  17. Ranked: position based on the score. (e.g. 28)
  18. Popularity: position based on the number of users who have added the anime to their list. (e.g. 39)
  19. Members: number of community members that are in this anime's "group". (e.g. 1251960)
  20. Favorites: number of users who have the anime as "favorites". (e.g. 61,971)
  21. Watching: number of users who are watching the anime. (e.g. 105808)
  22. Completed: number of users who have completed the anime. (e.g. 718161)
  23. On-Hold: number of users who have the anime on hold. (e.g. 71513)
  24. Dropped: number of users who have dropped the anime. (e.g. 26678)
  25. Plan to Watch: number of users who plan to watch the anime. (e.g. 329800)
  26. Score-10: number of users who scored 10. (e.g. 229170)
  27. Score-9: number of users who scored 9. (e.g. 182126)
  28. Score-8: number of users who scored 8. (e.g. 131625)
  29. Score-7: number of users who scored 7. (e.g. 62330)
  30. Score-6: number of users who scored 6. (e.g. 20688)
  31. Score-5: number of users who scored 5. (e.g. 8904)
  32. Score-4: number of users who scored 4. (e.g. 3184)
  33. Score-3: number of users who scored 3. (e.g. 1357)
  34. Score-2: number of users who scored 2. (e.g. 741)
  35. Score-1: number of users who scored 1. (e.g. 1580)

## Acknowledgements

Thanks to:

1. [MyAnimeList](https://myanimelist.net/) for providing anime data.
2. [Jikan API](https://jikan.docs.apiary.io/) for providing user preferences.
3. Pontificia Universidad Católica de Chile for providing servers to run the code.

## Inspiration

1. Provide HTML files for practicing scraping without the delay of making each request.
2. Experiment with different types of recommenders, for instance collaborative filtering, or approaches based on context such as stats, genre, seiyuu, reviews, synopsis, etc.
3. Use this information to build a better anime recommender system.
4. Identify which features allow us to build the best anime recommender system.

## Ideas for the future

1. Build the same dataset for manga and novels.
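As an illustration of the file descriptions above, here is a minimal sketch of how `rating_complete.csv` relates to `animelist.csv`, using the stated rule (`watching_status==2` and `score!=0`), and of how the comma-separated `Genres` column of `anime.csv` can be split. It is not the author's collection code. The function names `rating_complete_rows` and `split_genres` and the sample rows are made up for the example, and the CSV header names are assumed to match the column names listed in the description.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# Assumption taken from the description: status ID 2 means "completed".
COMPLETED_STATUS = 2


def rating_complete_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, int]]:
    """Yield (user_id, anime_id, rating) for completed, scored entries.

    Rows are streamed one at a time, so a file with ~109 million rows
    does not have to fit in memory.
    """
    for row in rows:
        score = int(row["score"])
        status = int(row["watching_status"])
        if status == COMPLETED_STATUS and score != 0:
            yield {
                "user_id": int(row["user_id"]),
                "anime_id": int(row["anime_id"]),
                "rating": score,
            }


def split_genres(genres: str) -> List[str]:
    """Split a comma-separated genre string into a clean list."""
    return [g.strip() for g in genres.split(",") if g.strip()]


# Hypothetical sample in the shape described for animelist.csv.
SAMPLE_CSV = """user_id,anime_id,score,watching_status,watched_episodes
0,1,10,2,26
0,5,0,2,1
1,1,8,2,26
1,6,7,1,3
"""

ratings = list(rating_complete_rows(csv.DictReader(io.StringIO(SAMPLE_CSV))))
genres = split_genres("Action, Adventure, Comedy, Drama, Sci-Fi, Space")

for r in ratings:
    print(r)
print(genres)
```

To try this on the real file, the same generator could be fed with `csv.DictReader(open("animelist.csv", newline="", encoding="utf-8"))`, assuming the file sits in the working directory and uses that encoding.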

Provider:
Kaggle