
TheGreatRambler/mm2_user_first_cleared

Source: Hugging Face | Updated: 2022-11-11 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/TheGreatRambler/mm2_user_first_cleared
Resource overview:
---
language:
- multilingual
license:
- cc-by-nc-sa-4.0
multilinguality:
- multilingual
size_categories:
- 10M<n<100M
source_datasets:
- original
task_categories:
- other
- object-detection
- text-retrieval
- token-classification
- text-generation
task_ids: []
pretty_name: Mario Maker 2 user first clears
tags:
- text-mining
---

# Mario Maker 2 user first clears

Part of the [Mario Maker 2 Dataset Collection](https://tgrcode.com/posts/mario_maker_2_datasets)

## Dataset Description

The Mario Maker 2 user first clears dataset consists of 17.8 million first clears from Nintendo's online service, totaling around 157MB of data. The dataset was created using the self-hosted [Mario Maker 2 api](https://tgrcode.com/posts/mario_maker_2_api) over the course of 1 month in February 2022.

### How to use it

The Mario Maker 2 user first clears dataset is very large, so for most use cases it is recommended to use the streaming API of `datasets`. You can load and iterate through the dataset with the following code:

```python
from datasets import load_dataset

ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", streaming=True, split="train")
print(next(iter(ds)))

#OUTPUT:
{
 'pid': '14510618610706594411',
 'data_id': 25199891
}
```

Each row is a unique first clear of the level denoted by `data_id`, made by the player denoted by `pid`.

You can also download the full dataset. Note that this will download ~157MB:

```python
from datasets import load_dataset

ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", split="train")
```

## Data Structure

### Data Instances

```python
{
 'pid': '14510618610706594411',
 'data_id': 25199891
}
```

### Data Fields

|Field|Type|Description|
|---|---|---|
|pid|string|The player ID of this user, an unsigned 64-bit integer stored as a string|
|data_id|int|The data ID of the level this user first cleared|

### Data Splits

The dataset only contains a train split.
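The Data Fields table stores `pid` as a string holding an unsigned 64-bit integer, so code that needs a numeric player ID has to convert it. A minimal sketch of that conversion with a range check follows. The helper name `parse_pid` and the constant `UINT64_MAX` are hypothetical and not part of the dataset or the `datasets` library.

```python
UINT64_MAX = 2**64 - 1  # largest value an unsigned 64-bit integer can hold


def parse_pid(pid: str) -> int:
    """Convert a `pid` string to an int and check it fits in the unsigned 64-bit range.

    Hypothetical helper for illustration; not part of the dataset tooling.
    """
    value = int(pid)  # raises ValueError for non-numeric strings
    if not 0 <= value <= UINT64_MAX:
        raise ValueError(f"pid out of unsigned 64-bit range: {pid!r}")
    return value


# Example row taken from the Data Instances section above
row = {'pid': '14510618610706594411', 'data_id': 25199891}
print(parse_pid(row['pid']))
```

The example ID, 14510618610706594411, is larger than 2**63 - 1. It would therefore not fit in a signed 64-bit type such as NumPy's default int64, which may be why the field is kept as a string.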
## Dataset Creation

The dataset was created over a little more than a month in February 2022 using the self-hosted [Mario Maker 2 api](https://tgrcode.com/posts/mario_maker_2_api). Requests made to Nintendo's servers require authentication, so the process had to be done with utmost care, limiting download speed so as not to overload the API and risk a ban. There are no intentions to create an updated release of this dataset.

## Considerations for Using the Data

The dataset contains no harmful language or depictions.
Provided by:
TheGreatRambler
Summary of original information

Mario Maker 2 user first clears dataset overview

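Basic information

  • Language: multilingual
  • License: cc-by-nc-sa-4.0
  • Multilinguality: multilingual
  • Dataset size: 10M<n<100M
  • Source datasets: original
  • Task categories: other, object detection, text retrieval, token classification, text generation
  • Task IDs: none
  • Pretty name: Mario Maker 2 user first clears
  • Tags: text mining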

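Dataset description

  • Contents: 17.8 million first-clear records, about 157MB of data in total.
  • Collection period: February 2022, lasting about one month.
  • Collection method: the self-hosted Mario Maker 2 API.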

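Data structure

  • Data instance:

      { 'pid': '14510618610706594411', 'data_id': 25199891 }

  • Data fields:

    Field     Type    Description
    pid       string  Player ID, an unsigned 64-bit integer stored as a string
    data_id   int     Data ID of the level the player first cleared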
  • Data splits: train split only.
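Each row records one first clear by one player, so tallying rows per `pid` gives each player's number of first clears in this snapshot. A minimal sketch in plain Python follows. It accepts any iterable of row dicts, such as the streaming dataset from `load_dataset(..., streaming=True)`. The helper `count_first_clears` and its `limit` parameter are hypothetical and not part of `datasets`. The rows after the first one in the sample are made up for illustration.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Mapping, Optional


def count_first_clears(rows: Iterable[Mapping], limit: Optional[int] = None) -> Counter:
    """Count first-clear rows per player ID (hypothetical helper).

    `limit` caps how many rows are read, which is useful for sampling
    the ~17.8 million-row stream; None reads every row.
    """
    counts: Counter = Counter()
    for row in islice(rows, limit):  # islice(rows, None) does not stop early
        counts[row["pid"]] += 1
    return counts


# First row matches the documented data instance; the others are made up
sample = [
    {"pid": "14510618610706594411", "data_id": 25199891},
    {"pid": "14510618610706594411", "data_id": 25199892},
    {"pid": "1", "data_id": 25199893},
]
print(count_first_clears(sample).most_common(1))
```

Counting by the string `pid` avoids converting the IDs at all, and the counts stay correct for values above the signed 64-bit range.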

Using the data

  • Loading: the streaming API of the datasets library is recommended.
  • Example code:

      from datasets import load_dataset

      ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", streaming=True, split="train")
      print(next(iter(ds)))
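Streamed rows arrive one dict at a time, so a plain generator can pick out one player's first clears without holding the whole dataset in memory. A minimal sketch follows. The function name `levels_first_cleared_by` is hypothetical, and the second sample row is made up for illustration. The `pid` argument must be a string, because the dataset stores `pid` as a string and an integer would never compare equal.

```python
from typing import Iterable, Iterator, Mapping


def levels_first_cleared_by(rows: Iterable[Mapping], pid: str) -> Iterator[int]:
    """Lazily yield the data_id of every row whose pid equals `pid` (hypothetical helper).

    `pid` must be a string to match the dataset's pid field type.
    """
    for row in rows:
        if row["pid"] == pid:
            yield row["data_id"]


rows = [
    {"pid": "14510618610706594411", "data_id": 25199891},
    {"pid": "1", "data_id": 25199893},  # made-up row for illustration
]
print(list(levels_first_cleared_by(rows, "14510618610706594411")))
```

With the streaming `ds` from the example code, `next(levels_first_cleared_by(ds, some_pid))` would stop at the first match (`some_pid` is a placeholder). Finding a player with few first clears may still require scanning much of the stream.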

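Dataset creation

  • Creation period: February 2022, lasting a little more than one month.
  • Process: data was collected through the self-hosted Mario Maker 2 API, taking care not to overload Nintendo's servers.
  • Update plans: no updated release is planned.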

Considerations for use

  • Content: the dataset contains no harmful language or depictions.