TheGreatRambler/mm2_user_first_cleared

Name: TheGreatRambler/mm2_user_first_cleared
Creator: TheGreatRambler
Published: 2022-11-11 08:04:34
License: No description provided (the dataset card below specifies cc-by-nc-sa-4.0)

Source: Hugging Face; updated 2022-11-11; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/TheGreatRambler/mm2_user_first_cleared

Resource overview:

---
language:
- multilingual
license:
- cc-by-nc-sa-4.0
multilinguality:
- multilingual
size_categories:
- 10M<n<100M
source_datasets:
- original
task_categories:
- other
- object-detection
- text-retrieval
- token-classification
- text-generation
task_ids: []
pretty_name: Mario Maker 2 user first clears
tags:
- text-mining
---

# Mario Maker 2 user first clears

Part of the [Mario Maker 2 Dataset Collection](https://tgrcode.com/posts/mario_maker_2_datasets)

## Dataset Description

The Mario Maker 2 user first clears dataset consists of 17.8 million first clears from Nintendo's online service, totaling around 157MB of data. The dataset was created using the self-hosted [Mario Maker 2 API](https://tgrcode.com/posts/mario_maker_2_api) over the course of 1 month in February 2022.

### How to use it

The Mario Maker 2 user first clears dataset is very large, so for most use cases it is recommended to use the streaming API of `datasets`. You can load and iterate through the dataset with the following code:

```python
from datasets import load_dataset

ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", streaming=True, split="train")
print(next(iter(ds)))

# OUTPUT:
# {
#   'pid': '14510618610706594411',
#   'data_id': 25199891
# }
```

Each row records a unique first clear: the level identified by `data_id` was first cleared by the player identified by `pid`.

You can also download the full dataset. Note that this will download ~157MB:

```python
from datasets import load_dataset

ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", split="train")
```

## Data Structure

### Data Instances

```python
{
  'pid': '14510618610706594411',
  'data_id': 25199891
}
```

### Data Fields

|Field|Type|Description|
|---|---|---|
|pid|string|The player ID of this user, an unsigned 64-bit integer stored as a string|
|data_id|int|The data ID of the level this user first cleared|

### Data Splits

The dataset only contains a train split.
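Because `pid` is documented as an unsigned 64-bit integer stored as a string, downstream code may want to convert it to a number before sorting or joining. The following is a minimal illustrative sketch, not part of the `datasets` library or the original card. The names `FirstClear` and `parse_first_clear` are hypothetical. The helper checks that `pid` is a plain decimal string within the unsigned 64-bit range described in the Data Fields table.

```python
from dataclasses import dataclass

# Largest value of an unsigned 64-bit integer, per the `pid` field description.
UINT64_MAX = 2**64 - 1


@dataclass(frozen=True)
class FirstClear:
    """Hypothetical typed view of one row: player `pid` first cleared level `data_id`."""
    pid: int
    data_id: int


def parse_first_clear(row: dict) -> FirstClear:
    """Convert a raw row (pid as a decimal string) into a FirstClear.

    Raises ValueError if pid is not an ASCII decimal string or does not
    fit in the unsigned 64-bit range.
    """
    pid_text = row["pid"]
    if not (pid_text.isascii() and pid_text.isdigit()):
        raise ValueError(f"pid is not an unsigned decimal string: {pid_text!r}")
    pid = int(pid_text)
    if pid > UINT64_MAX:
        raise ValueError(f"pid exceeds the unsigned 64-bit range: {pid}")
    return FirstClear(pid=pid, data_id=int(row["data_id"]))


if __name__ == "__main__":
    # The example row from the Data Instances section.
    sample = {"pid": "14510618610706594411", "data_id": 25199891}
    print(parse_first_clear(sample))
```

The example `pid` is larger than `2**63 - 1`, so it cannot be stored in a signed 64-bit integer type. Keeping the string form, or using an unsigned or arbitrary-precision type, avoids overflow in tools that default to signed 64-bit integers.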
## Dataset Creation

The dataset was created over a little more than a month in February 2022 using the self-hosted [Mario Maker 2 API](https://tgrcode.com/posts/mario_maker_2_api). Requests to Nintendo's servers require authentication, so the process had to be done with utmost care. Download speed was limited so as not to overload the API and risk a ban. There are no intentions to create an updated release of this dataset.

## Considerations for Using the Data

The dataset contains no harmful language or depictions.

Provided by:

TheGreatRambler

Summary of original information

Mario Maker 2 user first clears dataset overview

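Basic information

Language: multilingual
License: cc-by-nc-sa-4.0
Multilinguality: multilingual
Size category: 10M<n<100M
Source datasets: original
Task categories: other, object detection, text retrieval, token classification, text generation
Task IDs: none
Pretty name: Mario Maker 2 user first clears
Tags: text mining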

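Dataset description

Contents: 17.8 million first-clear records, about 157MB of data in total.
Collection period: February 2022, over about one month.
Collection method: the self-hosted Mario Maker 2 API.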

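Data structure

Data instance: { 'pid': '14510618610706594411', 'data_id': 25199891 }
Data fields:

|Field|Type|Description|
|---|---|---|
|pid|string|Player ID, an unsigned 64-bit integer stored as a string|
|data_id|int|Data ID of the level the player first cleared|

Data splits: train split only.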
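Each row pairs one player (`pid`) with one level they first cleared (`data_id`). A natural first analysis is therefore counting first clears per player. The following is a minimal standard-library sketch. `count_first_clears` is a hypothetical helper, and it assumes rows arrive as dicts with the two fields above, as in the loading example in the card. Because it consumes any iterable, it can read a streaming dataset lazily. Its memory grows with the number of distinct players, not with the number of rows.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_first_clears(rows: Iterable[Mapping]) -> Counter:
    """Count how many levels each player (keyed by the pid string) first cleared."""
    # pid is kept as its original string; parse it only if numeric operations are needed.
    return Counter(row["pid"] for row in rows)


if __name__ == "__main__":
    # Small in-memory stand-in for real rows (values are made up for illustration).
    rows = [
        {"pid": "1", "data_id": 10},
        {"pid": "2", "data_id": 11},
        {"pid": "1", "data_id": 12},
    ]
    counts = count_first_clears(rows)
    print(counts.most_common(1))  # [('1', 2)]
```

On real data, `count_first_clears(ds).most_common(10)` would list the ten players with the most first clears. Over the full dataset, this iterates through all 17.8 million rows.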

Data usage

Loading: using the streaming API of the `datasets` library is recommended.
Example code:

```python
from datasets import load_dataset

ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", streaming=True, split="train")
print(next(iter(ds)))
```
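With `streaming=True`, the dataset is read lazily, so inspecting a handful of rows does not require the full ~157MB download. The following is a minimal sketch that uses `itertools.islice`, which works on any iterable, including a streaming dataset. The helper name `head` is hypothetical. Recent versions of `datasets` also offer `IterableDataset.take(n)` for the same purpose.

```python
from itertools import islice
from typing import Iterable, List, Mapping


def head(rows: Iterable[Mapping], n: int) -> List[Mapping]:
    """Return the first n rows of any iterable (e.g. a streaming dataset) as a list.

    Only n rows are pulled from the iterator; nothing after them is consumed.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return list(islice(rows, n))


if __name__ == "__main__":
    # Offline stand-in for the streaming dataset, so this runs without network access.
    fake_stream = ({"pid": str(i), "data_id": 1000 + i} for i in range(10))
    print(head(fake_stream, 3))
```

With the `ds` from the example above, `head(ds, 5)` would return the first five first-clear records as a list of dicts.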

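Dataset creation

Creation period: February 2022, lasting a little over a month.
Creation process: data were collected through the self-hosted Mario Maker 2 API, taking care not to overload Nintendo's servers.
Update plans: none.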

Considerations for use

Content review: the dataset contains no harmful language or depictions.
