TheGreatRambler/mm2_user_first_cleared
Source: Hugging Face. Updated 2022-11-11. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/TheGreatRambler/mm2_user_first_cleared
Resource description:
---
language:
- multilingual
license:
- cc-by-nc-sa-4.0
multilinguality:
- multilingual
size_categories:
- 10M<n<100M
source_datasets:
- original
task_categories:
- other
- object-detection
- text-retrieval
- token-classification
- text-generation
task_ids: []
pretty_name: Mario Maker 2 user first clears
tags:
- text-mining
---
# Mario Maker 2 user first clears
Part of the [Mario Maker 2 Dataset Collection](https://tgrcode.com/posts/mario_maker_2_datasets)
## Dataset Description
The Mario Maker 2 user first clears dataset consists of 17.8 million first clears from Nintendo's online service, totaling around 157MB of data. The dataset was created using the self-hosted [Mario Maker 2 API](https://tgrcode.com/posts/mario_maker_2_api) over the course of one month in February 2022.
### How to use it
The Mario Maker 2 user first clears dataset is large, so for most use cases it is recommended to use the streaming API of `datasets`. You can load and iterate through the dataset with the following code:
```python
from datasets import load_dataset

# Stream rows on demand instead of downloading the whole dataset first
ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", streaming=True, split="train")
print(next(iter(ds)))

# OUTPUT:
# {
#     'pid': '14510618610706594411',
#     'data_id': 25199891
# }
```
Each row is a unique first clear: the player identified by `pid` was the first to clear the level identified by `data_id`.
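As an illustration of working with rows of this shape, the sketch below tallies first clears per player. The helper name `count_first_clears` and the sample rows are hypothetical; the function only assumes an iterable of dictionaries with a `pid` key, so the streaming dataset loaded above could be passed in place of the sample list.

```python
from collections import Counter
from itertools import islice


def count_first_clears(rows, limit=None):
    """Count first clears per player ID over an iterable of rows.

    `limit` caps how many rows are read, which is useful when streaming.
    """
    counts = Counter()
    for row in islice(rows, limit):
        counts[row["pid"]] += 1
    return counts


# Hypothetical sample rows with the same shape as the dataset's rows.
sample_rows = [
    {"pid": "14510618610706594411", "data_id": 25199891},
    {"pid": "14510618610706594411", "data_id": 25199892},
    {"pid": "1", "data_id": 3},
]

counts = count_first_clears(sample_rows)
print(counts.most_common(1))  # [('14510618610706594411', 2)]
```

With the streaming dataset, a call such as `count_first_clears(ds, limit=100_000)` would read only the first 100,000 rows instead of the whole split.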
You can also download the full dataset. Note that this will download ~157MB:
```python
from datasets import load_dataset
ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", split="train")
```
## Data Structure
### Data Instances
```python
{
'pid': '14510618610706594411',
'data_id': 25199891
}
```
### Data Fields
|Field|Type|Description|
|---|---|---|
|pid|string|The player ID of this user, an unsigned 64-bit integer stored as a string|
|data_id|int|The data ID of the level this user first cleared|
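Because `pid` is an unsigned 64-bit integer stored as a string, it can be converted to a number when needed. The sketch below is one way to do that in plain Python; `parse_pid` is a hypothetical helper, not part of the dataset or of `datasets`. The example value from this card is larger than the signed 64-bit maximum, which is presumably why the field is a string.

```python
UINT64_MAX = 2**64 - 1
INT64_MAX = 2**63 - 1


def parse_pid(pid: str) -> int:
    """Convert a player ID string to an int, checking the unsigned 64-bit range."""
    value = int(pid)
    if not 0 <= value <= UINT64_MAX:
        raise ValueError(f"pid out of unsigned 64-bit range: {pid!r}")
    return value


pid = parse_pid("14510618610706594411")
# The value does not fit in a signed 64-bit integer.
print(pid > INT64_MAX)  # True
```

Python integers have arbitrary precision, so no special type is needed here; libraries with fixed-width integers would need an unsigned 64-bit type to hold such values.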
### Data Splits
The dataset only contains a train split.
<!-- TODO create detailed statistics -->
## Dataset Creation
The dataset was created over a little more than a month in February 2022 using the self-hosted [Mario Maker 2 API](https://tgrcode.com/posts/mario_maker_2_api). As requests made to Nintendo's servers require authentication, the process had to be done with the utmost care, limiting download speed so as not to overload the API and risk a ban. There are no intentions to create an updated release of this dataset.
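The idea of limiting request speed can be illustrated with a generic throttle. This is a minimal sketch and not the method used to build this dataset: the `Throttle` class, the interval value, and the loop are all hypothetical, and no real request is made.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the previous call: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical usage: at most one iteration every 0.05 seconds.
throttle = Throttle(min_interval=0.05)
for level_id in (1, 2, 3):
    throttle.wait()
    # A real, authenticated request would go here; it is omitted on purpose.
```

The clock and sleep functions are injectable so the behavior can be checked without real delays.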
## Considerations for Using the Data
The dataset contains no harmful language or depictions.
Provided by:
TheGreatRambler
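Summary of original information
Mario Maker 2 user first clears dataset overview
Basic information
- Language: multilingual
- License: cc-by-nc-sa-4.0
- Multilinguality: multilingual
- Dataset size: 10M<n<100M
- Source data: original
- Task categories: other, object detection, text retrieval, token classification, text generation
- Task IDs: none
- Pretty name: Mario Maker 2 user first clears
- Tags: text mining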
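Dataset description
- Content: 17.8 million first-clear records, about 157MB of data in total.
- Collection period: February 2022, over about one month.
- Collection method: the self-hosted Mario Maker 2 API.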
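Data structure
- Data instance: `{'pid': '14510618610706594411', 'data_id': 25199891}`
- Data fields:

|Field|Type|Description|
|---|---|---|
|pid|string|Player ID, an unsigned 64-bit integer stored as a string|
|data_id|int|Data ID of the level the player first cleared|

- Data splits: train split only.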
Data usage
- Loading: the streaming API of the `datasets` library is recommended.
- Example code:

```python
from datasets import load_dataset
ds = load_dataset("TheGreatRambler/mm2_user_first_cleared", streaming=True, split="train")
print(next(iter(ds)))
```
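Dataset creation
- Creation period: February 2022, lasting a little more than a month.
- Creation process: data was collected through the self-hosted Mario Maker 2 API, taking care not to overload Nintendo's servers.
- Update plans: none.
Usage considerations
- Content review: the dataset contains no harmful language or depictions.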



