TheGreatRambler/mm2_ninji

Name: TheGreatRambler/mm2_ninji
Creator: TheGreatRambler
Published: 2022-11-11 08:05:22
License: No description available

Hugging Face · Updated 2022-11-11 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/TheGreatRambler/mm2_ninji


Resource description:

---
language:
- multilingual
license:
- cc-by-nc-sa-4.0
multilinguality:
- multilingual
size_categories:
- 1M<n<10M
source_datasets:
- original
task_categories:
- other
- object-detection
- text-retrieval
- token-classification
- text-generation
task_ids: []
pretty_name: Mario Maker 2 ninjis
tags:
- text-mining
---

# Mario Maker 2 ninjis

Part of the [Mario Maker 2 Dataset Collection](https://tgrcode.com/posts/mario_maker_2_datasets)

## Dataset Description

The Mario Maker 2 ninjis dataset consists of 3 million ninji replays from Nintendo's online service, totaling around 12.5GB of data. The dataset was created using the self-hosted [Mario Maker 2 api](https://tgrcode.com/posts/mario_maker_2_api) over the course of 1 month in February 2022.

### How to use it

The Mario Maker 2 ninjis dataset is very large, so for most use cases it is recommended to use the streaming API of `datasets`. You can load and iterate through the dataset with the following code:

```python
from datasets import load_dataset

ds = load_dataset("TheGreatRambler/mm2_ninji", streaming=True, split="train")
print(next(iter(ds)))

# OUTPUT:
# {
#  'data_id': 12171034,
#  'pid': '4748613890518923485',
#  'time': 83388,
#  'replay': [some binary data]
# }
```

Each row is a ninji run in the level denoted by `data_id`, done by the player denoted by `pid`. The length of the run is `time`, in milliseconds.

`replay` is a compressed binary file describing the animation frames and coordinates of the player throughout the run; the example below decompresses it with `zlib.decompress`. Parsing the replay is as follows:

```python
from datasets import load_dataset
import zlib
import struct

ds = load_dataset("TheGreatRambler/mm2_ninji", streaming=True, split="train")
row = next(iter(ds))
replay = zlib.decompress(row["replay"])

# Header: frame count (big-endian uint32) and character byte
frames = struct.unpack(">I", replay[0x10:0x14])[0]
character = replay[0x14]

character_mapping = {
    0: "Mario",
    1: "Luigi",
    2: "Toad",
    3: "Toadette"
}

# player_state is between 0 and 14 and varies between gamestyles
# as outlined below. Determining the gamestyle of a particular run
# and rendering the level being played requires TheGreatRambler/mm2_ninji_level
player_state_base = {
    0: "Run/Walk",
    1: "Jump",
    2: "Swim",
    3: "Climbing",
    5: "Sliding",
    7: "Dry bones shell",
    8: "Clown car",
    9: "Cloud",
    10: "Boot",
    11: "Walking cat"
}

player_state_nsmbu = {
    4: "Sliding",
    6: "Turnaround",
    10: "Yoshi",
    12: "Acorn suit",
    13: "Propeller active",
    14: "Propeller neutral"
}

player_state_sm3dw = {
    4: "Sliding",
    6: "Turnaround",
    7: "Clear pipe",
    8: "Cat down attack",
    13: "Propeller active",
    14: "Propeller neutral"
}

player_state_smb1 = {
    4: "Link down slash",
    5: "Crouching"
}

player_state_smw = {
    10: "Yoshi",
    12: "Cape"
}

print("Frames: %d\nCharacter: %s" % (frames, character_mapping[character]))

current_offset = 0x3C
# Ninji updates are reported every 4 frames
for i in range((frames + 2) // 4):
    # High nibble holds the flags, low nibble the animation state
    flags = replay[current_offset] >> 4
    player_state = replay[current_offset] & 0x0F
    current_offset += 1
    x = struct.unpack("<H", replay[current_offset:current_offset + 2])[0]
    current_offset += 2
    y = struct.unpack("<H", replay[current_offset:current_offset + 2])[0]
    current_offset += 2
    if flags & 0b00000110:
        # An extra byte of unknown meaning follows when either flag is set
        unk1 = replay[current_offset]
        current_offset += 1
    in_subworld = flags & 0b00001000
    print("Frame %d:\n Flags: %s,\n Animation state: %d,\n X: %d,\n Y: %d,\n In subworld: %s" % (i, bin(flags), player_state, x, y, in_subworld))

# OUTPUT:
# Frames: 5006
# Character: Mario
# Frame 0:
#  Flags: 0b0,
#  Animation state: 0,
#  X: 2672,
#  Y: 2288,
#  In subworld: 0
# Frame 1:
#  Flags: 0b0,
#  Animation state: 0,
#  X: 2682,
#  Y: 2288,
#  In subworld: 0
# Frame 2:
#  Flags: 0b0,
#  Animation state: 0,
#  X: 2716,
#  Y: 2288,
#  In subworld: 0
# ...
# Frame 1249:
#  Flags: 0b0,
#  Animation state: 1,
#  X: 59095,
#  Y: 3749,
#  In subworld: 0
# Frame 1250:
#  Flags: 0b0,
#  Animation state: 1,
#  X: 59246,
#  Y: 3797,
#  In subworld: 0
# Frame 1251:
#  Flags: 0b0,
#  Animation state: 1,
#  X: 59402,
#  Y: 3769,
#  In subworld: 0
```

You can also download the full dataset. Note that this will download ~12.5GB:

```python
ds = load_dataset("TheGreatRambler/mm2_ninji", split="train")
```

## Data Structure

### Data Instances

```python
{
 'data_id': 12171034,
 'pid': '4748613890518923485',
 'time': 83388,
 'replay': [some binary data]
}
```

### Data Fields

|Field|Type|Description|
|---|---|---|
|data_id|int|The data ID of the level this run occurred in|
|pid|string|Player ID of the player|
|time|int|Length of the run in milliseconds|
|replay|bytes|Replay file of this run|

### Data Splits

The dataset only contains a train split.

## Dataset Creation

The dataset was created over a little more than a month in February 2022 using the self-hosted [Mario Maker 2 api](https://tgrcode.com/posts/mario_maker_2_api). Because requests made to Nintendo's servers require authentication, the process had to be done with the utmost care, limiting download speed so as not to overload the API and risk a ban. There are no intentions to create an updated release of this dataset.

## Considerations for Using the Data

The dataset contains no harmful language or depictions.
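The parsing steps shown in the card can be collected into a reusable function. The following is a minimal sketch, assuming the layout described above (frame count as a big-endian `uint32` at `0x10`, character byte at `0x14`, per-update records starting at `0x3C`). The names `NinjiFrame`, `parse_replay`, and `build_fake_replay` are hypothetical, and the synthetic replay exists only so the sketch can be exercised without downloading the dataset.

```python
import struct
import zlib
from dataclasses import dataclass

HEADER_SIZE = 0x3C  # per the card, update records start at this offset


@dataclass
class NinjiFrame:
    flags: int          # high nibble of the first record byte
    player_state: int   # low nibble of the first record byte (0-14)
    x: int
    y: int
    in_subworld: bool


def parse_replay(compressed: bytes):
    """Return (frame_count, character_id, frames) for one `replay` value."""
    replay = zlib.decompress(compressed)
    frame_count = struct.unpack(">I", replay[0x10:0x14])[0]
    character = replay[0x14]
    frames = []
    offset = HEADER_SIZE
    # Updates are stored once every 4 game frames.
    for _ in range((frame_count + 2) // 4):
        flags = replay[offset] >> 4
        state = replay[offset] & 0x0F
        x, y = struct.unpack_from("<HH", replay, offset + 1)
        offset += 5
        if flags & 0b0110:
            offset += 1  # skip the extra byte of unknown meaning
        frames.append(NinjiFrame(flags, state, x, y, bool(flags & 0b1000)))
    return frame_count, character, frames


def build_fake_replay(frame_count, character, records):
    """Build a synthetic replay (hypothetical helper, for testing only)."""
    header = bytearray(HEADER_SIZE)
    header[0x10:0x14] = struct.pack(">I", frame_count)
    header[0x14] = character
    body = bytearray()
    for flags, state, x, y in records:
        body += bytes([(flags << 4) | state]) + struct.pack("<HH", x, y)
        if flags & 0b0110:
            body.append(0)  # placeholder for the unknown byte
    return zlib.compress(bytes(header + body))


# Two synthetic updates; the second sets the subworld flag and the extra byte.
fake = build_fake_replay(8, 1, [(0, 0, 2672, 2288), (0b1010, 1, 2682, 2300)])
count, character, frames = parse_replay(fake)
print(count, character, frames[1])
```

With a real row, `parse_replay(row["replay"])` would take the place of the synthetic input. Returning dataclass instances rather than printing keeps the frames available for plotting or further analysis.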

Provided by:

TheGreatRambler

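Summary of original information

Dataset overview

Basic information

Name: Mario Maker 2 ninjis
Language: multilingual
License: CC-BY-NC-SA-4.0
Multilinguality: multilingual
Size: 1M<n<10M
Source datasets: original
Task categories: other, object detection, text retrieval, token classification, text generation
Tags: text mining

Dataset description

Content: 3 million ninji replays, totaling about 12.5GB.
Collection period: February 2022, over about 1 month.
Data source: collected with the self-hosted Mario Maker 2 API.

Data structure

Data instance

{'data_id': 12171034, 'pid': '4748613890518923485', 'time': 83388, 'replay': [some binary data]}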

Data fields

Field	Type	Description
data_id	int	Data ID of the level
pid	string	Player ID
time	int	Run length in milliseconds
replay	bytes	Replay file of the run, as binary data
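Since `time` is stored in milliseconds, a small helper can make it easier to read. This is a minimal sketch; `format_run_time` is a hypothetical name and not part of the dataset or the `datasets` library.

```python
def format_run_time(time_ms: int) -> str:
    """Convert a run length in milliseconds to M:SS.mmm."""
    minutes, remainder = divmod(time_ms, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


# The sample row shown earlier has time == 83388.
print(format_run_time(83388))  # 1:23.388
```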

Data splits

Splits: train split only.

Dataset usage

Loading: the streaming API of the datasets library is recommended.
Example code:

    from datasets import load_dataset
    ds = load_dataset("TheGreatRambler/mm2_ninji", streaming=True, split="train")
    print(next(iter(ds)))
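Because the dataset is streamed row by row, aggregates can be computed without holding every replay in memory. The sketch below keeps only the fastest run per level. `fastest_per_level` is a hypothetical helper, and the rows are hand-written stand-ins with the same `data_id`, `pid`, and `time` fields; the second and third rows are invented values for illustration.

```python
from itertools import islice


def fastest_per_level(rows, limit=None):
    """Map each data_id to the (time, pid) of its fastest run.

    `rows` is any iterable of dicts with `data_id`, `pid`, and `time`
    keys, such as a streamed split; `limit` caps how many rows are read.
    """
    best = {}
    for row in islice(rows, limit):
        current = best.get(row["data_id"])
        if current is None or row["time"] < current[0]:
            best[row["data_id"]] = (row["time"], row["pid"])
    return best


# Stand-in rows; only the first matches the sample shown in the card.
sample_rows = [
    {"data_id": 12171034, "pid": "4748613890518923485", "time": 83388},
    {"data_id": 12171034, "pid": "1111", "time": 80000},
    {"data_id": 999, "pid": "2222", "time": 45000},
]
result = fastest_per_level(sample_rows)
print(result)
```

With the real dataset, `rows` would be the streaming `ds` object from the example code, for instance `fastest_per_level(ds, limit=10_000)` to bound how much is downloaded.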

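Dataset creation

Created: February 2022
Method: the self-hosted Mario Maker 2 API, with care taken not to overload the API.
Update plans: none.

Usage considerations

Content review: the dataset contains no harmful language or depictions.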
