
TheGreatRambler/mm2_ninji

Hugging Face · Updated 2022-11-11 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/TheGreatRambler/mm2_ninji
Resource description:
---
language:
- multilingual
license:
- cc-by-nc-sa-4.0
multilinguality:
- multilingual
size_categories:
- 1M<n<10M
source_datasets:
- original
task_categories:
- other
- object-detection
- text-retrieval
- token-classification
- text-generation
task_ids: []
pretty_name: Mario Maker 2 ninjis
tags:
- text-mining
---

# Mario Maker 2 ninjis

Part of the [Mario Maker 2 Dataset Collection](https://tgrcode.com/posts/mario_maker_2_datasets)

## Dataset Description

The Mario Maker 2 ninjis dataset consists of 3 million ninji replays from Nintendo's online service, totaling around 12.5GB of data. The dataset was created using the self-hosted [Mario Maker 2 api](https://tgrcode.com/posts/mario_maker_2_api) over the course of 1 month in February 2022.

### How to use it

The Mario Maker 2 ninjis dataset is very large, so for most use cases it is recommended to use the streaming API of `datasets`. You can load and iterate through the dataset with the following code:

```python
from datasets import load_dataset

ds = load_dataset("TheGreatRambler/mm2_ninji", streaming=True, split="train")
print(next(iter(ds)))

#OUTPUT:
{
 'data_id': 12171034,
 'pid': '4748613890518923485',
 'time': 83388,
 'replay': [some binary data]
}
```

Each row is a ninji run in the level denoted by `data_id`, done by the player denoted by `pid`. The length of this ninji run is `time`, in milliseconds.

`replay` is a gzip compressed binary file format describing the animation frames and coordinates of the player throughout the run. Parsing the replay is as follows:

```python
from datasets import load_dataset
import zlib
import struct

ds = load_dataset("TheGreatRambler/mm2_ninji", streaming=True, split="train")
row = next(iter(ds))
replay = zlib.decompress(row["replay"])

# Header: frame count (big-endian uint32) at 0x10, character byte at 0x14
frames = struct.unpack(">I", replay[0x10:0x14])[0]
character = replay[0x14]

character_mapping = {
    0: "Mario",
    1: "Luigi",
    2: "Toad",
    3: "Toadette"
}

# player_state is between 0 and 14 and varies between gamestyles
# as outlined below. Determining the gamestyle of a particular run
# and rendering the level being played requires TheGreatRambler/mm2_ninji_level
player_state_base = {
    0: "Run/Walk",
    1: "Jump",
    2: "Swim",
    3: "Climbing",
    5: "Sliding",
    7: "Dry bones shell",
    8: "Clown car",
    9: "Cloud",
    10: "Boot",
    11: "Walking cat"
}

player_state_nsmbu = {
    4: "Sliding",
    6: "Turnaround",
    10: "Yoshi",
    12: "Acorn suit",
    13: "Propeller active",
    14: "Propeller neutral"
}

player_state_sm3dw = {
    4: "Sliding",
    6: "Turnaround",
    7: "Clear pipe",
    8: "Cat down attack",
    13: "Propeller active",
    14: "Propeller neutral"
}

player_state_smb1 = {
    4: "Link down slash",
    5: "Crouching"
}

player_state_smw = {
    10: "Yoshi",
    12: "Cape"
}

print("Frames: %d\nCharacter: %s" % (frames, character_mapping[character]))

current_offset = 0x3C
# Ninji updates are reported every 4 frames
for i in range((frames + 2) // 4):
    # First byte: upper 4 bits are flags, lower 4 bits are the player state
    flags = replay[current_offset] >> 4
    player_state = replay[current_offset] & 0x0F
    current_offset += 1
    x = struct.unpack("<H", replay[current_offset:current_offset + 2])[0]
    current_offset += 2
    y = struct.unpack("<H", replay[current_offset:current_offset + 2])[0]
    current_offset += 2
    # Some records carry one extra byte of unknown meaning
    if flags & 0b00000110:
        unk1 = replay[current_offset]
        current_offset += 1
    in_subworld = flags & 0b00001000
    print("Frame %d:\n Flags: %s,\n Animation state: %d,\n X: %d,\n Y: %d,\n In subworld: %s" % (i, bin(flags), player_state, x, y, in_subworld))

#OUTPUT:
Frames: 5006
Character: Mario
Frame 0:
 Flags: 0b0,
 Animation state: 0,
 X: 2672,
 Y: 2288,
 In subworld: 0
Frame 1:
 Flags: 0b0,
 Animation state: 0,
 X: 2682,
 Y: 2288,
 In subworld: 0
Frame 2:
 Flags: 0b0,
 Animation state: 0,
 X: 2716,
 Y: 2288,
 In subworld: 0
...
Frame 1249:
 Flags: 0b0,
 Animation state: 1,
 X: 59095,
 Y: 3749,
 In subworld: 0
Frame 1250:
 Flags: 0b0,
 Animation state: 1,
 X: 59246,
 Y: 3797,
 In subworld: 0
Frame 1251:
 Flags: 0b0,
 Animation state: 1,
 X: 59402,
 Y: 3769,
 In subworld: 0
```

You can also download the full dataset. Note that this will download ~12.5GB:

```python
ds = load_dataset("TheGreatRambler/mm2_ninji", split="train")
```

## Data Structure

### Data Instances

```python
{
 'data_id': 12171034,
 'pid': '4748613890518923485',
 'time': 83388,
 'replay': [some binary data]
}
```

### Data Fields

|Field|Type|Description|
|---|---|---|
|data_id|int|The data ID of the level this run occurred in|
|pid|string|Player ID of the player|
|time|int|Length of the run in milliseconds|
|replay|bytes|Replay file of this run|

### Data Splits

The dataset only contains a train split.

<!-- TODO create detailed statistics -->

## Dataset Creation

The dataset was created over a little more than a month in February 2022 using the self-hosted [Mario Maker 2 api](https://tgrcode.com/posts/mario_maker_2_api). As requests made to Nintendo's servers require authentication, the process had to be done with the utmost care, limiting download speed so as not to overload the API and risk a ban. There are no intentions to create an updated release of this dataset.

## Considerations for Using the Data

The dataset contains no harmful language or depictions.
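
The replay-parsing walkthrough in the card above can also be wrapped in a reusable function. The following is a minimal sketch that assumes exactly the layout the card's example uses (frame count as a big-endian uint32 at `0x10`, records starting at `0x3C`, little-endian 16-bit coordinates, one optional extra byte). `NinjiFrame`, `parse_replay`, and `make_synthetic_replay` are hypothetical names introduced here for illustration; they are not part of the dataset or of any library, and the sample buffer is synthetic rather than real replay data.

```python
import struct
import zlib
from typing import List, NamedTuple


class NinjiFrame(NamedTuple):
    flags: int         # upper 4 bits of the first record byte
    player_state: int  # lower 4 bits of the first record byte
    x: int
    y: int
    in_subworld: bool


def parse_replay(compressed: bytes) -> List[NinjiFrame]:
    """Decode a compressed replay blob, following the card's example layout."""
    replay = zlib.decompress(compressed)
    frames = struct.unpack(">I", replay[0x10:0x14])[0]
    offset = 0x3C
    records = []
    # One record per 4 game frames, mirroring the loop in the card.
    for _ in range((frames + 2) // 4):
        flags = replay[offset] >> 4
        player_state = replay[offset] & 0x0F
        x, y = struct.unpack_from("<HH", replay, offset + 1)
        offset += 5
        if flags & 0b0110:
            offset += 1  # skip the extra byte whose meaning is unknown
        records.append(NinjiFrame(flags, player_state, x, y, bool(flags & 0b1000)))
    return records


def make_synthetic_replay() -> bytes:
    """Build a tiny fake replay (NOT real data) for trying out the parser."""
    header = bytearray(0x3C)
    struct.pack_into(">I", header, 0x10, 12)  # 12 frames -> (12 + 2) // 4 = 3 records
    rec0 = bytes([0x00]) + struct.pack("<HH", 2672, 2288)
    rec1 = bytes([0x20]) + struct.pack("<HH", 2682, 2288) + b"\x07"  # has extra byte
    rec2 = bytes([0x81]) + struct.pack("<HH", 2716, 2300)  # subworld flag, state 1
    return zlib.compress(bytes(header) + rec0 + rec1 + rec2)


for frame in parse_replay(make_synthetic_replay()):
    print(frame)
```

With a real row, the same function would be called as `parse_replay(row["replay"])`. Returning records instead of printing them makes it easier to feed the coordinates into later analysis.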
Provider:
TheGreatRambler
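Summary of Original Information

Dataset Overview

Basic Information

  • Name: Mario Maker 2 ninjis
  • Language: Multilingual
  • License: CC-BY-NC-SA-4.0
  • Multilinguality: Multilingual
  • Size: 1M<n<10M
  • Source datasets: Original
  • Task categories: other, object detection, text retrieval, token classification, text generation
  • Tags: text mining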

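Dataset Description

  • Content: 3 million ninji replays, totaling about 12.5GB.
  • Collection period: February 2022, over roughly one month.
  • Data source: collected using the self-hosted Mario Maker 2 API.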

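Data Structure

Data Instances

`{'data_id': 12171034, 'pid': '4748613890518923485', 'time': 83388, 'replay': [some binary data]}`

Data Fields

|Field|Type|Description|
|---|---|---|
|data_id|int|Data ID of the level|
|pid|string|Player ID|
|time|int|Length of the run, in milliseconds|
|replay|bytes|Replay file of this run, as binary data|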
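
To illustrate the fields listed above, a row can be wrapped in a small typed record. This is only a sketch: `NinjiRun`, `from_row`, and `formatted_time` are hypothetical names chosen here, and the sample values are the ones shown in the data instance, with an empty placeholder for `replay`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NinjiRun:
    data_id: int   # level the run took place in
    pid: str       # player ID (kept as a string, as in the dataset)
    time: int      # run length in milliseconds
    replay: bytes  # compressed replay blob

    @classmethod
    def from_row(cls, row: dict) -> "NinjiRun":
        """Build a record from a dataset row given as a plain dict."""
        return cls(int(row["data_id"]), str(row["pid"]), int(row["time"]), bytes(row["replay"]))

    def formatted_time(self) -> str:
        """Render the millisecond time as minutes:seconds.milliseconds."""
        minutes, remainder = divmod(self.time, 60_000)
        seconds, millis = divmod(remainder, 1000)
        return f"{minutes}:{seconds:02d}.{millis:03d}"


run = NinjiRun.from_row(
    {"data_id": 12171034, "pid": "4748613890518923485", "time": 83388, "replay": b""}
)
print(run.formatted_time())  # 1:23.388
```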

Data Splits

  • Splits: train split only.

Dataset Usage

  • Loading: the streaming API of the `datasets` library is recommended.
  • Example code: `from datasets import load_dataset`, then `ds = load_dataset("TheGreatRambler/mm2_ninji", streaming=True, split="train")` and `print(next(iter(ds)))`
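
Because streaming yields rows lazily, a bounded sample can be inspected without downloading the full ~12.5GB. The sketch below is an assumption-laden illustration rather than part of the dataset's tooling: `best_time_per_level` is a hypothetical helper that accepts any iterable of row dicts, so it can be tried on hand-made rows first. Times are grouped by `data_id` because a run time is only comparable to other runs of the same level.

```python
from itertools import islice
from typing import Dict, Iterable


def best_time_per_level(rows: Iterable[dict], sample_size: int) -> Dict[int, int]:
    """Return the shortest `time` per `data_id` among the first `sample_size` rows."""
    best: Dict[int, int] = {}
    # islice stops after sample_size rows, so a streamed dataset is never fully read.
    for row in islice(rows, sample_size):
        level = row["data_id"]
        if level not in best or row["time"] < best[level]:
            best[level] = row["time"]
    return best


# Hand-made rows (not real data) to show the behaviour.
fake_rows = [
    {"data_id": 1, "time": 500},
    {"data_id": 2, "time": 100},
    {"data_id": 1, "time": 300},
    {"data_id": 2, "time": 50},  # beyond the sample of 3, so ignored
]
print(best_time_per_level(fake_rows, sample_size=3))

# With the real dataset (requires the `datasets` package and network access):
#   from datasets import load_dataset
#   ds = load_dataset("TheGreatRambler/mm2_ninji", streaming=True, split="train")
#   print(best_time_per_level(ds, sample_size=1000))
```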

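Dataset Creation

  • Creation date: February 2022
  • Creation method: collected with the self-hosted Mario Maker 2 API, taking care not to overload the API.
  • Update plans: no updated release is planned.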

Considerations for Use

  • Content review: the dataset contains no harmful language or depictions.