dunnolab/so-combined-eng

Hugging Face · updated 2025-11-15 · indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/dunnolab/so-combined-eng
Overview:
---
license: apache-2.0
task_categories:
- robotics
tags:
- LeRobot
configs:
- config_name: default
  data_files: data/*/*.parquet
language:
- en
size_categories:
- 1M<n<10M
---

This dataset was created using [LeRobot](https://github.com/huggingface/lerobot).

## Dataset Description

The English version of this dataset integrates **598 open-source community datasets** into a single unified corpus of **22,709 episodes** and approximately **9.4 million frames** across **563 distinct tasks**.

Several transformations were applied to ensure standardization and data quality:

1. **Camera view normalization**
   Because community datasets do not follow a consistent naming scheme for camera viewpoints, we used the `Qwen3-VL-8B-Instruct` model to categorize all images into one of three groups: **TOP**, **GRIPPER**, or **SIDE**. All datasets include the TOP and GRIPPER viewpoints; datasets lacking a SIDE viewpoint were padded with a zero-valued image.
2. **Task re-annotation**
   We used `Qwen3-VL-8B-Instruct` to refine task annotations where necessary. The re-labeling process took into account the video inputs and, when available, the original task descriptions. For the Russian version of the dataset, all task descriptions were additionally translated into Russian.
3. **Video standardization**
   To ensure reliable merging, all videos were re-encoded with the same codec (**H.264**), frame rate (**30 FPS**), and resolution (**480×640**, height × width).
4. **No-op removal**
   We applied episode-level trimming to remove no-op segments at the beginning and end of episodes, and removed episodes composed entirely of no-op actions. The corresponding video and parquet files were cropped accordingly. In total, this process removed no-op data amounting to **12.7%** of the raw dataset.
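The SIDE-view padding rule from step 1 can be illustrated with a minimal sketch. It assumes each frame has already been classified into `"top"`, `"gripper"`, or `"side"` (the Qwen3-VL classification itself is not shown) and represents images as raw HWC uint8 byte buffers of the 480×640×3 shape listed in `meta/info.json`. The helper `pad_camera_views` is hypothetical and is not part of LeRobot or the authors' tooling; in the published dataset the padded view is stored as an H.264 video rather than raw bytes.

```python
from typing import Dict, Optional

# Image shape shared by every observation.images.* feature in meta/info.json.
HEIGHT, WIDTH, CHANNELS = 480, 640, 3
FRAME_BYTES = HEIGHT * WIDTH * CHANNELS
VIEWS = ("top", "gripper", "side")


def pad_camera_views(frames: Dict[str, Optional[bytes]]) -> Dict[str, bytes]:
    """Map view labels to feature keys, padding a missing SIDE view with zeros.

    `frames` holds raw HWC uint8 buffers keyed by the view label assigned by
    the (hypothetical, upstream) classifier: "top", "gripper", or "side".
    """
    sample: Dict[str, bytes] = {}
    for view in VIEWS:
        image = frames.get(view)
        if image is None:
            if view != "side":
                # TOP and GRIPPER are expected in every source dataset.
                raise ValueError(f"required view {view!r} is missing")
            image = bytes(FRAME_BYTES)  # zero-valued (black) placeholder
        if len(image) != FRAME_BYTES:
            raise ValueError(f"{view!r}: got {len(image)} bytes, expected {FRAME_BYTES}")
        sample[f"observation.images.{view}"] = image
    return sample


# Example: a source dataset that recorded only TOP and GRIPPER cameras.
frame = bytes([128]) * FRAME_BYTES
sample = pad_camera_views({"top": frame, "gripper": frame})
```

Padding with a constant image, rather than dropping the key, keeps every sample at the same three image features, which is what allows the 598 sources to share one `features` schema.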
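The episode-level no-op trimming from step 4 can be sketched as follows. The card does not say how a no-op frame is detected, so this sketch assumes, hypothetically, that a frame is a no-op when every commanded joint target in `action` lies within a tolerance `tol` of the current `observation.state`, i.e. the arm is not asked to move. The names `is_noop`, `trim_noop_episode`, and the default `tol` are illustrative only.

```python
from typing import Optional, Sequence

Vector = Sequence[float]  # one 6-D joint vector (shoulder_pan ... gripper)


def is_noop(action: Vector, state: Vector, tol: float = 1e-3) -> bool:
    """Hypothetical criterion: the commanded targets match the current joints."""
    return all(abs(a - s) <= tol for a, s in zip(action, state))


def trim_noop_episode(
    actions: Sequence[Vector], states: Sequence[Vector], tol: float = 1e-3
) -> Optional[range]:
    """Return the frame indices to keep after dropping leading and trailing
    no-op frames, or None if the episode consists only of no-op frames."""
    flags = [is_noop(a, s, tol) for a, s in zip(actions, states)]
    if all(flags):
        return None  # entirely no-op (or empty): drop the episode
    start = flags.index(False)  # first active frame
    end = len(flags) - flags[::-1].index(False)  # one past the last active frame
    return range(start, end)


# Example: frames 0, 1 and 4 are idle, frames 2-3 move the arm.
states = [[0.0] * 6] * 5
actions = [[0.0] * 6, [0.0] * 6, [0.5] * 6, [0.2] * 6, [0.0] * 6]
kept = trim_noop_episode(actions, states)
print(kept)
```

Under these assumptions the same index range would then be used to crop both the episode's parquet rows and the matching video frames, so the two stay aligned.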
- **License:** Apache-2.0

## Community Contributors Included

We acknowledge all community contributors whose datasets served as sources for this repository:

- 00ri
- 1zzx23
- 356c
- AndrejOrsula
- Askel1419
- BobBobbson
- CSCSXX
- CnLori
- Congying1112
- DGEs
- Daiki127
- Dangvi
- DanqingZ
- DorayakiLin
- EGLima
- Evelynix
- EverNorif
- Gano007
- Haribot099
- HarrisonLee24
- Hennadiy
- Jiangeng
- Kazu1232
- KeerthanKrish
- Killian74
- Kimz1
- LeRobot-worldwide-hackathon
- LemonadeDai
- LightwheelAI
- LittleFire99
- Loki0929
- Mazytomo
- Micksavioz
- Mwuqiu
- NeilKim
- Odog16
- Pi-robot
- Qiushuang
- RASPIAUDIO
- RaulSaya
- Rayenghali
- ReubenLim
- Revilo7
- RickRain
- Rorschach4153
- SahilChande
- SeanLMH
- SharkDan
- ShockleyWong
- Stevenyu8888
- SurajChess
- Thorns07
- Trelis
- TzuShian
- UN-kk
- VoicAndrei
- Xiewei1211
- YSanYi
- Yanis7777
- ZGGZZG
- Zak-Y
- ZibinDong
- aaron-ser
- aaronsu11
- abhiloiwal2
- abhisb
- abokinala
- acyanbird
- aiden-li
- alexis779
- allenchienxxx
- amrltqt
- andy309
- apayan
- aractingi
- arulloomba
- avea-robotics
- badwolf256
- bap13
- bensprenger
- boyangs235
- brcg3
- budinggou
- cHemingway
- cezarsolo
- cjlqwe
- cyoung96
- danaaubakirova
- davidgoss
- dc2ac
- demon-zozo
- desroziers
- dleon23
- dongseon
- drjaisree25
- dsfsg
- duthvik
- easonjcc
- edgarkim
- emmanuel-v
- enpeicv
- fbeltrao
- francescocrivelli
- frk2
- ganondorofu
- gmm0820
- guanfengliu
- gxy1111
- haijunsu-osu
- hannb
- hoon-shin
- howld
- hrhraj
- huyouare
- jchun
- jcsux
- jiajun001
- jlesein
- jmendoza-10
- jpizarrom
- juni3227
- jyang-ca
- k1000dai
- kagyvro48
- kaiserbuffle
- kaiyuwu010
- karimnihal
- kivod
- kkurzweil
- kristaqp
- legion1581
- leolin6
- lerobot
- lerobot-edinburgh-white-team
- liamlau
- lijinghai
- lime66
- littledragon
- liyitenga
- ljw1105
- love3165303
- lucasfv
- luriss
- maitereo
- masakinoda
- masato-ka
- mathieutk
- nbirukov
- northhycao
- nuoyihan
- omkarmayekar555
- opan08
- oretti
- orsoromeo
- pandaRQ
- paultr
- pbvr
- pdd46465
- pr0tos
- pranavsaroha
- psavnani5
- ptizzza
- puneetpanwar
- reeced
- ricky0526
- roboticshack
- rowb1
- rs545837
- ryanpennings
- s-higurashi
- samanthalhy
- samsam0510
- samsitol
- seonixx
- seunghoney
- shylee
- slowturtle99
- sshh11
- strainflow
- suessmann
- sunq
- szfforever
- taiobot
- targabor
- tfoldi
- therarelab
- thimble
- tinkhireeva
- tkc79
- tlf123
- tobdeu
- triton7777
- un1c0rnio
- uuysi
- vednot25t
- wangranryan
- weblucas
- weiye11
- wvangils
- y1y2y3
- yingliu-data
- yinxinyuchen
- yo-michi22
- youliangtan
- yuk6ra
- yunhezhui123
- yuto083
- yuz1wan
- zacapa
- zaringleb
- zheng6677
- zlj666
- zonglin1104

## Dataset Structure

[meta/info.json](meta/info.json):

```json
{
  "codebase_version": "v2.1",
  "robot_type": "so100",
  "total_episodes": 22709,
  "total_frames": 9443507,
  "total_tasks": 563,
  "total_videos": 68127,
  "total_chunks": 23,
  "chunks_size": 1000,
  "fps": 30,
  "splits": {
    "train": "0:22709"
  },
  "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
  "video_path": "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4",
  "features": {
    "action": {
      "dtype": "float32",
      "shape": [6],
      "names": [
        "main_shoulder_pan",
        "main_shoulder_lift",
        "main_elbow_flex",
        "main_wrist_flex",
        "main_wrist_roll",
        "main_gripper"
      ]
    },
    "observation.state": {
      "dtype": "float32",
      "shape": [6],
      "names": [
        "main_shoulder_pan",
        "main_shoulder_lift",
        "main_elbow_flex",
        "main_wrist_flex",
        "main_wrist_roll",
        "main_gripper"
      ]
    },
    "observation.images.gripper": {
      "dtype": "video",
      "shape": [480, 640, 3],
      "names": ["height", "width", "channels"],
      "info": {
        "video.height": 480,
        "video.width": 640,
        "video.codec": "h264",
        "video.pix_fmt": "yuv420p",
        "video.is_depth_map": false,
        "video.fps": 30,
        "video.channels": 3,
        "has_audio": false
      }
    },
    "observation.images.top": {
      "dtype": "video",
      "shape": [480, 640, 3],
      "names": ["height", "width", "channels"],
      "info": {
        "video.height": 480,
        "video.width": 640,
        "video.codec": "h264",
        "video.pix_fmt": "yuv420p",
        "video.is_depth_map": false,
        "video.fps": 30,
        "video.channels": 3,
        "has_audio": false
      }
    },
    "observation.images.side": {
      "dtype": "video",
      "shape": [480, 640, 3],
      "names": ["height", "width", "channels"],
      "info": {
        "video.height": 480,
        "video.width": 640,
        "video.codec": "h264",
        "video.pix_fmt": "yuv420p",
        "video.is_depth_map": false,
        "video.fps": 30,
        "video.channels": 3,
        "has_audio": false
      }
    },
    "timestamp": {
      "dtype": "float32",
      "shape": [1],
      "names": null
    },
    "frame_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "episode_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "task_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    }
  },
  "repo_id": "dunno/merged"
}
```
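To locate the files of a given episode, the `data_path` and `video_path` templates above can be filled in with Python's `str.format`. The sketch below assumes the LeRobot v2.x convention `episode_chunk = episode_index // chunks_size`, which is consistent with `total_chunks` being 23 for 22,709 episodes and a `chunks_size` of 1000. The helper `episode_files` is hypothetical and not part of LeRobot's API.

```python
import json

# Fields copied from this dataset's meta/info.json.
INFO = json.loads("""
{
  "total_episodes": 22709,
  "chunks_size": 1000,
  "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
  "video_path": "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
}
""")

VIDEO_KEYS = (
    "observation.images.gripper",
    "observation.images.top",
    "observation.images.side",
)


def episode_files(info: dict, episode_index: int) -> dict:
    """Return the parquet path and the three video paths for one episode."""
    if not 0 <= episode_index < info["total_episodes"]:
        raise IndexError(f"episode {episode_index} is out of range")
    # Assumed chunking rule: consecutive episodes grouped by `chunks_size`.
    chunk = episode_index // info["chunks_size"]
    files = {
        "data": info["data_path"].format(
            episode_chunk=chunk, episode_index=episode_index
        )
    }
    for key in VIDEO_KEYS:
        files[key] = info["video_path"].format(
            episode_chunk=chunk, video_key=key, episode_index=episode_index
        )
    return files


# Ceiling division of episodes by chunk size reproduces "total_chunks".
num_chunks = -(-INFO["total_episodes"] // INFO["chunks_size"])

last = episode_files(INFO, 22708)
print(last["data"])  # data/chunk-022/episode_022708.parquet
```

The same metadata also checks out arithmetically: with one video per camera key per episode, 22,709 × 3 = 68,127, matching `total_videos`.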
Provided by:
dunnolab