
hamza-adnan/visual_distracting_control_suite

Hugging Face · Updated 2026-04-08 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/hamza-adnan/visual_distracting_control_suite
Resource description:
---
dataset_info:
- config_name: cheetah_run
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 17
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 6
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: train
    num_bytes: 53270273758
    num_examples: 9000000
  - name: test
    num_bytes: 5918594227
    num_examples: 1000000
  download_size: 65071625266
  dataset_size: 59188867985
- config_name: cheetah_run_distractor_hard
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 17
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 6
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: train
    num_bytes: 74053289802
    num_examples: 9000000
  - name: test
    num_bytes: 8082638681
    num_examples: 1000000
  download_size: 82135334541
  dataset_size: 82135928483
- config_name: cheetah_run_distractor_low
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 17
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 6
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 8021400416
    num_examples: 1000000
  - name: train
    num_bytes: 70454187948
    num_examples: 9000000
  download_size: 86487076328
  dataset_size: 78475588364
- config_name: hopper_hop
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 15
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 4
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: train
    num_bytes: 51471971969
    num_examples: 9000000
  - name: test
    num_bytes: 5718832813
    num_examples: 1000000
  download_size: 62655370451
  dataset_size: 57190804782
- config_name: hopper_hop_distractor_hard
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 15
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 4
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 8249548536
    num_examples: 1000000
  - name: train
    num_bytes: 72097453824
    num_examples: 9000000
  download_size: 160329248812
  dataset_size: 80347002360
- config_name: hopper_hop_distractor_low
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 15
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 4
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 7839665057
    num_examples: 1000000
  - name: train
    num_bytes: 68870596536
    num_examples: 9000000
  download_size: 152982979619
  dataset_size: 76710261593
- config_name: humanoid_walk
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 67
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 21
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 5070235970
    num_examples: 1000000
  - name: train
    num_bytes: 45625807845
    num_examples: 9000000
  download_size: 111746246866
  dataset_size: 50696043815
- config_name: humanoid_walk_distractor_hard
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 67
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 21
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 7400537242
    num_examples: 1000000
  - name: train
    num_bytes: 65863139376
    num_examples: 9000000
  download_size: 79307163440
  dataset_size: 73263676618
- config_name: humanoid_walk_distractor_low
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 67
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 21
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 7296954770
    num_examples: 1000000
  - name: train
    num_bytes: 65122732432
    num_examples: 9000000
  download_size: 134273060565
  dataset_size: 72419687202
- config_name: walker_run
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 24
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 6
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 5733131603
    num_examples: 1000000
  - name: train
    num_bytes: 51588531732
    num_examples: 9000000
  download_size: 56856633613
  dataset_size: 57321663335
- config_name: walker_run_distractor_hard
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 24
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 6
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 7628712875
    num_examples: 1000000
  - name: train
    num_bytes: 65388586082
    num_examples: 9000000
  download_size: 73094420747
  dataset_size: 73017298957
- config_name: walker_run_distractor_low
  features:
  - name: observation
    dtype: image
  - name: state
    list: float32
    length: 24
  - name: mask
    dtype: image
  - name: action
    list: float32
    length: 6
  - name: reward
    dtype: float32
  - name: terminated
    dtype: bool
  - name: truncated
    dtype: bool
  splits:
  - name: test
    num_bytes: 6939287765
    num_examples: 1000000
  - name: train
    num_bytes: 62149910283
    num_examples: 9000000
  download_size: 137815025011
  dataset_size: 69089198048
configs:
- config_name: cheetah_run
  data_files:
  - split: train
    path: cheetah_run/train-*
  - split: test
    path: cheetah_run/test-*
- config_name: cheetah_run_distractor_hard
  data_files:
  - split: test
    path: cheetah_run_distractor_hard/test-*
  - split: train
    path: cheetah_run_distractor_hard/train-*
- config_name: cheetah_run_distractor_low
  data_files:
  - split: test
    path: cheetah_run_distractor_low/test-*
  - split: train
    path: cheetah_run_distractor_low/train-*
- config_name: hopper_hop
  data_files:
  - split: train
    path: hopper_hop/train-*
  - split: test
    path: hopper_hop/test-*
- config_name: hopper_hop_distractor_hard
  data_files:
  - split: train
    path: hopper_hop_distractor_hard/train-*
  - split: test
    path: hopper_hop_distractor_hard/test-*
- config_name: hopper_hop_distractor_low
  data_files:
  - split: test
    path: hopper_hop_distractor_low/test-*
  - split: train
    path: hopper_hop_distractor_low/train-*
- config_name: humanoid_walk
  data_files:
  - split: test
    path: humanoid_walk/test-*
  - split: train
    path: humanoid_walk/train-*
- config_name: humanoid_walk_distractor_hard
  data_files:
  - split: train
    path: humanoid_walk_distractor_hard/train-*
  - split: test
    path: humanoid_walk_distractor_hard/test-*
- config_name: humanoid_walk_distractor_low
  data_files:
  - split: test
    path: humanoid_walk_distractor_low/test-*
  - split: train
    path: humanoid_walk_distractor_low/train-*
- config_name: walker_run
  data_files:
  - split: test
    path: walker_run/test-*
  - split: train
    path: walker_run/train-*
- config_name: walker_run_distractor_hard
  data_files:
  - split: test
    path: walker_run_distractor_hard/test-*
  - split: train
    path: walker_run_distractor_hard/train-*
- config_name: walker_run_distractor_low
  data_files:
  - split: test
    path: walker_run_distractor_low/test-*
  - split: train
    path: walker_run_distractor_low/train-*
---

## Visual Distracting Control Suite Benchmark

This dataset contains expert trajectories generated by a Proximal Policy Optimization (PPO) reinforcement learning agent trained on 4 environments of the [Distracting Control Suite](https://github.com/google-research/google-research/tree/master/distracting_control). For each environment we collect data at the distraction levels defined below, together with segmentation masks of the agent.

Levels of distraction:
- None: Vanilla DeepMind Control Suite without visual distractions. The environment uses the default static background and maintains a fixed camera position.
- Low: Dynamic background distractors only.
  - Background: Video frames from the DAVIS dataset are played sequentially. At each step, the background advances to the next frame, cycling through the video (direction reverses at endpoints).
  - Camera: Fixed position, no scale or rotation changes (scale=0.0).
- Hard: Dynamic background and camera distractors.
  - Background: Same dynamic video background as the "Low" level.
  - Camera: The camera moves during an episode, changing position, rotation, and zoom (scale=0.1).
    - Horizontal/vertical rotation delta: ±π/20 radians (±9°)
    - Roll rotation delta: ±π/20 radians (±9°)
    - Zoom range: 5% zoom in to 15% zoom out relative to default distance
    - Dynamic velocity: position velocity std=0.01, max velocity=0.04
    - Dynamic roll: roll std=π/3000, max roll velocity=π/500

## Dataset Usage

Regular usage (for the domain cheetah with task run at the "Hard" distraction level):
```python
from datasets import load_dataset

train_dataset = load_dataset("EpicPinkPenguin/visual_distracting_control_suite", name="cheetah_run_distractor_hard", split="train")
test_dataset = load_dataset("EpicPinkPenguin/visual_distracting_control_suite", name="cheetah_run_distractor_hard", split="test")
```

## Agent Performance

The PPO agent was trained for 2M steps on each environment and obtained the following final performance metrics on the evaluation environment.

| Environment | Steps (Train) | Steps (Test) | Return | Observation |
|:------------|:--------------|:-------------|:-------|:------------|
| cheetah_run | 9,000,000 | 1,000,000 | 837.67 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/ADhRT1y4n6N7WSpVsjILC.mp4"></video> |
| cheetah_run_distractor_low | 9,000,000 | 1,000,000 | 837.67 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/Lxghx15h2m3S4HUK30BXH.mp4"></video> |
| cheetah_run_distractor_hard | 9,000,000 | 1,000,000 | 837.67 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/aN9ATW1Uj1k-2LMfUXFyz.mp4"></video> |
| hopper_hop | 9,000,000 | 1,000,000 | 307.33 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/RfffHjzPVEY10-Us9mebw.mp4"></video> |
| hopper_hop_distractor_low | 9,000,000 | 1,000,000 | 307.33 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/mcNPXejRubNhWimxtQCY6.mp4"></video> |
| hopper_hop_distractor_hard | 9,000,000 | 1,000,000 | 307.33 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/lIafcEwu7cJRL3DjoQJn8.mp4"></video> |
| humanoid_walk | 9,000,000 | 1,000,000 | 616.52 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/cgPfBliYZIVYl_wNlMgpw.mp4"></video> |
| humanoid_walk_distractor_low | 9,000,000 | 1,000,000 | 616.52 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/AJvmdxUANxvcvTLVntGtJ.mp4"></video> |
| humanoid_walk_distractor_hard | 9,000,000 | 1,000,000 | 616.52 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/2iInTmj_camNn7JwFI2ej.mp4"></video> |
| walker_run | 9,000,000 | 1,000,000 | 738.37 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/Hy2yFZNUVt53OziilUtsN.mp4"></video> |
| walker_run_distractor_low | 9,000,000 | 1,000,000 | 738.37 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/oz7k7b6-du2ZWqSI1IvWA.mp4"></video> |
| walker_run_distractor_hard | 9,000,000 | 1,000,000 | 738.37 | <video controls autoplay loop src="https://cdn-uploads.huggingface.co/production/uploads/633c1daf31c06121a58f2df9/zuK1C9ZAnZ27CM0nAWMHr.mp4"></video> |

## Dataset Structure

### Data Instances

Each data instance represents a single step, a tuple of the form (observation, state, mask, action, reward, terminated, truncated) = (o_t, s_t, m_t, a_t, r_t, terminated_t, truncated_t).

```python
{'action': [1],
 'observation': [[[0, 166, 253], [0, 174, 255], [0, 170, 251], [0, 191, 255], [0, 191, 255], [0, 221, 255],
                  [0, 243, 255], [0, 248, 255], [0, 243, 255], [10, 239, 255], [25, 255, 255], [0, 241, 255],
                  [0, 235, 255], [17, 240, 255], [10, 243, 255], [27, 253, 255], [39, 255, 255], [58, 255, 255],
                  [85, 255, 255], [111, 255, 255], [135, 255, 255], [151, 255, 255], [173, 255, 255], ...,
                  [0, 0, 37], [0, 0, 39]]],
 'state': [-0.09255199134349823, 0.028468089178204536, -0.05743644759058952, ...,
           -0.013366516679525375, -0.08739502727985382, 0.007727491203695536],
 'mask': [
     [0, 0, 0, 0, ..., 0, 0, 0, 0],
     [0, 0, 0, 0, ..., 0, 0, 0, 0],
     [0, 0, 255, 255, ..., 255, 255, 0, 0],
     [0, 0, 255, 255, ..., 255, 255, 0, 0],
     ...,
     [0, 0, 255, 255, ..., 255, 255, 0, 0],
     [0, 0, 255, 255, ..., 255, 255, 0, 0],
     [0, 0, 0, 0, ..., 0, 0, 0, 0],
     [0, 0, 0, 0, ..., 0, 0, 0, 0],
 ],
 'reward': 0.0,
 'terminated': False,
 'truncated': False}
```

### Data Fields

- `observation`: The current RGB observation from the environment.
- `state`: The current state of the environment.
- `mask`: A segmentation mask of the agent: agent pixels are 255, everything else is 0.
- `action`: The action predicted by the agent for the current observation.
- `reward`: The reward received for the current observation.
- `terminated`: Whether the episode has terminated with the current observation.
- `truncated`: Whether the episode is truncated with the current observation.

### Data Splits

The dataset is divided into a `train` (90%) and a `test` (10%) split. Each environment dataset contains 10M steps (data points) in total.

## Dataset Creation

The dataset was created by training a PPO RL agent for 2M steps in each environment. The trajectories were generated by taking the greedy action (the mean of the predicted action distribution) at each step. The agent was trained on the state. Each environment was created with the same random seed, making the trajectories identical between the different distraction levels. Concretely, episode 0 of cheetah_run is identical to episode 0 of cheetah_run_distractor_low and cheetah_run_distractor_hard in everything except the observation, which differs because of the visual distractors. The same holds for all remaining episodes.

## Distracting Control Suite

The [Distracting Control Suite](https://arxiv.org/abs/2101.02722) is an extension of the DeepMind Control Suite that augments standard continuous control tasks with visual distractions to evaluate the robustness of reinforcement learning (RL) algorithms.
While preserving the underlying MuJoCo-based physics and task dynamics, it introduces changes in the visual observations—such as background videos, colors, textures, and camera variations—that are unrelated to the control objective. These distractions are designed to challenge agents’ ability to learn representations that generalize beyond spurious visual correlations. By decoupling task-relevant dynamics from high-dimensional, non-stationary visual noise, the Distracting Control Suite provides a controlled benchmark for studying generalization, representation learning, and robustness in vision-based RL. It is commonly used to assess how well algorithms trained in one visual setting transfer to others, and to compare methods that aim to improve invariance, stability, and sample efficiency under perceptual perturbations.
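
The agent masks and the shared-seed property described in this card lend themselves to two small helpers. The following is a minimal sketch, not part of the original dataset tooling: `mask_out_background` and `same_transition` are hypothetical names, and the code assumes each step is a dict with the fields listed under Data Fields, with image fields convertible to NumPy arrays. The mask is deliberately left out of the comparison, on the assumption that camera distractors may change it.

```python
import numpy as np


def mask_out_background(observation, mask, fill=0):
    """Keep only agent pixels (mask == 255); set every other pixel to `fill`."""
    obs = np.asarray(observation)        # expected shape (H, W, 3)
    agent = np.asarray(mask) == 255      # boolean agent map
    if agent.ndim == 3:                  # tolerate a multi-channel mask image
        agent = agent.any(axis=-1)
    out = np.full_like(obs, fill)
    out[agent] = obs[agent]              # copy agent pixels, leave the rest filled
    return out


def same_transition(step_a, step_b, atol=1e-6):
    """Check that two steps agree on all non-image fields."""
    numeric_ok = all(
        np.allclose(step_a[key], step_b[key], atol=atol)
        for key in ("state", "action", "reward")
    )
    flags_ok = all(
        bool(step_a[key]) == bool(step_b[key])
        for key in ("terminated", "truncated")
    )
    return numeric_ok and flags_ok


# Synthetic stand-ins for one step from two distraction levels (not real data).
clean = {"state": [0.1, -0.2], "action": [0.5], "reward": 0.0,
         "terminated": False, "truncated": False}
distracted = dict(clean)  # same trajectory, only the images would differ
matches = same_transition(clean, distracted)
```

Steps loaded with `load_dataset` as shown under Dataset Usage should work with these helpers directly, since `np.asarray` accepts decoded PIL images. Comparing step `i` of `cheetah_run` against step `i` of `cheetah_run_distractor_hard` this way is one possible check of the identical-trajectory claim.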
Provided by:
hamza-adnan