AILabDsUnipi/SafeQIL-dataset

Name: AILabDsUnipi/SafeQIL-dataset
Creator: AILabDsUnipi
Published: 2026-03-17 20:38:13
License: MIT

Source: Hugging Face. Updated: 2026-03-17. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/AILabDsUnipi/SafeQIL-dataset

Description:

---
license: mit
viewer: false
task_categories:
- reinforcement-learning
tags:
- inverse-constrained-reinforcement-learning
- safe-rl
- q-learning
- offline-rl
- demonstrations
size_categories:
- 100K<n<1M
---

# Human-Generated Demonstrations for Safe Reinforcement Learning

**Paper:** [Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective](https://arxiv.org/abs/2602.23816)

**Code:** [AILabDsUnipi/SafeQIL](https://github.com/AILabDsUnipi/SafeQIL)

## Dataset Description

This dataset consists of human-generated demonstrations collected across four challenging constrained environments from the Safety-Gymnasium benchmark (`SafetyPointGoal1-v0`, `SafetyCarPush2-v0`, `SafetyPointCircle2-v0`, and `SafetyCarButton1-v0`). It is designed to train agents with **SafeQIL** (Safe Q Inverse Constrained Reinforcement Learning) to maximize the likelihood of safe trajectories in Constrained Markov Decision Processes (CMDPs) where constraints are unknown and costs are non-observable.

For every step in a demonstrated trajectory, we record the full transition dynamics. Each transition is captured as a tuple containing:

* `vector_obs`: The proprioceptive/kinematic state of the agent.
* `vision_obs`: The pixel-based visual observation.
* `action`: The continuous control action taken by the human demonstrator.
* `reward`: The standard task reward received.
* `done`: The boolean flag indicating episode termination.

To ensure efficient data loading and facilitate qualitative analysis, the data is distributed across three file types:

* **`.h5` (HDF5):** Stores the core transition tuples.
* **`.mp4`:** Provides rendered video rollouts of the expert's behavior for visual inspection.
* **`.txt`:** Contains summary statistics and metadata for each dataset split.

## Dataset Structure

The dataset is organized hierarchically by environment and dataset size.

```text
/
├── README.md                 <- This dataset card
├── SafetyPointGoal1-v0/
│   ├── x1/
│   │   ├── stats.txt         <- Dataset statistics
│   │   ├── 0.h5              <- Human generated trajectory data
│   │   ├── 0.mp4             <- Rendered trajectory
│   │   ├── 1.h5
│   │   ├── 1.mp4
│   │   ├── 2.h5
│   │   ├── 2.mp4
│   │   ...
│   │   ├── 39.h5
│   │   └── 39.mp4
│   ├── x2/
│   │   ├── stats.txt
│   │   ├── 0.h5
│   │   ...
│   │   └── 79.h5
│   ├── x4/
│   │   ├── stats.txt
│   │   ├── 0.h5
│   │   ...
│   │   └── 159.h5
│   ├── x8/
│   │   ├── stats.txt
│   │   ├── 0.h5
│   │   ...
│   │   └── 319.h5
├── SafetyCarPush2-v0/
│   ├── x1/
│   │   ...
│   │   x8/
├── ...
```

Note that `SafetyCarButton1-v0` has only the `x1` dataset, and that only the `x1` datasets contain video examples.

## How to Use This Dataset

The full dataset is about 50 GB, so we recommend using the `huggingface_hub` Python library to download only the subsets you need (e.g., a specific environment or size multiplier) and save bandwidth.

```python
from huggingface_hub import snapshot_download

# Example: download only the 'x1' dataset for SafetyPointGoal1-v0.
# Files keep their repo-relative paths under local_dir, so this yields
# ./demonstrations/SafetyPointGoal1-v0/x1/0.h5, and so on.
snapshot_download(
    repo_id="AILabDsUnipi/SafeQIL-dataset",
    repo_type="dataset",
    allow_patterns="SafetyPointGoal1-v0/x1/*",
    local_dir="./demonstrations",
)
```

### Loading HDF5 Files

You can load the human-generated tuples directly using `h5py`. Note that the data inside each file is nested under a group named after the episode (e.g., for the file `0.h5` the group name is `episode_0`, for the file `1.h5` it is `episode_1`, etc.). You can grab this group name dynamically in Python to load the data:

```python
import h5py

file_path = './demonstrations/SafetyPointGoal1-v0/x1/0.h5'

with h5py.File(file_path, 'r') as f:
    # Grab the episode group name (e.g. 'episode_0' for 0.h5)
    group_name = list(f.keys())[0]
    episode = f[group_name]

    # Load the arrays
    vector_obs = episode['vector_obs'][:]
    vision_obs = episode['vision_obs'][:]
    actions = episode['actions'][:]
    reward = episode['reward'][:]
    done = episode['done'][:]
```

## Citation

```bibtex
@misc{papadopoulos2026learningmaintainsafetyexpert,
  title={Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective},
  author={George Papadopoulos and George A. Vouros},
  year={2026},
  eprint={2602.23816},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.23816},
}
```
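
## Example: Selecting Splits to Download

The directory layout and the selective-download advice above can be combined into a small helper that builds `allow_patterns` values for a chosen environment and size multiplier. This is only a sketch: the names `split_patterns`, `expected_episode_files`, and `BASE_EPISODES` are hypothetical and not part of the SafeQIL code. The episode counts (40 files per `x1`, scaled by the multiplier) are taken from the `SafetyPointGoal1-v0` tree, and it is an assumption that the other environments follow the same numbering.

```python
# Environments listed in the dataset description.
ENVIRONMENTS = (
    "SafetyPointGoal1-v0",
    "SafetyCarPush2-v0",
    "SafetyPointCircle2-v0",
    "SafetyCarButton1-v0",
)

# ASSUMPTION: every x1 split holds 40 episodes (0.h5 .. 39.h5), as shown
# for SafetyPointGoal1-v0; xN splits hold 40 * N episodes.
BASE_EPISODES = 40


def available_multipliers(env):
    """Size multipliers present for an environment, per the dataset card."""
    # SafetyCarButton1-v0 ships only the x1 dataset.
    return (1,) if env == "SafetyCarButton1-v0" else (1, 2, 4, 8)


def split_patterns(env, multiplier, include_videos=False):
    """Build glob patterns selecting one split (hypothetical helper)."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    if multiplier not in available_multipliers(env):
        raise ValueError(f"{env} has no x{multiplier} split")
    prefix = f"{env}/x{multiplier}"
    patterns = [f"{prefix}/*.h5", f"{prefix}/stats.txt"]
    # Only the x1 datasets contain .mp4 rollouts.
    if include_videos and multiplier == 1:
        patterns.append(f"{prefix}/*.mp4")
    return patterns


def expected_episode_files(env, multiplier):
    """Repo-relative .h5 paths expected in a split, under the assumption above."""
    split_patterns(env, multiplier)  # reuse the validation
    prefix = f"{env}/x{multiplier}"
    return [f"{prefix}/{i}.h5" for i in range(BASE_EPISODES * multiplier)]
```

The returned list can be passed as `allow_patterns` to `snapshot_download`, which accepts either a single pattern string or a list of patterns. After the download, `expected_episode_files` gives a quick way to check that no episode file is missing, provided the assumed counts hold for the chosen environment.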

Provider:

AILabDsUnipi
