ActiveHuman Part 1
Source: NIAID Data Ecosystem. Indexed: 2026-05-01
Download link:
https://zenodo.org/record/8359765
Resource summary:
This is Part 1/2 of the ActiveHuman dataset! Part 2 can be found here.
Dataset Description
ActiveHuman was generated using Unity's Perception package.
It consists of 175,428 RGB images and their semantic segmentation counterparts, captured in different environments and under different lighting conditions, camera distances and camera angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1m-4m) and 36 camera angles (0-360 at 10-degree intervals).
The dataset does not cover every combination of the available camera distances and angles: for some values the camera would collide with another object or leave the confines of an environment, so those combinations are absent from the dataset.
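The size of the gap between the nominal parameter grid and the stated image count can be estimated with a quick calculation. The sketch below assumes that all five factors (environment, human, lighting, distance, angle) are fully crossed and that the 175,428 figure refers to the whole dataset; the description confirms neither, so the coverage figure is only indicative.

```python
# Nominal size of the full parameter grid, ASSUMING (not confirmed by the
# dataset description) that all five factors are fully crossed.
ENVIRONMENTS = 8
HUMANS = 33
LIGHTING_CONDITIONS = 4
CAMERA_DISTANCES = 7
CAMERA_ANGLES = 36

full_grid = (ENVIRONMENTS * HUMANS * LIGHTING_CONDITIONS
             * CAMERA_DISTANCES * CAMERA_ANGLES)
actual_images = 175428  # RGB image count stated in the description

# Fraction of the nominal grid that is actually present.
coverage = actual_images / full_grid
print(f"full grid: {full_grid}, coverage: {coverage:.1%}")
```

Under these assumptions the full grid would hold 266,112 images, so roughly two thirds of the nominal combinations are present.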
For each image, 2D Bounding Box, 3D Bounding Box and Keypoint ground truth annotations are also generated by Labelers and stored as a JSON-based dataset. Labelers are scripts that capture ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the Perception package.
Folder configuration
The dataset consists of 3 folders:
JSON Data: Contains all the generated JSON files.
RGB Images: Contains the generated RGB images.
Semantic Segmentation Images: Contains the generated semantic segmentation images.
Essential Terminology
Annotation: Recorded data describing a single capture.
Capture: One completed rendering process of a Unity sensor that stores the rendered result in data files (e.g. PNG, JPG, etc.).
Ego: Object or person to which a collection of sensors is attached (e.g., if a drone has a camera attached to it, the drone would be the ego and the camera would be the sensor).
Ego coordinate system: Coordinates with respect to the ego.
Global coordinate system: Coordinates with respect to the global origin in Unity.
Sensor: Device that captures the dataset (in this instance the sensor is a camera).
Sensor coordinate system: Coordinates with respect to the sensor.
Sequence: Time-ordered series of captures. This is very useful for video capture where the time-order relationship of two captures is vital.
UUID: Universally Unique Identifier. It is a unique hexadecimal identifier that can represent an individual instance of a capture, ego, sensor, annotation, labeled object, keypoint or keypoint template.
Dataset Data
The dataset includes 4 types of JSON annotation files:
annotation_definitions.json: Contains annotation definitions for all of the active Labelers of the simulation stored in an array. Each entry consists of a collection of key-value pairs which describe a particular type of annotation and contain information about that specific annotation describing how its data should be mapped back to labels or objects in the scene. Each entry contains the following key-value pairs:
id: Integer identifier of the annotation's definition.
name: Annotation name (e.g., keypoints, bounding box, bounding box 3D, semantic segmentation).
description: Description of the annotation's specifications.
format: Format of the file containing the annotation specifications (e.g., json, PNG).
spec: Format-specific specifications for the annotation values generated by each Labeler.
Most Labelers generate different annotation specifications in the spec key-value pair:
BoundingBox2DLabeler/BoundingBox3DLabeler:
label_id: Integer identifier of a label.
label_name: String identifier of a label.
KeypointLabeler:
template_id: Keypoint template UUID.
template_name: Name of the keypoint template.
key_points: Array containing all the joints defined by the keypoint template. This array includes the key-value pairs:
label: Joint label.
index: Joint index.
color: RGBA values of the keypoint.
color_code: Hex color code of the keypoint.
skeleton: Array containing all the skeleton connections defined by the keypoint template. Each skeleton connection defines a connection between two different joints. This array includes the key-value pairs:
label1: Label of the first joint.
label2: Label of the second joint.
joint1: Index of the first joint.
joint2: Index of the second joint.
color: RGBA values of the connection.
color_code: Hex color code of the connection.
SemanticSegmentationLabeler:
label_name: String identifier of a label.
pixel_value: RGBA values of the label.
color_code: Hex color code of the label.
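As an illustration of how annotation_definitions.json might be consumed, the sketch below builds an id-to-name lookup and extracts the skeleton connections of a keypoint template. It runs on a small made-up sample rather than the real file. The top-level "annotation_definitions" wrapper key, the assumption that spec is a list, and all sample values are hypothetical and should be checked against the actual file.

```python
import json

# Tiny in-memory stand-in for annotation_definitions.json. The ids, names and
# labels are made up, and the top-level "annotation_definitions" key is an
# ASSUMPTION about the file layout.
SAMPLE = json.loads("""
{
  "annotation_definitions": [
    {"id": 0, "name": "bounding box", "description": "", "format": "json",
     "spec": [{"label_id": 1, "label_name": "human"}]},
    {"id": 1, "name": "keypoints", "description": "", "format": "json",
     "spec": [{"template_id": "t-0", "template_name": "demo",
               "key_points": [{"label": "nose", "index": 0},
                              {"label": "neck", "index": 1}],
               "skeleton": [{"label1": "nose", "label2": "neck",
                             "joint1": 0, "joint2": 1}]}]}
  ]
}
""")


def definition_entries(doc):
    """Return the list of definitions from a dict wrapper or a bare list."""
    if isinstance(doc, dict):
        return doc.get("annotation_definitions", [])
    return doc


def definitions_by_id(doc):
    """Map each integer definition id to its annotation name."""
    return {d["id"]: d["name"] for d in definition_entries(doc)}


def skeleton_edges(doc, name="keypoints"):
    """Collect (joint1, joint2) index pairs from keypoint templates."""
    edges = []
    for d in definition_entries(doc):
        if d["name"] != name:
            continue
        for template in d.get("spec", []):
            for bone in template.get("skeleton", []):
                edges.append((bone["joint1"], bone["joint2"]))
    return edges


print(definitions_by_id(SAMPLE))
print(skeleton_edges(SAMPLE))
```

For the real file, the sample would be replaced by the result of json.load on the file in the JSON Data folder.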
captures_xyz.json: Each of these files contains an array of ground truth annotations generated by each active Labeler for each capture separately, as well as extra metadata that describe the state of each active sensor that is present in the scene. Each array entry contains the following key-value pairs:
id: UUID of the capture.
sequence_id: UUID of the sequence.
step: Index of the capture within a sequence.
timestamp: Time elapsed (in ms) since the beginning of the sequence.
sensor: Properties of the sensor. This entry contains a collection with the following key-value pairs:
sensor_id: Sensor UUID.
ego_id: Ego UUID.
modality: Modality of the sensor (e.g., camera, radar).
translation: 3D vector that describes the sensor's position (in meters) with respect to the global coordinate system.
rotation: Quaternion variable that describes the sensor's orientation with respect to the ego coordinate system.
camera_intrinsic: Matrix containing the camera's intrinsic calibration (if it exists).
projection: Projection type used by the camera (e.g., orthographic, perspective).
ego: Attributes of the ego. This entry contains a collection with the following key-value pairs:
ego_id: Ego UUID.
translation: 3D vector that describes the ego's position (in meters) with respect to the global coordinate system.
rotation: Quaternion variable containing the ego's orientation.
velocity: 3D vector containing the ego's velocity (in meters per second).
acceleration: 3D vector containing the ego's acceleration (in meters per second squared).
format: Format of the file captured by the sensor (e.g., PNG, JPG).
annotations: Key-value pair collections, one for each active Labeler. These key-value pairs are as follows:
id: Annotation UUID.
annotation_definition: Integer identifier of the annotation's definition.
filename: Name of the file generated by the Labeler. This entry is only present for Labelers that generate an image.
values: List of key-value pairs containing annotation data for the current Labeler.
Each Labeler generates different annotation specifications in the values key-value pair:
BoundingBox2DLabeler:
label_id: Integer identifier of a label.
label_name: String identifier of a label.
instance_id: UUID of one instance of an object. Each object with the same label that is visible on the same capture has different instance_id values.
x: Position of the 2D bounding box on the X axis.
y: Position of the 2D bounding box on the Y axis.
width: Width of the 2D bounding box.
height: Height of the 2D bounding box.
BoundingBox3DLabeler:
label_id: Integer identifier of a label.
label_name: String identifier of a label.
instance_id: UUID of one instance of an object. Each object with the same label that is visible on the same capture has different instance_id values.
translation: 3D vector containing the location of the center of the 3D bounding box with respect to the sensor coordinate system (in meters).
size: 3D vector containing the size of the 3D bounding box (in meters).
rotation: Quaternion variable containing the orientation of the 3D bounding box.
velocity: 3D vector containing the velocity of the 3D bounding box (in meters per second).
acceleration: 3D vector containing the acceleration of the 3D bounding box (in meters per second squared).
KeypointLabeler:
label_id: Integer identifier of a label.
instance_id: UUID of one instance of a joint. Keypoints with the same joint label that are visible on the same capture have different instance_id values.
template_id: UUID of the keypoint template.
pose: Pose label for that particular capture.
keypoints: Array containing the properties of each keypoint. Each keypoint that exists in the keypoint template file is one element of the array. Each entry contains the following:
index: Index of the keypoint in the keypoint template file.
x: Pixel coordinates of the keypoint on the X axis.
y: Pixel coordinates of the keypoint on the Y axis.
state: State of the keypoint.
The SemanticSegmentationLabeler does not contain a values list.
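The sketch below shows one way the annotations of a capture entry could be read. It uses a made-up capture that follows the key names listed above. The definition ids (0, 1, 2), all sample values and the filename are hypothetical. Reading x and y as the top-left corner of the 2D box is also an assumption, since the description only calls them the box position.

```python
# Made-up capture entry using the key names listed above; all values are
# illustrative only. In practice the definition ids would be looked up in
# annotation_definitions.json.
SAMPLE_CAPTURE = {
    "id": "c-0",
    "sequence_id": "s-0",
    "step": 0,
    "annotations": [
        {"id": "a-0", "annotation_definition": 0,
         "values": [{"label_id": 1, "label_name": "human",
                     "instance_id": "i-0",
                     "x": 10.0, "y": 20.0, "width": 30.0, "height": 40.0}]},
        {"id": "a-1", "annotation_definition": 1,
         "values": [{"label_id": 1, "instance_id": "i-0",
                     "template_id": "t-0", "pose": "unset",
                     "keypoints": [
                         {"index": 1, "x": 0.0, "y": 0.0, "state": 0},
                         {"index": 0, "x": 15.0, "y": 25.0, "state": 2}]}]},
        # Semantic segmentation: an image filename but no values list.
        {"id": "a-2", "annotation_definition": 2,
         "filename": "Semantic_demo.png"},
    ],
}


def values_for(capture, definition_id):
    """Return the values list of the annotation with the given definition id.

    Annotations without a values list (e.g. semantic segmentation) yield [].
    """
    for annotation in capture.get("annotations", []):
        if annotation["annotation_definition"] == definition_id:
            return annotation.get("values", [])
    return []


def box_corners(box):
    """Convert x/y/width/height to (x_min, y_min, x_max, y_max).

    ASSUMES x and y give the top-left corner in pixels.
    """
    return (box["x"], box["y"],
            box["x"] + box["width"], box["y"] + box["height"])


def flat_keypoints(entry):
    """Flatten keypoints to [x0, y0, state0, x1, y1, state1, ...] by index."""
    ordered = sorted(entry["keypoints"], key=lambda k: k["index"])
    return [v for k in ordered for v in (k["x"], k["y"], k["state"])]


boxes = [box_corners(b) for b in values_for(SAMPLE_CAPTURE, 0)]
print(boxes)
print(flat_keypoints(values_for(SAMPLE_CAPTURE, 1)[0]))
```

The flattened x, y, state triplets mirror the layout commonly used for COCO keypoints. Whether the state values map directly onto COCO visibility flags should be verified against the Perception documentation.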
egos.json: Contains collections of key-value pairs for each ego. These include:
id: UUID of the ego.
description: Description of the ego.
sensors.json: Contains collections of key-value pairs for all sensors of the simulation. These include:
id: UUID of the sensor.
ego_id: UUID of the ego on which the sensor is attached.
modality: Modality of the sensor (e.g., camera, radar, sonar).
description: Description of the sensor (e.g., camera, radar).
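Because each sensor record carries the ego_id of the ego it is attached to, the two files can be joined on that key. The following is a minimal sketch on made-up records. It assumes the ego and sensor entries have already been extracted as plain lists of dicts, and the real files may wrap them in a top-level object.

```python
# Made-up records using the key names listed above.
SAMPLE_EGOS = [{"id": "ego-0", "description": "demo ego"}]
SAMPLE_SENSORS = [
    {"id": "sensor-0", "ego_id": "ego-0",
     "modality": "camera", "description": "demo camera"},
]


def sensors_by_ego(egos, sensors):
    """Group sensor records under the id of the ego they are attached to."""
    grouped = {ego["id"]: [] for ego in egos}
    for sensor in sensors:
        # setdefault keeps sensors whose ego is missing from the ego list.
        grouped.setdefault(sensor["ego_id"], []).append(sensor)
    return grouped


for ego_id, attached in sensors_by_ego(SAMPLE_EGOS, SAMPLE_SENSORS).items():
    print(ego_id, [s["modality"] for s in attached])
```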
Image names
The RGB and semantic segmentation images share the same image naming convention. However, the semantic segmentation images also contain the string Semantic_ at the beginning of their filenames.
Each RGB image is named "e_h_l_d_r.jpg", where:
e denotes the id of the environment.
h denotes the id of the person.
l denotes the id of the lighting condition.
d denotes the camera distance at which the image was captured.
r denotes the camera angle at which the image was captured.
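The naming convention can be decoded with a small helper. The sketch below splits a filename into its five fields and records whether the Semantic_ prefix is present. The field names and the ImageName type are hypothetical. The values are kept as strings because the description does not say how ids, distances and angles are encoded, and the extension is ignored because the extension of the semantic segmentation images is not stated.

```python
from pathlib import PurePath
from typing import NamedTuple

SEMANTIC_PREFIX = "Semantic_"


class ImageName(NamedTuple):
    environment: str
    human: str
    lighting: str
    distance: str
    angle: str
    is_semantic: bool


def parse_image_name(filename):
    """Split an 'e_h_l_d_r' style filename into its five fields.

    Fields stay as strings; their numeric encoding is not documented.
    """
    stem = PurePath(filename).stem  # drop directories and the extension
    is_semantic = stem.startswith(SEMANTIC_PREFIX)
    if is_semantic:
        stem = stem[len(SEMANTIC_PREFIX):]
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"unexpected image name: {filename!r}")
    return ImageName(*parts, is_semantic)


# Example filenames are made up for illustration.
print(parse_image_name("1_2_3_4_50.jpg"))
print(parse_image_name("Semantic_1_2_3_4_50.png"))
```

Pairing an RGB image with its segmentation mask then amounts to comparing the first five fields of the two parsed names.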
Created:
2023-11-14



