EgoNRG: An Egocentric Multi-View Hand-Arm Segmentation and Classification Dataset for Real-World HRI in Military and Industrial Settings
Source: DataCite Commons. Updated 2026-03-30; indexed 2026-05-05.
Download link:
https://dataverse.tdl.org/citation?persistentId=doi:10.18738/T8/DC4J0Q
Resource description:
<h1>Introduction</h1>
The Egocentric Navigation Robot Gestures (EgoNRG) dataset is an egocentric hand gesture dataset designed to improve Human-Robot Interaction (HRI) in real-world industrial, military, and first-response applications. It contains more than 3,000 classified gesture videos and more than 160,000 pixel-level segmented images captured from 32 participants. The participants were recorded performing 11 non-verbal gestures adopted from the Army Field Manual and 1 generic, deictic pointing gesture referencing abstract objects, in both indoor and outdoor environments.
<br>
<br>
Highlights:
<ul>
<li>Joint hand and arm segmentations of each participant's left and right limbs.</li>
<li>Participants performed gestures with 1) <em>long sleeves and gloves</em> (wearing replica flame-resistant solid-color clothing and military camouflage) and 2) <em>bare skin</em> to mimic conditions in real-world industrial and military environments.</li>
<li>Environments with and without background people visible.</li>
<li>Data captured in both indoor and outdoor environments at various points throughout the day (morning, midday, and dusk).</li>
<li>Data captured from four synchronized monochrome cameras, each with a different perspective.</li>
<li>The gestures performed map directly to standard ground vehicle robot commands (stop, move forward, go left, move in reverse, etc.).</li>
</ul>
<img src="https://dataverse.tdl.org/api/access/datafile/760094"
alt="egonrg_dataset_details">
<h2>Content</h2>
The dataset contains:
<ul>
<li>Videos of 32 participants (14 females / 18 males) performing 12 gestures in total. Participants were split into 4 groups of 8. Each group performed a set of 4 gestures.</li>
<li>3,044 videos (~2.5 hours in total) annotated with gesture type. Each gesture performed by each participant was recorded from four synchronized viewpoints.</li>
<li> 160,639 annotated frames with "Left Limb" and "Right Limb" pixel-based segmentations. The hands and arms of the participants were segmented together to create a joint segmentation for each respective limb.</li>
</ul>
<h2>Collection Method</h2>
The dataset was collected using the four monochrome visible-light cameras (VLCs) attached to the Microsoft HoloLens 2 headset. Each video stream provides an egocentric view of the participant's hands and arms performing a variety of gestures from a different perspective. The perspectives are wide left, central left, central right, and wide right, which together provide detailed visual information about each gesture from multiple viewpoints. The headset streamed the video data to a remote server, where the recorded data was synchronized and saved. Research assistants started and stopped recording on board the headset using remote scripts. Three research assistants in total collected the data over two months.
<h2>Annotations</h2>
The data was manually annotated by nine researchers. Three classes were assigned to each image: left limb, right limb, and background. Annotators were instructed to label each limb as the joint hand and arm in every image where they could tell that the participant's hand or arm was visible. The annotation pipeline had three steps. The first step was for the annotators to review left-limb and right-limb bounding boxes that were automatically generated from text prompts with GroundingDINO. Once the bounding boxes for each frame were verified, the images were automatically segmented with Segment Anything 2 (SAM2) and reassembled into videos. The annotators then reviewed these videos with a tool that played them back at 1 FPS and allowed manual stepping through the frames, and flagged every frame that had incorrect pixel segmentations. Annotators then manually reviewed and corrected the pixel segmentations of the flagged frames. Each frame's annotation was converted to a single PNG file recording the three classes: left limb, right limb, and background.
<br>
<br>
Example of Pixel Segmentation Annotations:
<img src="https://dataverse.tdl.org/api/access/datafile/760097"
alt="egonrg_dataset_annotation_example">
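<p>As a rough illustration of how the three-class masks described above might be consumed, the sketch below counts pixels per class in an already-decoded mask. The integer-to-class mapping (0 = background, 1 = left limb, 2 = right limb) and the function name are assumptions made for this example only; consult the Dataset Report and metadata for the encoding actually used in the PNG files.</p>

```python
from collections import Counter

# ASSUMED mapping from pixel value to class; not confirmed by the dataset docs.
ASSUMED_CLASS_VALUES = {0: "background", 1: "left_limb", 2: "right_limb"}


def class_pixel_counts(mask_rows, class_values=ASSUMED_CLASS_VALUES):
    """Count pixels per class in a mask given as rows of integer pixel values."""
    counts = Counter(value for row in mask_rows for value in row)
    # Fail loudly if the mask contains a value the assumed mapping does not cover.
    unknown = set(counts) - set(class_values)
    if unknown:
        raise ValueError(f"Unexpected pixel values in mask: {sorted(unknown)}")
    return {name: counts.get(value, 0) for value, name in class_values.items()}


# Tiny synthetic 3x4 mask standing in for a decoded annotation PNG.
example_mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 0, 2, 0],
]
print(class_pixel_counts(example_mask))
```

<p>A mask loaded with an image library can be passed in row by row. The explicit check for unknown values is there so that a wrong assumption about the encoding surfaces immediately rather than producing silently incorrect counts.</p>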
<h2>Evaluation</h2>
Multiple semantic segmentation and gesture classification models were trained on the dataset. The official model training code and configurations for this dataset are on GitHub. The link to the public GitHub repository is provided in the Software metadata field below.
<h2>Human Subjects</h2>
This study was approved by the University of Texas at Austin Institutional Review Board (IRB) under IRB ID STUDY00000278-MOD10. To provide a comprehensive representation of collaborative scenarios, a diverse pool of participants was selected. Participants who revoked their consent were noted, and their data was removed from the dataset and the annotations.
<h2>Dataset Organization</h2>
The dataset is organized in the format shown below. Users are advised to first inspect the files in the metadata directory to understand which files they need for their task. For an in-depth explanation of the dataset file structure, refer to the Dataset Report included in this dataset.
<br>
<img src="https://dataverse.tdl.org/api/access/datafile/760095"
alt="egonrg_dataset_structure">
<h2>Dataset Quality Statement</h2>
<p>The research team maintained high data quality by adhering to standardized procedures established at the start of dataset collection and throughout the process, ensuring consistency across all participants. All data was ethically sourced using approved protocols that prioritize participant welfare and informed consent. Comprehensive documentation was maintained during data collection to ensure traceability and facilitate auditing. All dataset contents were thoroughly documented in this report and associated repositories, ensuring transparency and reproducibility.</p>
<h2>Further Information</h2>
More details can be found in the complete dataset report, attached and linked below:
https://dataverse.tdl.org/api/access/datafile/760102
<h2>Download Dataset</h2>
<h3>1. Install Helper Script Dependencies</h3>
<ol>
<li>Create and activate a conda environment
<pre><code>conda create -n dataset-dl python=3.8</code></pre>
<pre><code>conda activate dataset-dl</code></pre>
</li>
<li>Install Python dependencies
<pre><code>pip install pyDataverse pandas requests</code></pre>
</li>
</ol>
<h3>2. Set Up TDR API Key</h3>
<ol>
<li>Log in to the Texas Data Repository (TDR), open the drop-down menu under your name in the top right corner, and select "API Token".</li>
<li>Generate and copy the API key.</li>
<li>In your terminal, create a TDR API key environment variable with the following command
<pre><code>export TDR_API_KEY=&lt;api_key&gt;</code></pre>
</li>
</ol>
<h3>3. Download and Run Helper Script</h3>
<ol>
<li>Create a base directory on your machine
<pre><code>mkdir EgoNRG && cd EgoNRG</code></pre>
</li>
<li>Download the Python helper script from this TDR repository
<pre><code>wget --header="X-Dataverse-key: $TDR_API_KEY" -O "download_dataset.py" "https://dataverse.tdl.org/api/access/datafile/773700"</code></pre>
</li>
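<li>Run the script with the flag for the data you want to download: <code>--all</code>, <code>--vids</code>, <code>--imgs</code>, <code>--masks</code>, or <code>--anns</code>. For example:
<pre><code>python3 download_dataset.py --all</code></pre>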
</li>
</ol>
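<p>For environments without <code>wget</code>, the script download in step 2 of this section could also be done from Python with the standard library alone. The sketch below sends the API key in the same <code>X-Dataverse-key</code> header and uses the same URL as the <code>wget</code> command above; the function names are illustrative and are not part of the official helper script.</p>

```python
import shutil
import urllib.request

# URL and header name are taken from the wget command in the steps above.
SCRIPT_URL = "https://dataverse.tdl.org/api/access/datafile/773700"


def build_request(url, api_key):
    """Create a request carrying the API key in the X-Dataverse-key header."""
    if not api_key:
        raise ValueError("No API key given; set TDR_API_KEY as described above.")
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_key})


def download(url, dest, api_key):
    """Stream the response body to dest. This performs a network call."""
    request = build_request(url, api_key)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```

<p>A typical call would be <code>download(SCRIPT_URL, "download_dataset.py", os.environ.get("TDR_API_KEY"))</code> after <code>import os</code>. Building the request in a separate function keeps the missing-key check testable without touching the network.</p>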
Provider:
Texas Data Repository
Created:
2025-07-02



