MPOSE2021: a Dataset for Short-Time Pose-Based Human Action Recognition
Indexed by NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/5506688
Description:
This repository contains MPOSE2021, a dataset designed specifically for short-time pose-based Human Action Recognition (HAR).
MPOSE2021 is an evolution of the MPOSE Dataset [1-3]. It consists of human pose data detected by OpenPose [4] and PoseNet [11] on popular HAR datasets, i.e. Weizmann [5], i3DPost [6], IXMAS [7], KTH [8], UTKinetic-Action3D (RGB only) [9] and UTD-MHAD (RGB only) [10], alongside original video datasets, i.e. ISLD and ISLD-Additional-Sequences [1]. Since these datasets have heterogeneous action labels, the labels of each dataset are remapped to a common, homogeneous list of actions. Generated sequences are between 20 and 30 frames long. They are obtained by cutting the so-called Precursor videos (videos from the above-mentioned datasets) with non-overlapping sliding windows. Frames in which OpenPose/PoseNet cannot detect any subject are automatically discarded. Each resulting sample contains one subject at a time, performing a fraction of a single action. Overall, MPOSE2021 contains 15429 samples, divided into 20 actions, performed by 100 subjects.
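The windowing step described above can be sketched as follows. This is only an illustration, not the authors' actual pipeline: the function names drop_undetected and cut_windows are hypothetical, and the sketch assumes that a frame with no detected subject appears as all-zero keypoint confidences and that leftover chunks shorter than 20 frames are dropped.

```python
import numpy as np


def drop_undetected(frames):
    """Keep only frames where at least one keypoint has non-zero confidence.

    Assumption (not stated in the source): a frame without a detected
    subject has all-zero confidences in channel index 2.
    """
    detected = frames[:, :, 2].max(axis=1) > 0
    return frames[detected]


def cut_windows(frames, min_len=20, max_len=30):
    """Split a (N, K, C) pose sequence into non-overlapping windows.

    Each window holds at most max_len frames; a trailing chunk shorter
    than min_len is discarded (assumed behaviour).
    """
    windows = []
    for start in range(0, len(frames), max_len):
        chunk = frames[start:start + max_len]
        if len(chunk) >= min_len:
            windows.append(chunk)
    return windows


# Toy precursor video: 80 frames, 17 keypoints, (x, y, p) channels.
video = np.ones((80, 17, 3), dtype=np.float32)
video[10:15, :, 2] = 0.0  # simulate 5 frames with no detected subject

clean = drop_undetected(video)   # 75 frames remain
windows = cut_windows(clean)     # two windows of 30 frames, 15 left over
print([len(w) for w in windows])
```

In this toy case the 75 remaining frames yield two full 30-frame windows, and the 15-frame remainder is dropped because it falls below the 20-frame minimum.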
More information about the dataset can be found in the MPOSE2021 repository, which also provides a user-friendly Python package for importing and using the dataset. The package can be installed by running the command
pip install mpose
Data Structure
For each pose extractor, the repository contains 3 datasets (named 1, 2 and 3), which consist of the same data divided into different train/test splits. Each dataset contains X and y NumPy arrays for both training and testing. X has the following shape:
(B, T, K, C)
where
B is the batch dimension, i.e. the number of samples;
T (= 30) is the duration of the sequences in frames (shorter sequences are zero-padded);
K (= 17 for PoseNet and 25 for OpenPose) is the number of pose keypoints;
C (= 3) is the number of channels, comprising the 2D keypoint coordinates (x, y) in the original video reference frame and the keypoint confidence (p <= 1).
The .txt files specifying the metadata associated with the split samples are also included.
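The (B, T, K, C) layout described above can be illustrated with synthetic data. This is a minimal sketch, not the dataset's own loading code: the helper pad_sequence is hypothetical, the values are random, and the sketch assumes zero-padding is appended at the end of each sequence (the source only states that shorter sequences are zero-padded).

```python
import numpy as np

T, K, C = 30, 17, 3  # frames, keypoints (PoseNet), channels (x, y, p)


def pad_sequence(seq, target_len=T):
    """Zero-pad a (t, K, C) sequence along the time axis to target_len frames.

    Assumption: padding frames are appended after the real frames.
    """
    missing = target_len - seq.shape[0]
    if missing < 0:
        raise ValueError("sequence is longer than target_len")
    return np.pad(seq, ((0, missing), (0, 0), (0, 0)))


# Three synthetic sequences of 20, 25 and 30 frames with non-zero values.
rng = np.random.default_rng(0)
sequences = [
    rng.uniform(0.1, 1.0, size=(n, K, C)).astype(np.float32)
    for n in (20, 25, 30)
]

# Stack the padded sequences into a single (B, T, K, C) array.
X = np.stack([pad_sequence(s) for s in sequences])
print(X.shape)

# Separate the channels: coordinates and per-keypoint confidence.
xy = X[..., :2]    # shape (B, T, K, 2)
conf = X[..., 2]   # shape (B, T, K)

# Under the padding assumption, the original length of each sequence is
# the number of frames with at least one non-zero confidence.
lengths = (conf.max(axis=2) > 0).sum(axis=1)
print(lengths)
```

Counting the non-zero frames is one way to build a mask, so that padded frames can be ignored by a downstream model.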
References
MPOSE2021 is part of a paper published in Pattern Recognition (Elsevier) and is intended for scientific research purposes. If you use MPOSE2021 in your research work, please cite the paper below together with [1-11].
@article{mazzia2021action,
title={Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition},
author={Mazzia, Vittorio and Angarano, Simone and Salvetti, Francesco and Angelini, Federico and Chiaberge, Marcello},
journal={Pattern Recognition},
pages={108487},
year={2021},
publisher={Elsevier}
}
[1] Angelini, F., Fu, Z., Long, Y., Shao, L., & Naqvi, S. M. (2019). 2D Pose-Based Real-Time Human Action Recognition With Occlusion-Handling. IEEE Transactions on Multimedia, 22(6), 1433-1446.
[2] Angelini, F., Yan, J., & Naqvi, S. M. (2019, May). Privacy-preserving Online Human Behaviour Anomaly Detection Based on Body Movements and Objects Positions. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8444-8448). IEEE.
[3] Angelini, F., & Naqvi, S. M. (2019, July). Joint RGB-Pose Based Human Action Recognition for Anomaly Detection Applications. In 2019 22th International Conference on Information Fusion (FUSION) (pp. 1-7). IEEE.
[4] Cao, Z., Hidalgo, G., Simon, T., Wei, S. E., & Sheikh, Y. (2019). OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE transactions on pattern analysis and machine intelligence, 43(1), 172-186.
[5] Gorelick, L., Blank, M., Shechtman, E., Irani, M., & Basri, R. (2007). Actions as Space-Time Shapes. IEEE transactions on pattern analysis and machine intelligence, 29(12), 2247-2253.
[6] Starck, J., & Hilton, A. (2007). Surface Capture for Performance-Based Animation. IEEE computer graphics and applications, 27(3), 21-31.
[7] Weinland, D., Özuysal, M., & Fua, P. (2010, September). Making Action Recognition Robust to Occlusions and Viewpoint Changes. In European Conference on Computer Vision (pp. 635-648). Springer, Berlin, Heidelberg.
[8] Schuldt, C., Laptev, I., & Caputo, B. (2004, August). Recognizing Human Actions: a Local SVM Approach. In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. (Vol. 3, pp. 32-36). IEEE.
[9] Xia, L., Chen, C. C., & Aggarwal, J. K. (2012, June). View Invariant Human Action Recognition using Histograms of 3D Joints. In 2012 IEEE computer society conference on computer vision and pattern recognition workshops (pp. 20-27). IEEE.
[10] Chen, C., Jafari, R., & Kehtarnavaz, N. (2015, September). UTD-MHAD: A Multimodal Dataset for Human Action Recognition utilizing a Depth Camera and a Wearable Inertial Sensor. In 2015 IEEE International conference on image processing (ICIP) (pp. 168-172). IEEE.
[11] Papandreou, G., Zhu, T., Chen, L. C., Gidaris, S., Tompson, J., & Murphy, K. (2018). Personlab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 269-286).
Created:
2023-01-23



