Commonly used notations.
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Commonly_used_notations_/28419116
Description:
Sign language is a complex visual language system that uses hand gestures, facial expressions, and body movements to convey meaning. It is the primary means of communication for millions of deaf and hard-of-hearing individuals worldwide. Tracking physical actions, such as hand movements and arm orientation, alongside expressive actions, including facial expressions, mouth movements, eye movements, eyebrow gestures, head movements, and body postures, using only RGB features can be limiting because backgrounds and signers differ across datasets. Despite this limitation, most Sign Language Translation (SLT) research relies solely on RGB features. We used keypoint features together with RGB features to better capture the pose and configuration of the body parts involved in sign language actions and to complement the RGB features. Similarly, most SLT work has used transformers, which are good at capturing broader, high-level context and focusing on the most relevant video frames. However, that approach neglects the inherent graph structure of sign language and fails to capture low-level details. To address this, we used a joint encoding technique that combines a transformer with an STGCN architecture to capture both the context of sign language expressions and the spatial and temporal dependencies on skeleton graphs. Our method, SignFormer-GCN, achieves competitive performance on the RWTH-PHOENIX-2014T, How2Sign, and BornilDB v1.0 datasets, demonstrating its effectiveness in improving translation accuracy across different sign languages. The code is available at https://github.com/rabeya-akter/SignLanguageTranslation.
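To make the idea of spatial and temporal dependencies on a skeleton graph more concrete, here is a minimal, parameter-free sketch in plain Python. It is not the SignFormer-GCN implementation and is not taken from the linked repository. Every name in it (`normalized_adjacency`, `spatial_step`, `temporal_step`, `fuse`) is hypothetical, the averaging steps stand in for the learned graph and temporal convolutions of an STGCN, and fusing by concatenation is an assumption, since the description does not say how the two feature streams are joined.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows are joints (or frames), columns are features


def normalized_adjacency(num_joints: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Row-normalized adjacency with self-loops, i.e. D^-1 (A + I)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def spatial_step(frame: Matrix, adj: Matrix) -> Matrix:
    """Average each joint's features with those of its graph neighbours."""
    dims = len(frame[0])
    return [[sum(adj[i][k] * frame[k][d] for k in range(len(frame)))
             for d in range(dims)]
            for i in range(len(frame))]


def temporal_step(frames: List[Matrix], window: int = 3) -> List[Matrix]:
    """Average each joint over a centred window of frames, clipped at the ends."""
    half = window // 2
    joints, dims = len(frames[0]), len(frames[0][0])
    out = []
    for t in range(len(frames)):
        span = frames[max(0, t - half):min(len(frames), t + half + 1)]
        out.append([[sum(f[j][d] for f in span) / len(span)
                     for d in range(dims)]
                    for j in range(joints)])
    return out


def fuse(rgb: Matrix, keypoints: List[Matrix], adj: Matrix) -> Matrix:
    """Concatenate per-frame RGB features with pooled skeleton features.

    Concatenation is an assumed fusion strategy, chosen only for illustration.
    """
    graph = temporal_step([spatial_step(f, adj) for f in keypoints])
    fused = []
    for rgb_vec, joints in zip(rgb, graph):
        pooled = [sum(j[d] for j in joints) / len(joints)
                  for d in range(len(joints[0]))]
        fused.append(list(rgb_vec) + pooled)
    return fused


if __name__ == "__main__":
    # Toy skeleton: three joints in a chain (0 - 1 - 2), 2-D coordinates.
    adj = normalized_adjacency(3, [(0, 1), (1, 2)])
    keypoints = [[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
                 [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]]
    rgb = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # placeholder per-frame RGB features
    for row in fuse(rgb, keypoints, adj):
        print(row)
```

In a trained model the averaging would be replaced by learned weights, and the fused sequence would be passed to a transformer encoder rather than printed. The sketch only shows how information flows along skeleton edges within a frame and across neighbouring frames before it meets the RGB stream.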
Created:
2025-02-14



