GSL Dataset
Source: paperswithcode.com (indexed 2025-01-21)
Download link:
https://paperswithcode.com/dataset/gsl
Resource overview:
Dataset Description
The Greek Sign Language (GSL) dataset is a large-scale RGB+D dataset suitable for Sign Language Recognition (SLR) and Sign Language Translation (SLT). The videos were captured with an Intel RealSense D435 RGB+D camera at 30 fps. Both the RGB and the depth streams were acquired at the same spatial resolution of 848×480 pixels. To increase variability in the videos, the camera position and orientation were slightly altered between recordings. Seven different signers performed five individual, commonly encountered scenarios in different public services. Each scenario is twenty sentences long on average.
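As a rough aid for planning storage or data loading, the capture settings above can be turned into a raw (uncompressed) frame footprint. The sketch below takes the resolution and frame rate from the description. The byte depths (8-bit RGB, 16-bit depth) are assumptions, not stated in the description, and the released files may be encoded differently.

```python
# Capture settings quoted in the dataset description.
WIDTH, HEIGHT, FPS = 848, 480, 30

# Assumptions (not stated in the description): 8 bits per RGB channel,
# and one 16-bit value per depth pixel.
RGB_BYTES_PER_PIXEL = 3
DEPTH_BYTES_PER_PIXEL = 2


def raw_frame_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


def raw_rate_mb_per_s(frame_bytes, fps):
    """Uncompressed data rate in megabytes (10**6 bytes) per second."""
    return frame_bytes * fps / 1e6


if __name__ == "__main__":
    rgb = raw_frame_bytes(WIDTH, HEIGHT, RGB_BYTES_PER_PIXEL)
    depth = raw_frame_bytes(WIDTH, HEIGHT, DEPTH_BYTES_PER_PIXEL)
    print(f"RGB frame:   {rgb} bytes")
    print(f"Depth frame: {depth} bytes")
    print(f"RGB+D rate:  {raw_rate_mb_per_s(rgb + depth, FPS):.1f} MB/s")
```

These figures describe decoded frames in memory only. Compressed video on disk will be much smaller.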
The dataset contains 10,290 sentence instances, 40,785 gloss instances, 310 unique glosses (vocabulary size) and 331 unique sentences, with 4.23 glosses per sentence on average. Each signer is asked to perform the pre-defined dialogues five consecutive times. In all cases, the simulation considers a deaf person communicating with a single public service employee. The involved signer performs the sequence of glosses of both agents in the discussion. For the annotation of each gloss sequence, GSL linguistic experts are involved. The given annotations are at individual gloss and gloss sequence level. A translation of the gloss sentences to spoken Greek is also provided.
Evaluation
The GSL dataset includes three evaluation setups:
Signer-dependent continuous sign language recognition (GSL SD) – roughly 80% of the videos are used for training, corresponding to 8,189 instances. The remaining 1,063 (10%) are kept for validation and 1,043 (10%) for testing.
Signer-independent continuous sign language recognition (GSL SI) – the selected test gloss sequences are not used in the training set, while all the individual glosses exist in the training set. In GSL SI, the recordings of one signer are left out for validation and testing (588 and 881 instances, respectively). The remaining 8,821 instances are used for training.
Isolated gloss sign language recognition (GSL isol.) – The validation set consists of 2,231 gloss instances, the test set 3,500, while the remaining 34,995 are used for training. All 310 unique glosses are seen in the training set.
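The split sizes quoted for the three setups can be tabulated to see each subset's actual share. The following is a minimal sketch that uses only the counts given above. The dictionary layout and function name are illustrative and are not part of any official GSL tooling.

```python
# Split sizes as quoted in the three evaluation setups.
SPLITS = {
    "GSL SD": {"train": 8189, "val": 1063, "test": 1043},
    "GSL SI": {"train": 8821, "val": 588, "test": 881},
    "GSL isol.": {"train": 34995, "val": 2231, "test": 3500},
}


def split_fractions(sizes):
    """Return each subset's share of the setup's total, in percent."""
    total = sum(sizes.values())
    return {name: 100.0 * count / total for name, count in sizes.items()}


if __name__ == "__main__":
    for setup, sizes in SPLITS.items():
        shares = split_fractions(sizes)
        summary = ", ".join(f"{k} {v:.1f}%" for k, v in shares.items())
        print(f"{setup}: total {sum(sizes.values())} -> {summary}")
```

With these counts, the GSL SI subsets add up to the 10,290 sentence instances quoted in the description, while the GSL SD subsets as quoted add up to 10,295.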
For more information and results, consult our paper.
Paper Abstract: A Comprehensive Study on Sign Language Recognition Methods, Adaloglou et al. 2020
In this paper, a comparative experimental assessment of computer vision-based methods for sign language recognition is conducted. By implementing the most recent deep neural network methods in this field, a thorough evaluation on multiple publicly available datasets is performed. The aim of the present study is to provide insights on sign language recognition, focusing on mapping non-segmented video streams to glosses. For this task, two new sequence training criteria, known from the fields of speech and scene text recognition, are introduced. Furthermore, a plethora of pretraining schemes are thoroughly discussed. Finally, a new RGB+D dataset for the Greek sign language is created. To the best of our knowledge, this is the first sign language dataset where sentence and gloss level annotations are provided for every video capture.
Arxiv link
Provided by:
Papers with Code



