Signing in the Wild dataset
Mendeley Data. Updated 2024-03-27; indexed 2024-06-28.
Download link:
https://ieee-dataport.org/documents/signing-wild-dataset
Description:
Our Signing in the Wild dataset consists of videos harvested from YouTube that show people signing in various sign languages, in diverse settings and environments, under complex signer and camera motion, and even signing in groups. The dataset is intended for sign language detection.

For the negative set, we created two classes of videos, labelled 'speaking' and 'other'. Our motivation for the 'speaking' class is that speech is often accompanied by hand gestures (gesticulation), which can easily be confused with signing. Signing can be discriminated by its linguistic nature, i.e., its distinct phonological, morphological and categorical (discrete) structures, while gesticulation tends to be more spontaneous, idiosyncratic and analogue in nature.

For the 'other' class, we looked for distractors to both 'signing' and 'speaking', i.e., videos containing hand movements that are quite similar to signing or gesticulation and thus might confuse a classifier. Examples include miming, hand exercises, manual activities such as playing instruments, painting and writing, yoga and martial arts, and sports such as table tennis. Also included are activities similar to speech, such as people laughing, clapping, nodding, or listening to other speakers.

A total of 1120 videos are included in our dataset, each contributing its first 6.6 minutes, which gives about 2000 frames per video when sampled at 5 Hz. We have 1.45 million video frames in total. Our videos are untrimmed, i.e., a video can contain multiple activities, background scenes, scene cuts, and other actions done by the same or different actors. The videos are thus unconstrained both spatially and temporally. This is in line with recent trends in video action recognition [8], and unlike ASLR datasets, where trimmed videos are the norm. In particular, several videos in our dataset contain all 3 classes (occasionally with temporal overlap), and sometimes the same person alternates between signing and speaking.

We performed manual groundtruthing at video frame level. Since action boundaries can be inherently fuzzy, we consider a short temporal context (10 frames) surrounding the frame to be labelled in order to decide on its class label. We also adopt certain spatial guidelines, e.g., mouth movements must be visible for the action 'speaking', which eliminates distant views and cases where the speaker turns their back to the camera. Ambiguous cases are left unlabelled. We annotate video segments that do not contain signing or speaking as 'other', including opening/closing credits, title screens, scene transitions, animations, background scenes, etc.

If you find this dataset useful, please cite the following paper: Mark Borg, Kenneth P. Camilleri, "Sign Language Detection 'In The Wild' With Recurrent Neural Networks", ICASSP 2019.

Any comments, suggestions and feedback are welcome: mborg2005 at gmail dot com
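The sampling arithmetic in the description can be sketched in a few lines. This is only an illustration, not the authors' extraction pipeline, which the description does not give: the function name `sampled_frame_indices`, the native frame rate argument, and the nearest-frame rounding are all assumptions. It shows that the first 6.6 minutes (396 seconds) sampled at 5 Hz yields 1980 samples, which the description rounds to 2000, and that shorter videos yield fewer.

```python
MAX_SECONDS = 396   # first 6.6 minutes of each video, per the description
SAMPLE_HZ = 5       # sampling rate stated in the description


def sampled_frame_indices(native_fps: float, duration_s: float) -> list[int]:
    """Return native-frame indices to keep when sampling at SAMPLE_HZ.

    Hypothetical helper: native_fps and duration_s would come from the
    video container; nearest-frame rounding is an assumption.
    """
    # Only the first MAX_SECONDS of the video are used.
    usable_s = min(duration_s, MAX_SECONDS)
    n_samples = int(usable_s * SAMPLE_HZ)
    # Index of the last native frame, so rounding never runs past the end.
    last_frame = max(int(duration_s * native_fps) - 1, 0)
    # Sample k is taken at time k / SAMPLE_HZ, mapped to the nearest native frame.
    return [min(round(k * native_fps / SAMPLE_HZ), last_frame)
            for k in range(n_samples)]


if __name__ == "__main__":
    indices = sampled_frame_indices(native_fps=30.0, duration_s=600.0)
    print(len(indices), indices[:3])
```

For a 10-minute video at an assumed 30 fps, this keeps every sixth native frame from the first 396 seconds.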
Created:
2023-06-28
Collected summary
Dataset introduction

Background and challenges
Background overview
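Signing in the Wild is a video dataset for sign language detection. It contains 1120 untrimmed videos collected from YouTube, totalling 1.45 million frames, and reflects the complexity of real-world scenes. The dataset covers three activity classes: signing, speaking accompanied by gestures, and other manual activities. It provides manual frame-level annotations and pretrained neural network models, and is suited to computer vision and signal processing research.
The summary above was collected and generated by 遇见数据集.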



