Japanese Sign Language Lip Landmark Dataset for Mouth Articulation Recognition
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Japanese_Sign_Language_Lip_Landmark_Dataset_for_Mouth_Articulation_Recognition/31711087
Description:
This dataset contains lip landmark coordinates extracted from video frames of Japanese Sign Language (JSL) news broadcasts. The source material consists of eight videos from NHK Sign Language News featuring ten signers. Frames containing a clearly visible signer were selected and processed to produce a dataset suitable for research on mouth articulation recognition and landmark-based modeling of facial motion.
From the original videos, frames were extracted at approximately 23.976 frames per second and filtered to retain only images containing a single frontal or near-frontal signer. After preprocessing and annotation, the dataset contains 6,133 labeled frames belonging to six classes: a non-vowel class and five vowel articulation classes (A, I, U, E, O). As expected for continuous sign language recordings, the class distribution is imbalanced, with the non-vowel class occurring most frequently.
For each retained frame, facial landmarks were detected using MediaPipe with refined face landmarks enabled. The nose tip landmark (MediaPipe index 1) is used as a spatial anchor to define a fixed crop of 112 × 96 pixels around the mouth region. The crop is aligned so that the nose appears at a consistent location within the image. When the crop extends beyond the frame boundaries, the image is padded to preserve alignment.
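The nose-anchored crop with boundary padding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the description does not say where in the crop the nose is placed, how the padding is filled, or which of 112 and 96 is the width, so the `anchor` position, the zero padding, and the width × height reading are all assumptions, and the function name is hypothetical.

```python
import numpy as np


def nose_anchored_crop(frame, nose_xy, crop_w=112, crop_h=96, anchor=(56, 16)):
    """Cut a crop_h x crop_w window so the nose lands at `anchor` (x, y).

    Assumptions (not stated in the dataset description):
      - 112 x 96 is read as width x height,
      - the nose is placed at `anchor` inside the crop,
      - out-of-frame regions are filled with zeros.
    """
    nose_x, nose_y = int(round(nose_xy[0])), int(round(nose_xy[1]))
    x0, y0 = nose_x - anchor[0], nose_y - anchor[1]   # crop origin in frame pixels
    x1, y1 = x0 + crop_w, y0 + crop_h
    h, w = frame.shape[:2]

    # Amount of padding needed on each side when the window leaves the frame.
    pad_left, pad_top = max(0, -x0), max(0, -y0)
    pad_right, pad_bottom = max(0, x1 - w), max(0, y1 - h)
    pad_spec = [(pad_top, pad_bottom), (pad_left, pad_right)]
    pad_spec += [(0, 0)] * (frame.ndim - 2)           # leave colour channels alone
    padded = np.pad(frame, pad_spec, mode="constant")

    # Shift the origin into the padded image and slice a fixed-size window.
    x0 += pad_left
    y0 += pad_top
    return padded[y0:y0 + crop_h, x0:x0 + crop_w]
```

Because the window is always sliced from a padded copy, the output size stays fixed and the nose stays at the same crop position even when the signer is close to the frame border.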
A predefined subset of lip-related landmarks is then projected into crop coordinates and stored relative to the nose position. Landmark coordinates are normalized by the crop width and height, producing a representation that reduces sensitivity to global translation and scale differences between subjects.
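One way to read this normalization step is sketched below. It assumes the input is an array of MediaPipe-style landmarks with x and y in [0, 1] relative to the full frame, and that the crop is a pure translation of the frame, so subtracting the nose position in frame pixels equals subtracting it in crop coordinates. The `LIP_IDX` subset is illustrative only; the dataset's actual landmark subset is not listed in the description, and the function name is hypothetical.

```python
import numpy as np

NOSE_TIP = 1                  # MediaPipe nose tip index, as stated in the description
LIP_IDX = [61, 291, 13, 14]   # illustrative lip indices, NOT the dataset's subset


def normalized_lip_landmarks(landmarks01, frame_w, frame_h,
                             crop_w=112, crop_h=96, lip_idx=LIP_IDX):
    """Return lip landmarks relative to the nose, scaled by the crop size.

    landmarks01: array of shape (num_landmarks, >=2) with x, y in [0, 1]
                 relative to the full frame (assumed input format).
    """
    pts = np.asarray(landmarks01, dtype=np.float64)[:, :2]
    pts = pts * np.array([frame_w, frame_h], dtype=np.float64)   # to frame pixels
    rel = pts[lip_idx] - pts[NOSE_TIP]                           # nose-relative offsets
    return rel / np.array([crop_w, crop_h], dtype=np.float64)    # crop-normalized
```

Under these assumptions, a landmark lying one crop width to the right of the nose maps to (1.0, 0.0), and landmarks below the nose get positive y values.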
The processed dataset includes cropped mouth-region images and CSV files containing normalized landmark coordinates for each frame.
CSV Landmark Files
Landmark data are stored in CSV files organized by participant. Each row corresponds to a single frame and contains the normalized coordinates of the selected lip landmarks. Coordinates are expressed relative to the nose anchor and normalized by crop dimensions.
Example structure:
frame_name,0_x,0_y,1_x,1_y,...,N_x,N_y
where:
frame_name – filename identifying the frame, following the pattern frame_<class_id>_<frame_order>
class_id – two-digit class identifier in frame_name [00: Non-Vowel; 01: A; 02: I; 03: U; 04: E; 05: O]
frame_order – four-digit number in frame_name representing the temporal order of the frame within the video sequence
i_x, i_y – normalized coordinates of lip landmark i
Example: frame_04_0123 means:
04 → class label (here the vowel class E)
0123 → frame index in the temporal sequence
Each CSV file contains all processed frames for a given participant.
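A minimal sketch of reading one participant's file is given below, assuming a header row with the column names shown in the example structure and a frame_name that contains a two-digit class identifier followed by a four-digit frame index, as in frame_04_0123. The function names are hypothetical, and the regular expression tolerates a file extension because the exact filename form is not specified.

```python
import csv
import io
import re

CLASS_NAMES = {0: "Non-Vowel", 1: "A", 2: "I", 3: "U", 4: "E", 5: "O"}
_FRAME_RE = re.compile(r"frame_(\d{2})_(\d{4})")


def parse_frame_name(name):
    """Split a name such as 'frame_04_0123' into (class_id, frame_order)."""
    match = _FRAME_RE.search(name)
    if match is None:
        raise ValueError(f"unexpected frame name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def read_landmark_csv(fileobj):
    """Return (class_id, frame_order, [(x, y), ...]) for each CSV row.

    Assumes a header of the form frame_name,0_x,0_y,...,N_x,N_y.
    """
    rows = []
    for record in csv.DictReader(fileobj):
        class_id, frame_order = parse_frame_name(record["frame_name"])
        num_points = (len(record) - 1) // 2
        coords = [(float(record[f"{i}_x"]), float(record[f"{i}_y"]))
                  for i in range(num_points)]
        rows.append((class_id, frame_order, coords))
    return rows


# Usage with an in-memory sample; the numbers are made up for illustration.
sample = io.StringIO(
    "frame_name,0_x,0_y,1_x,1_y\n"
    "frame_04_0123,0.10,0.25,-0.10,0.25\n"
)
for class_id, frame_order, coords in read_landmark_csv(sample):
    print(CLASS_NAMES[class_id], frame_order, coords)
```

For a real file, pass an open file handle (for example `open(path, newline="")`) in place of the in-memory sample.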
Use Cases and Intended Audience
This dataset is primarily intended for:
Training and evaluating machine learning models for mouth articulation recognition
Research on sign language mouthing analysis
Development of spatio-temporal graph neural networks for facial landmark sequences
Benchmarking landmark-based approaches versus RGB-based visual models
Studying temporal dynamics of lip motion
Multimodal sign language research combining hand gestures and mouth articulations
The dataset may also be used for research on:
visual speech recognition
lip reading
facial motion modeling
human-computer interaction
Citation guidelines
If you use this dataset, please cite:
1. The dataset: Nurzhigit Ongalov, Bogdan Kwolek (2026). Japanese Sign Language Lip Landmark Dataset for Mouth Articulation Recognition. Figshare. https://doi.org/10.6084/m9.figshare.31711087
2. The associated publication: Umeda, Y., Ongalov, N., Sroka, G., Sako, S., & Kwolek, B. (2025). Continuous Recognition of Mouth Patterns in Japanese Sign Language for Visual Communication. In: Intelligent Information and Database Systems, pp. 115–128. Springer Nature Singapore.
Created:
2026-03-16



