
Mediapi-RGB

Source: DataCite Commons. Updated: 2026-02-10. Indexed: 2026-05-04.
Download link:
https://www.ortolang.fr/market/item/mediapi-rgb/v1
Resource description:
Mediapi-RGB is a bilingual corpus of French Sign Language (LSF) and written French in the form of subtitled videos, accompanied by complementary data (various representations, segmentation, vocabulary, etc.). It can be used in academic research for a wide range of tasks, such as training or evaluating sign language (SL) extraction, recognition or translation models.

To build this corpus, we used videos from Média'Pi!, a bilingual online media outlet that publishes journalistic content in LSF with French subtitles. We collected 1230 videos dating from September 2017 to January 2022, representing a total of 86 hours. Based on the subtitles, we temporally segmented the videos into 50084 video segments (or extracts). We also automatically cropped the signer and harmonised the segments in terms of size (444x444) and frame rate (25 fps).

Version 1 of the repository:

In this first version, we provide the video segments of the test and val sets; those of the train set, currently under embargo, will be available in 2025. We also provide four types of data: OpenPose (computed from the 1230 original videos), Mediapipe Holistic, I3D and Video Swin Transformer (computed from each of the 50084 video segments using models trained on British Sign Language). We also provide the French subtitles, as well as a list of lemmatised nouns (common and proper) appearing at least 5 times in the subtitles of the segments. Finally, for each video segment, we provide predictions of the boundaries between signs.

Content of the directories:

data folder
    videos_test.zip: 8060 processed videos used for testing
    videos_val.zip: 4376 processed videos used for validation
    openpose_zips.zip: one zip for each of the 1230 original videos (before segmentation and cropping), containing the OpenPose keypoints for each frame
    mediapipe1.zip, mediapipe2.zip, mediapipe3.zip: one zip for each of the 50084 processed videos, containing the Mediapipe Holistic keypoints for each frame
    i3d.zip: 50084 .mat files containing the I3D descriptors for all videos
    videoswin.zip: 50084 .npy files containing the Video Swin feature vectors for all processed videos
    sign_segmentation.zip: 50084 files (one per video) containing the boundaries between signs as well as their probabilities
    subtitles.csv: 50084 subtitles aligned with the video segments
    vocab.txt: a list of 3894 lemmatised nouns (common and proper) appearing at least 5 times in the subtitles of the video segments

information folder
    info_mediapirgb.csv: information about the videos (resolution, frame rate, duration, number of frames in each video segment) and the assignment of each segment to the train/dev/test sets
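
The vocabulary rule described above (keep nouns that appear at least 5 times in the subtitles) can be illustrated with a minimal sketch. It assumes the subtitles have already been tagged and lemmatised by some upstream tool, which the description does not name, and that "appearing at least 5 times" means total occurrences across all segments. The function name build_vocab and the toy data are hypothetical and are not part of the corpus tooling.

```python
from collections import Counter
from typing import Iterable, List

MIN_COUNT = 5  # threshold stated in the corpus description


def build_vocab(lemmas_per_segment: Iterable[Iterable[str]],
                min_count: int = MIN_COUNT) -> List[str]:
    """Return the lemmas that occur at least `min_count` times in total.

    `lemmas_per_segment` holds one iterable of noun lemmas per subtitle.
    Lemmatisation and noun filtering are assumed to be done beforehand.
    """
    counts: Counter = Counter()
    for lemmas in lemmas_per_segment:
        counts.update(lemmas)
    return sorted(lemma for lemma, n in counts.items() if n >= min_count)


# Toy input: hypothetical, already-lemmatised nouns for three segments.
toy_segments = [
    ["journal", "Paris", "journal"],
    ["journal", "langue", "journal"],
    ["journal", "Paris", "langue"],
]
toy_vocab = build_vocab(toy_segments, min_count=2)
print(toy_vocab)
```

With the default threshold of 5, only "journal" would survive in this toy input. Whether the published vocab.txt counts total occurrences or the number of segments containing a noun is not specified, so this should be read as one plausible interpretation.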
Provided by:
ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr
Created:
2026-02-10