facebook/2M-Belebele
Source: Hugging Face. Updated 2024-12-17; indexed 2024-12-21.
Download link:
https://hf-mirror.com/datasets/facebook/2M-Belebele
Description:
2M-Belebele is a highly multilingual speech and American Sign Language (ASL) comprehension dataset. It extends the existing text-only Belebele dataset to 74 spoken languages and one sign language (ASL). The speech portion is built by aligning the Belebele, Flores200, and Fleurs datasets, with new audio recorded for sentences missing from Fleurs. New recordings are also provided for the Belebele questions and answers, which are not part of the original Flores200 dataset. The ASL portion consists entirely of new, controlled recordings: each Flores sentence, along with the questions and answers, is provided in video format.
Provided by:
facebook



