qed_amara

Name: qed_amara
Creator: huggingface.co
License: No description available

Indexed from huggingface.co on 2025-03-23

Download link:

https://huggingface.co/datasets/Helsinki-NLP/qed_amara

Description:

The QCRI Educational Domain Corpus (formerly QCRI AMARA Corpus) is an open multilingual collection of subtitles for educational videos and lectures collaboratively transcribed and translated over the AMARA web-based platform.

Developed by: Qatar Computing Research Institute, Arabic Language Technologies Group

The QED Corpus is made public for RESEARCH purpose only. The corpus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Copyright Qatar Computing Research Institute. All rights reserved.

225 languages, 9,291 bitexts
total number of files: 271,558
total number of tokens: 371.76M
total number of sentence fragments: 30.93M

Provider:

huggingface.co
