Ar-DAD: Arabic Diversified Audio Dataset
Source: Mendeley Data | Updated: 2020-04-10 | Indexed: 2026-04-09
Download link:
https://data.mendeley.com/datasets/3kndp5vs6b
Description:
This is the only available audio library that covers such a large number of reciters and verses in one harmonized structure, and it can be used by researchers working in different directions. The audio files are organized into two main subsets. In the first, reciter recordings are placed in 37 folders, one per chapter (78-114); within each chapter, subfolders are created per verse number; and within each verse folder, the audio clips are enumerated across 30 different reciters. The second subset consists of a single folder of audio clips from imitators, categorized by anonymous ID. The audio clips are shared in WAV format at the maximum quality in which they were recorded and disseminated over the internet; no enhancement of any kind was applied after scraping. In this version of the dataset (V3), a third data folder is added containing the text of all verses as plain text files, with and without vocalization/vowelization (تشكيل, Tashkeel).
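The chapter/verse/clip layout described above can be walked with a few lines of Python. This is a minimal sketch, not a loader shipped with the dataset: it assumes the chapter and verse folders are named with plain numbers (for example `78/1/`) and that clips carry a `.wav` extension, neither of which the description confirms. The names `iter_reciter_clips` and `Clip` are hypothetical.

```python
from pathlib import Path
from typing import Iterator, NamedTuple

# Chapters 78..114 inclusive, i.e. the 37 chapter folders in the description.
CHAPTERS = range(78, 115)


class Clip(NamedTuple):
    chapter: int
    verse: int
    path: Path


def iter_reciter_clips(root: Path) -> Iterator[Clip]:
    """Yield each WAV clip found under root/<chapter>/<verse>/.

    Assumption: folder names are plain decimal numbers. Adjust the
    parsing if the real archive uses a different naming scheme.
    """
    for chapter_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        name = chapter_dir.name
        if not name.isdecimal() or int(name) not in CHAPTERS:
            continue  # skip anything that is not a chapter folder
        for verse_dir in sorted(p for p in chapter_dir.iterdir() if p.is_dir()):
            if not verse_dir.name.isdecimal():
                continue  # skip anything that is not a verse folder
            for wav in sorted(verse_dir.glob("*.wav")):
                yield Clip(int(name), int(verse_dir.name), wav)
```

Counting the clips yielded per (chapter, verse) pair is a quick way to check the stated figure of 30 reciters per verse against a local copy.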
Created:
2020-04-10



