Sermon_audio_and_text_dataset (SAT)
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://data.mendeley.com/datasets/fnz5bt24st
Resource description:
The Sermon Audio and Text (SAT) dataset addresses a significant gap in the realm of Islamic Friday Sermons (IFS) research. It offers a rich collection tailored for studies in theology, culture, and linguistics, especially pertinent to Arab and Muslim communities.
Contents:
Volume: Comprises 21,253 synchronized entries of audio and transcription.
Coverage: The dataset captures the nuances of Friday sermons, pivotal in understanding the religious, cultural, and linguistic fabric of the Muslim world.
Description of the data and file structure
The SAT dataset is divided into two folders: audio and text.
Inside the audio folder, there is a separate subfolder for each sermon (sermon_1, sermon_2, and so on). Inside each sermon folder, the wav files are named 0_sermon_1, 1_sermon_1, 2_sermon_1, and so on.
Inside the text folder, the transcripts of the corresponding audio are likewise divided into a separate subfolder for each sermon (sermon_1, sermon_2, and so on). Inside each sermon folder, the txt files are named 0_sermon_1, 1_sermon_1, 2_sermon_1, and so on.
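The layout described above suggests that each wav chunk can be matched to its transcript by file name. The following is a minimal sketch of that pairing, not part of the dataset's own tooling. The function name pair_chunks, the folder names "audio" and "text" (exact capitalization in the archive may differ), and the .wav and .txt file extensions are assumptions made for illustration.

```python
import re
from pathlib import Path

# Chunk files are described as "<chunk index>_sermon_<sermon number>".
CHUNK_NAME = re.compile(r"^(\d+)_sermon_(\d+)$")


def pair_chunks(root, audio_dir="audio", text_dir="text"):
    """Return (sermon, chunk, wav_path, txt_path) tuples in numeric order.

    Assumptions: the folder names passed in audio_dir and text_dir, and
    the .wav / .txt extensions. Adjust them to match the downloaded archive.
    """
    root = Path(root)
    pairs = []
    for wav in (root / audio_dir).glob("sermon_*/*.wav"):
        match = CHUNK_NAME.match(wav.stem)
        if match is None:
            continue  # skip files that do not follow the described naming
        chunk, sermon = int(match.group(1)), int(match.group(2))
        # The transcript is expected under the same sermon subfolder name.
        txt = root / text_dir / wav.parent.name / (wav.stem + ".txt")
        if txt.is_file():
            pairs.append((sermon, chunk, wav, txt))
    # Sort numerically so that chunk 10 comes after chunk 2, not before it.
    return sorted(pairs, key=lambda p: (p[0], p[1]))
```

Calling pair_chunks on the extracted dataset root would return one tuple per chunk that has both an audio file and a transcript. Chunks missing a transcript are skipped rather than reported, which is a simplification of this sketch.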
High-level metadata is provided in an Excel file that gives general information about each sermon, such as location, time, date, preacher name, and so on.
Low-level metadata gives more detailed information for each sermon, including the length of each audio chunk (the sermon wav files), the total number of chunks, the silence threshold used for segmentation, and so on.
Created:
2023-10-03



