HossamEL-Dein/arabic-eou-dataset
Source: Hugging Face. Updated 2025-12-13; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/HossamEL-Dein/arabic-eou-dataset
Description:
This dataset contains 5,000 Arabic samples for End-of-Utterance (EOU) detection, specifically designed for Saudi dialect conversational AI applications.
**Purpose**: Train models to detect when a speaker has finished their conversational turn in Arabic dialogue.
The dataset combines real and synthetic data. Real samples come from SADA22 (Saudi Broadcasting Authority TV shows); synthetic samples are generated by combining Saudi Arabic phrases and modeling realistic pauses. Each sample has the following fields: text, eou_label (binary end-of-utterance label), pause_duration (pause duration in seconds), confidence (confidence level of the label), source_file (data source), and dialect. The dataset is split into training (80%), validation (10%), and test (10%) sets.
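The record layout described above can be summarized as a typed structure. The following is a minimal Python sketch, not an official schema or loader: the field names come from the description, but the types, the 0/1 label encoding, the [0, 1] range for confidence, and the example values are assumptions. `EouSample`, `validate`, and `split_sizes` are hypothetical helpers; `split_sizes` shows that an exact 80/10/10 split of 5,000 samples would give 4,000 / 500 / 500 samples, though the published split files may round differently.

```python
from dataclasses import dataclass


# Field names follow the dataset description; types and value ranges are assumptions.
@dataclass
class EouSample:
    text: str
    eou_label: int          # binary label; assumed 1 = end of utterance, 0 = not
    pause_duration: float   # pause duration in seconds
    confidence: float       # label confidence; assumed to lie in [0, 1]
    source_file: str        # data source (e.g. a SADA22 file or a synthetic marker)
    dialect: str


def validate(sample: EouSample) -> list[str]:
    """Return the problems found in one sample (an empty list means none)."""
    problems = []
    if sample.eou_label not in (0, 1):
        problems.append("eou_label must be 0 or 1")
    if sample.pause_duration < 0:
        problems.append("pause_duration must be non-negative")
    if not 0.0 <= sample.confidence <= 1.0:
        problems.append("confidence must be in [0, 1]")
    if not sample.text.strip():
        problems.append("text must not be empty")
    return problems


def split_sizes(n: int, train_pct: int = 80, val_pct: int = 10) -> tuple[int, int, int]:
    """Train/validation/test sizes using integer arithmetic; any remainder goes to test."""
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return n_train, n_val, n - n_train - n_val


# Made-up example record (not taken from the dataset).
example = EouSample(
    text="وش رايك نروح بكرة؟",
    eou_label=1,
    pause_duration=0.8,
    confidence=0.9,
    source_file="synthetic",
    dialect="saudi",
)
print(validate(example))   # []
print(split_sizes(5000))   # (4000, 500, 500)
```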
The dataset is suitable for training Arabic EOU detection models, developing Saudi-dialect conversational AI, studying turn-taking in Arabic dialogue, and building real-time voice assistants.
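To make the EOU detection task concrete, here is a hypothetical pause-threshold baseline. This is a common non-learned heuristic for end-of-turn detection, not a method described by the dataset author. It uses only the pause_duration and eou_label fields, assumes the label is encoded as 1 for end of utterance, and picks the candidate threshold with the best accuracy on the samples it is given (in practice the validation split, keeping the test split for reporting). The `Sample` type, the candidate thresholds, and the toy samples are illustrative assumptions, not values from the dataset.

```python
from dataclasses import dataclass


# Hypothetical minimal record holding only the two fields this baseline uses.
@dataclass
class Sample:
    pause_duration: float  # seconds
    eou_label: int         # assumed: 1 = end of utterance, 0 = not


def predict_eou(pause_duration: float, threshold: float) -> int:
    """Rule-based baseline: predict end-of-utterance when the pause is long enough."""
    return 1 if pause_duration >= threshold else 0


def accuracy(samples: list[Sample], threshold: float) -> float:
    """Fraction of samples whose predicted label matches eou_label."""
    if not samples:
        raise ValueError("need at least one sample")
    correct = sum(predict_eou(s.pause_duration, threshold) == s.eou_label for s in samples)
    return correct / len(samples)


def best_threshold(samples: list[Sample], candidates: list[float]) -> float:
    """Candidate with the highest accuracy; ties go to the smallest threshold."""
    return max(candidates, key=lambda t: (accuracy(samples, t), -t))


# Made-up toy data, standing in for a validation split.
toy = [Sample(0.1, 0), Sample(0.3, 0), Sample(0.9, 1), Sample(1.2, 1)]
print(best_threshold(toy, [0.2, 0.5, 1.0]))  # 0.5 on this toy data
```

A learned model would typically also use the text field, since a long pause alone cannot always distinguish a finished turn from a mid-sentence hesitation.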
Provided by:
HossamEL-Dein



