mahmoudsaalama/arabic-eou-saudi-dialect
Source: Hugging Face | Updated: 2025-12-11 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/mahmoudsaalama/arabic-eou-saudi-dialect
Summary:
This dataset is designed for training and evaluating End-of-Utterance (EOU) detection models for Arabic conversations, with an emphasis on Saudi dialect patterns. Each Arabic conversational sample is labeled for binary classification: 1 for end of utterance (the speaker has finished their turn) and 0 for not end of utterance (the speaker will continue). The data focuses on realistic Saudi Arabic conversational patterns, including natural greetings and responses, questions and answers, common Saudi dialect expressions, and turn-taking. It is split into a training set (~1,600 samples), a validation set (~200 samples), and a test set (~200 samples), for a total of ~2,000 samples (roughly an 80/10/10 split). About 90% of the data consists of synthetic Saudi Arabic conversations; the remaining 10% comes from public Arabic datasets.
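As a rough illustration of the label scheme and split sizes described above, the sketch below assumes each sample is a record with hypothetical `text` and `label` fields (the dataset's actual column names are not stated on this page). It validates that labels are 0 or 1 and shuffles ~2,000 records into an assumed 80/10/10 train/validation/test split, which reproduces the approximate 1,600/200/200 sizes. It is not the author's preprocessing code.

```python
import random
from dataclasses import dataclass

# Label semantics as described: 1 = end of utterance, 0 = speaker will continue.
EOU = 1
NOT_EOU = 0


@dataclass
class Sample:
    # Hypothetical field names; the real schema is not documented here.
    text: str
    label: int


def split_samples(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Validate labels, shuffle, and split into train/validation/test.

    The 80/10/10 fractions are an assumption matching ~1,600/200/200.
    """
    for s in samples:
        if s.label not in (EOU, NOT_EOU):
            raise ValueError(f"label must be 0 or 1, got {s.label}")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def label_balance(samples):
    """Return the fraction of samples labeled as end of utterance (label 1)."""
    if not samples:
        return 0.0
    return sum(s.label == EOU for s in samples) / len(samples)


if __name__ == "__main__":
    # Placeholder records, not real dataset content.
    demo = [Sample(text=f"utterance {i}", label=i % 2) for i in range(2000)]
    train, val, test = split_samples(demo)
    print(len(train), len(val), len(test))  # 1600 200 200
```

Checking `label_balance` on each split is a quick way to see whether the EOU and non-EOU classes are evenly represented before training a binary classifier.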
Provided by:
mahmoudsaalama



