PuristanLabs1/Urdu-Turn-Detection-10k
Source: Hugging Face. Updated: 2025-12-12. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/PuristanLabs1/Urdu-Turn-Detection-10k
Description:
A dataset of 10,000 Urdu sentences labeled for turn detection (end-of-turn). It is designed to help conversational AI systems determine whether a user has finished speaking (Complete) or is pausing or trailing off (Incomplete). The data combines real-world validation with synthetic generation and is written entirely in Urdu script (Nastaliq/Arabic script), with no English or Roman-script contamination. Labels are 1 (Complete) and 0 (Incomplete), split roughly 50/50. It is intended for training low-latency classification models (such as BERT or DistilBERT) for real-time voice bots and agents.
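The label convention and the roughly 50/50 balance described above can be checked with a few lines of Python. This is a minimal sketch, not the provider's own tooling: the split name `train` and the column name `label` are assumptions (the listing does not state them), the helper names are hypothetical, and the toy labels are invented for illustration. Only the 1 = Complete / 0 = Incomplete mapping comes from the description.

```python
from collections import Counter

# Label convention stated in the dataset description.
LABEL_NAMES = {0: "Incomplete", 1: "Complete"}


def label_balance(labels):
    """Return the fraction of examples carrying each label name."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: counts.get(value, 0) / total for value, name in LABEL_NAMES.items()}


def load_labels(repo_id="PuristanLabs1/Urdu-Turn-Detection-10k",
                split="train", label_column="label"):
    """Fetch the label column from the Hugging Face Hub.

    The split and column names are assumptions; adjust them to the
    actual schema. Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset
    ds = load_dataset(repo_id, split=split)
    return list(ds[label_column])


# Offline toy labels, invented for illustration (not taken from the dataset).
toy_labels = [1, 0, 1, 0, 0, 1]
balance = label_balance(toy_labels)
print(balance)
```

With the real data, `label_balance(load_labels())` should report values close to 0.5 for each class if the stated balance holds and the assumed column name is correct.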
Provider:
PuristanLabs1



