
PuristanLabs1/Urdu-Turn-Detection-10k

Source: Hugging Face. Updated 2025-12-12; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/PuristanLabs1/Urdu-Turn-Detection-10k
Description:
A high-quality dataset of 10,000 Urdu sentences labeled for turn detection (end-of-turn). It is designed to help conversational AI systems decide whether a user has finished speaking (Complete) or is pausing or trailing off (Incomplete). The data combines real-world validation with synthetic generation and is written entirely in Urdu script (Nastaliq/Arabic script), with no English or Roman-script contamination. Labels are 1 (Complete) and 0 (Incomplete), split roughly 50/50. The dataset is intended for training low-latency classification models such as BERT or DistilBERT for real-time voice bots and agents.
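
The label convention and the roughly even class split described above can be checked with a few lines of Python. This is a minimal sketch, not the provider's tooling: the names `ID2LABEL` and `label_balance` are hypothetical, and the example labels are placeholders rather than rows from the dataset. The listing does not state the dataset's column names, so the function takes a plain list of integer labels.

```python
from collections import Counter

# Label convention stated in the dataset description:
# 1 = Complete (user has finished), 0 = Incomplete (pausing or trailing off).
ID2LABEL = {0: "Incomplete", 1: "Complete"}


def label_balance(labels):
    """Return the fraction of examples carrying each label name."""
    labels = list(labels)
    if not labels:
        raise ValueError("no labels given")
    unknown = set(labels) - set(ID2LABEL)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    counts = Counter(labels)
    return {name: counts.get(i, 0) / len(labels) for i, name in ID2LABEL.items()}


# Hypothetical placeholder labels, NOT taken from the dataset.
example_labels = [1, 0, 1, 1, 0, 0]
balance = label_balance(example_labels)
print(balance)
```

Run on the real label column, the two fractions should both land near 0.5 if the stated balance holds. The same `ID2LABEL` mapping could be passed to a classifier configuration so that predictions are reported by name rather than by integer.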
Provider:
PuristanLabs1