
Jianshu001/arabic-conversation-batch-1-2-3

Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Jianshu001/arabic-conversation-batch-1-2-3
Resource description:
---
language:
- ar
task_categories:
- text-generation
tags:
- arabic
- synthetic
- multi-turn
- islamic-finance
- healthcare
- education
- energy
- real-estate
- government-services
size_categories:
- 1K<n<10K
---

# Arabic Multi-Domain Conversations (Batch 1+2+3 Merged & Shuffled)

6,406 synthetic multi-turn Arabic conversations across 6 UAE/Middle East domains. Data from batches 1, 2, and 3 merged, deduplicated, and randomly shuffled.

Generated with **gpt-5.4-mini** (user + assistant), factuality checked with **o3**.

## Domains

| Domain | Arabic | Count |
|--------|--------|-------|
| Education | التعليم | 1,212 |
| Healthcare | الرعاية الصحية | 1,143 |
| Energy | الطاقة | 1,022 |
| Islamic Finance | التمويل الإسلامي | 1,019 |
| Real Estate | العقارات | 1,014 |
| Government Services | الخدمات الحكومية | 996 |

## Stats

- 6,406 conversations (shuffled)
- Average user message: ~150 chars
- Markdown headings in assistant: 0%
- Factuality: ~93% pass
- 3-5 turns per conversation
- No duplicate persona x subtopic combinations

## Generation Config

- User/Assistant model: gpt-5.4-mini
- Factuality check: o3
- Quality check: truncation detection

## Format

JSONL: id, domain, domain_ar, topic, topic_ar, subtopic_ar, persona, conversation, metadata, factuality
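
The per-domain counts in the table can be rechecked against a local copy of the JSONL file. The following is a minimal sketch, not the author's tooling: it relies only on the `domain` field named in the format line, and it assumes that field holds one label per record (the exact label strings, such as `Education`, are a guess). The sample records and the file name `data.jsonl` are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_domain(records: Iterable[dict]) -> Counter:
    """Count conversations per value of the `domain` field."""
    return Counter(rec["domain"] for rec in records)


# Two made-up records for illustration only (not taken from the dataset).
sample_lines = [
    '{"id": "demo-1", "domain": "Education", "domain_ar": "التعليم"}',
    "",  # blank lines are skipped
    '{"id": "demo-2", "domain": "Energy", "domain_ar": "الطاقة"}',
]
domain_counts = count_by_domain(iter_records(sample_lines))
print(domain_counts)

# For a real file (the file name here is hypothetical):
# with open("data.jsonl", encoding="utf-8") as f:
#     domain_counts = count_by_domain(iter_records(f))
```

If the card's figures hold, running this over the full file should give six domains whose counts sum to 6,406 (1,212 + 1,143 + 1,022 + 1,019 + 1,014 + 996).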
Provider:
Jianshu001