Jianshu001/arabic-conversation-batch-1-2-3
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Jianshu001/arabic-conversation-batch-1-2-3
Description:
---
language:
- ar
task_categories:
- text-generation
tags:
- arabic
- synthetic
- multi-turn
- islamic-finance
- healthcare
- education
- energy
- real-estate
- government-services
size_categories:
- 1K<n<10K
---
# Arabic Multi-Domain Conversations (Batch 1+2+3 Merged & Shuffled)
6,406 synthetic multi-turn Arabic conversations across 6 UAE/Middle East domains.
Data from batches 1, 2, and 3 merged, deduplicated, and randomly shuffled.
Both user and assistant turns were generated with **gpt-5.4-mini**; factuality was checked with **o3**.
## Domains
| Domain | Arabic | Count |
|--------|--------|-------|
| Education | التعليم | 1,212 |
| Healthcare | الرعاية الصحية | 1,143 |
| Energy | الطاقة | 1,022 |
| Islamic Finance | التمويل الإسلامي | 1,019 |
| Real Estate | العقارات | 1,014 |
| Government Services | الخدمات الحكومية | 996 |
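As a quick consistency check, the per-domain counts in the table can be summed and compared against the stated total of 6,406. The sketch below only restates the numbers from the table; the dictionary keys are the English domain names as written there, not necessarily the values stored in the `domain` field.

```python
# Per-domain conversation counts, copied from the table above.
domain_counts = {
    "Education": 1212,
    "Healthcare": 1143,
    "Energy": 1022,
    "Islamic Finance": 1019,
    "Real Estate": 1014,
    "Government Services": 996,
}

# The counts should add up to the dataset size stated in the card.
total = sum(domain_counts.values())
assert total == 6406

# Share of each domain in the merged dataset, as a fraction of the total.
shares = {name: count / total for name, count in domain_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```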
## Stats
- 6,406 conversations (shuffled)
- Average user message: ~150 chars
- Assistant messages containing Markdown headings: 0%
- Factuality: ~93% pass
- 3-5 turns per conversation
- No duplicate persona × subtopic combinations
## Generation Config
- User/Assistant model: gpt-5.4-mini
- Factuality check: o3
- Quality check: truncation detection
## Format
JSONL, one record per line, with the fields: `id`, `domain`, `domain_ar`, `topic`, `topic_ar`, `subtopic_ar`, `persona`, `conversation`, `metadata`, `factuality`
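A minimal sketch of reading the JSONL file and counting records per domain, using only the Python standard library. It relies only on the `domain` field listed above. The helper names (`iter_records`, `count_by_domain`, `summarize`) and any file path passed to `summarize` are hypothetical, and the internal structure of `conversation`, `metadata`, and `factuality` is not assumed here because the card does not specify it.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_domain(lines: Iterable[str]) -> Counter:
    """Count records per value of the `domain` field."""
    return Counter(record["domain"] for record in iter_records(lines))


def summarize(path: str) -> None:
    """Print per-domain record counts for a local JSONL file.

    `path` is whatever file name the dataset was downloaded to
    (an assumption; the card does not name the file).
    """
    with open(path, encoding="utf-8") as f:
        for domain, n in count_by_domain(f).most_common():
            print(domain, n)
```

Calling `summarize` on the downloaded file should reproduce the counts in the Domains table, provided the `domain` values map one-to-one onto the six domains listed there.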
Provider:
Jianshu001



