
Shaer-AI/ashaar-with-enhanced-descriptions-baseform-final-sft-lte20-min500-splits

Source: Hugging Face. Updated 2026-04-07. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Shaer-AI/ashaar-with-enhanced-descriptions-baseform-final-sft-lte20-min500-splits
Description:
---
language:
- ar
license: apache-2.0
pretty_name: Ashaar Enhanced Description SFT Stratified Splits
task_categories:
- text-generation
size_categories:
- 100K<n<1M
---

# Ashaar Enhanced Description SFT Stratified Splits

Source dataset:
- `Shaer-AI/ashaar-with-enhanced-descriptions-baseform-final-sft-lte20-min500`

Target dataset:
- `Shaer-AI/ashaar-with-enhanced-descriptions-baseform-final-sft-lte20-min500-splits`

This dataset publishes deterministic `train / eval / test` splits with a `94 / 3 / 3` policy.

## Split policy

Primary stratification key:
- `base_meter`
- `form`
- `length_bucket`

Length buckets:
- `1-3`
- `4-6`
- `7-10`
- `11-20`

Small groups fall back gracefully to coarser stratification levels when needed.

## Counts

- train: **109070**
- eval: **3481**
- test: **3481**

## Stratification fallback levels used

- `base_meter_form_length_bucket`: **116032**
- `base_meter_form`: **0**
- `base_meter`: **0**
- `global`: **0**

## Notes

- `sampler_group` keeps the fine-grained joint group `base_meter||form||length_bucket`
- `split_group` is the actual group used to allocate split quotas after fallback
- weighted sampling should be applied only on the `train` split
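
The split policy in this card can be illustrated with a small Python sketch. It is not the code used to build the dataset. The function names, the `min_group_size` threshold, the rounding rule, and the seed are assumptions made for illustration, and the card does not say what unit the length buckets count. The sketch maps a length to one of the four buckets, picks the finest stratification level whose group is large enough (falling back to `base_meter||form`, then `base_meter`, then `global`), and gives each group roughly 3% eval, 3% test, and the rest train.

```python
import random
from collections import defaultdict

# Bucket boundaries as listed in the card.
BUCKETS = [(1, 3, "1-3"), (4, 6, "4-6"), (7, 10, "7-10"), (11, 20, "11-20")]


def length_bucket(length: int) -> str:
    """Map a length in 1..20 to its bucket label."""
    for low, high, label in BUCKETS:
        if low <= length <= high:
            return label
    raise ValueError(f"length {length} is outside 1-20")


def split_group_for(row, counts, min_group_size):
    """Return the finest group with enough rows, else 'global'."""
    full = (row["base_meter"], row["form"], row["length_bucket"])
    for key in (full, full[:2], full[:1]):
        if counts[key] >= min_group_size:
            return "||".join(key)
    return "global"


def stratified_split(rows, eval_frac=0.03, test_frac=0.03,
                     min_group_size=50, seed=0):
    """Assign each row index to 'train', 'eval', or 'test'.

    min_group_size and seed are hypothetical parameters, not values
    taken from the dataset card.
    """
    # Count rows at every stratification level.
    counts = defaultdict(int)
    for row in rows:
        full = (row["base_meter"], row["form"], row["length_bucket"])
        for key in (full, full[:2], full[:1]):
            counts[key] += 1

    # Group row indices by the level chosen after fallback.
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[split_group_for(row, counts, min_group_size)].append(index)

    # Seeded shuffle plus sorted group order keeps the result deterministic.
    rng = random.Random(seed)
    assignment = {}
    for name in sorted(groups):
        indices = groups[name]
        rng.shuffle(indices)
        n_eval = round(len(indices) * eval_frac)
        n_test = round(len(indices) * test_frac)
        for position, index in enumerate(indices):
            if position < n_eval:
                assignment[index] = "eval"
            elif position < n_eval + n_test:
                assignment[index] = "test"
            else:
                assignment[index] = "train"
    return assignment
```

The published counts are consistent with this kind of allocation: 3481 of 116032 rows is about 3.0% each for eval and test, and 109070 is about 94.0%. According to the card, every row was split at the finest level, so the fallback branches were not needed for the released data.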
Provider:
Shaer-AI