Enkiboy/customer-support-synth-quickdemo

Hugging Face · Updated 2026-03-21 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Enkiboy/customer-support-synth-quickdemo
Description:
---
language:
- en
license: mit
pretty_name: expanded_E-commerce_and_SaaS_customer_support_(orders,_shipping,_billing,_subscriptions,_app_troubleshooting)
task_categories:
- text-generation
- question-answering
task_ids:
- conversational
tags:
- synthetic
- customer-support
- llm
- instruction-tuning
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
size_categories:
- n<1K
---

# expanded_E-commerce_and_SaaS_customer_support_(orders,_shipping,_billing,_subscriptions,_app_troubleshooting)

Synthetic customer-support dataset generated with synth-dataset-kit.

## What Is Included

- `train.jsonl`: training split in JSONL chat format
- `eval_summary.json`: minimal evaluation and generation summary
- `expanded_E-commerce_and_SaaS_customer_support_(orders,_shipping,_billing,_subscriptions,_app_troubleshooting)_quality_report.html`: visual quality report
- `expanded_E-commerce_and_SaaS_customer_support_(orders,_shipping,_billing,_subscriptions,_app_troubleshooting)_quality_report.json`: machine-readable quality report

## Summary

- examples: 4
- generator: synth-dataset-kit
- avg quality: 7.88
- passed: 4
- contamination hits: 0
- distribution divergence: 0.5000
- rebalancing rounds: 0

## Data Generation Process

This dataset was created from a small customer-support seed set. Generation used staged candidate creation, quality filtering, decontamination, and distribution-aware rebalancing.

## Quality Signals

- lexical diversity: 0.4176
- self-BLEU proxy: 0.0159
- embedding diversity: None

## Distribution Alignment

- divergence: 0.5000
- distribution match score: 0.00/100
- semantic coverage score: 0.00%
- graph coverage score: 0.00%
- underrepresented clusters: none
- semantic coverage gaps: none
- graph frontier clusters: none

## Contamination Audit

- contamination hits: 0
- verdicts: clean=4

## Recommended Use

Use this dataset for instruction tuning or support-assistant fine-tuning experiments. Validate on your own holdout set before production use.

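Before fine-tuning on `train.jsonl`, it can help to confirm that every line parses and has the shape your training code expects. The sketch below uses only the standard library. The card says only "JSONL chat format", so the record schema here is an assumption: the `messages` key and the `role`/`content` fields are hypothetical names and may need adjusting to the actual file.

```python
import json


def parse_chat_jsonl(lines, key="messages"):
    """Parse JSONL chat records from an iterable of text lines.

    Assumed schema (not confirmed by the card): each record is an object
    with a non-empty list under `key`, whose items carry string `role`
    and `content` fields.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        messages = record.get(key) if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            raise ValueError(f"line {line_no}: missing or empty {key!r} list")
        for message in messages:
            well_formed = (
                isinstance(message, dict)
                and isinstance(message.get("role"), str)
                and isinstance(message.get("content"), str)
            )
            if not well_formed:
                raise ValueError(f"line {line_no}: malformed message entry")
        records.append(record)
    return records
```

Any open text file works as the `lines` argument, for example `with open("train.jsonl", encoding="utf-8") as fh: records = parse_chat_jsonl(fh)`. With only 4 examples in this bundle, the holdout set recommended above has to come from your own data rather than from a split of this file.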
## Reproducibility

This bundle was generated by synth-dataset-kit from a small seed set. Use `eval_summary.json` plus the full HTML/JSON report to reproduce the run context.

## Baseline Comparison

- baseline dataset: customer_support_seeds
- example delta: -2
- avg quality delta: +0.8750
- pass rate delta: +33.33%
- lexical diversity delta: -0.1072
- distribution divergence delta: +0.0000
- contamination hit delta: +0

## Reference Dataset Comparison

- reference dataset: customer_support_seeds
- avg user length delta: +53.0833
- avg assistant length delta: +398.7500
- diversity delta: -0.0138
- lexical diversity delta: -0.1072
- style distribution distance: 0.5000
- cluster coverage distance: 0.0000
- difficulty profile distance: 1.0000
- topic overlap ratio: 100.00%
- topic novelty ratio: 0.00%
- exact overlap ratio: 0.00%
- near overlap ratio: 0.00%
- semantic overlap ratio: n/a
- reference alignment score: 70.50/100
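
The baseline deltas can be turned back into implied baseline values with simple arithmetic. The sketch below assumes each delta means "generated minus baseline", which the card does not state. The derived numbers are therefore implied by that assumption, not reported by synth-dataset-kit.

```python
# Values copied from the card's Summary, Quality Signals, and
# Baseline Comparison sections.
generated = {"examples": 4, "passed": 4, "lexical_diversity": 0.4176}
deltas = {"examples": -2, "pass_rate_pct": 33.33, "lexical_diversity": -0.1072}

# Assumption: delta = generated - baseline, so baseline = generated - delta.
baseline_examples = generated["examples"] - deltas["examples"]

generated_pass_rate = 100.0 * generated["passed"] / generated["examples"]
baseline_pass_rate = round(generated_pass_rate - deltas["pass_rate_pct"], 2)

# Passing seed examples implied by the baseline pass rate.
implied_baseline_passed = round(baseline_examples * baseline_pass_rate / 100.0)

baseline_lexical_diversity = round(
    generated["lexical_diversity"] - deltas["lexical_diversity"], 4
)

print(baseline_examples, baseline_pass_rate, implied_baseline_passed)
print(baseline_lexical_diversity)
```

Under this reading the seed set had 6 examples, of which 4 passed the quality filter. That suggests the pass rate rose because failing examples were dropped, not because more examples passed.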
Provider:
Enkiboy