Enkiboy/customer-support-synth-quickdemo
Source: Hugging Face | Updated 2026-03-21 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Enkiboy/customer-support-synth-quickdemo
Description:
---
language:
- en
license: mit
pretty_name: expanded_E-commerce_and_SaaS_customer_support_(orders,_shipping,_billing,_subscriptions,_app_troubleshooting)
task_categories:
- text-generation
- question-answering
task_ids:
- conversational
tags:
- synthetic
- customer-support
- llm
- instruction-tuning
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
size_categories:
- n<1K
---
# Expanded E-commerce and SaaS customer support (orders, shipping, billing, subscriptions, app troubleshooting)
Synthetic customer-support dataset generated with synth-dataset-kit.
## What Is Included
- `train.jsonl`: training split in JSONL chat format
- `eval_summary.json`: minimal evaluation and generation summary
- `expanded_E-commerce_and_SaaS_customer_support_(orders,_shipping,_billing,_subscriptions,_app_troubleshooting)_quality_report.html`: visual quality report
- `expanded_E-commerce_and_SaaS_customer_support_(orders,_shipping,_billing,_subscriptions,_app_troubleshooting)_quality_report.json`: machine-readable quality report
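A minimal sketch of reading `train.jsonl` with the Python standard library. The card says only that the file is in "JSONL chat format", so the `messages` key used in `count_turns` is an assumption about the record layout, not something the card confirms; inspect one record before relying on it.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file and return one parsed object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({err.msg})"
                ) from err
    return records


def count_turns(record, key="messages"):
    """Count chat turns. The `messages` key is a hypothetical field name."""
    turns = record.get(key, [])
    return len(turns) if isinstance(turns, list) else 0
```

After downloading the file, `load_jsonl("train.jsonl")` should return four records if the file matches the summary. Because the front matter declares a default config with a `train` split, loading the repository through the Hugging Face `datasets` library is likely to work as well.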
## Summary
- examples: 4
- generator: synth-dataset-kit
- avg quality: 7.88
- passed: 4
- contamination hits: 0
- distribution divergence: 0.5000
- rebalancing rounds: 0
## Data Generation Process
This dataset was created from a small customer-support seed set.
Generation used staged candidate creation, quality filtering, decontamination, and distribution-aware rebalancing.
## Quality Signals
- lexical diversity: 0.4176
- self-BLEU proxy: 0.0159
- embedding diversity: n/a (reported as `None`)
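The card does not define how lexical diversity is computed. One common definition is the type-token ratio (unique tokens divided by total tokens). The sketch below implements that definition as an illustration only; the tokenizer and the function name are assumptions, and synth-dataset-kit may use a different formula.

```python
import re


def type_token_ratio(texts):
    """Unique tokens / total tokens over a list of strings (0.0 if empty)."""
    tokens = []
    for text in texts:
        # Simple lowercase word tokenizer; an assumption, not the kit's tokenizer.
        tokens.extend(re.findall(r"[a-z0-9']+", text.lower()))
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

Under this definition a value near 1.0 means almost no word is repeated, while lower values indicate more repeated vocabulary. Long assistant replies tend to pull the ratio down because common words recur.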
## Distribution Alignment
- divergence: 0.5000
- distribution match score: 0.00/100
- semantic coverage score: 0.00%
- graph coverage score: 0.00%
- underrepresented clusters: none
- semantic coverage gaps: none
- graph frontier clusters: none
## Contamination Audit
- contamination hits: 0
- verdicts: clean=4
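The card does not say how contamination hits are detected. A frequently used approach is to flag any example that shares a long word n-gram with a benchmark text. The sketch below shows that approach under stated assumptions: the function names, the whitespace tokenizer, and the default `n=8` are illustrative choices, not the kit's actual audit.

```python
def word_ngrams(text, n=8):
    """Return the set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    # For texts shorter than n words the range is empty, so the set is empty.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_hits(examples, benchmark_texts, n=8):
    """Indices of examples sharing at least one n-gram with any benchmark text."""
    benchmark_ngrams = set()
    for text in benchmark_texts:
        benchmark_ngrams |= word_ngrams(text, n)
    return [
        index
        for index, example in enumerate(examples)
        if word_ngrams(example, n) & benchmark_ngrams
    ]
```

An empty result from `contamination_hits` corresponds to a verdict of "clean" for every example under this particular check.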
## Recommended Use
Use this dataset for instruction tuning or support-assistant fine-tuning experiments.
Validate on your own holdout set before production use.
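With only four examples, this dataset is too small to split, so the holdout has to come from your own support conversations. A minimal sketch of a reproducible split over such records follows; the function name, the 20% default, and the seed are assumptions chosen for illustration.

```python
import random


def split_holdout(records, holdout_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, holdout) lists."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    if len(shuffled) < 2:
        return shuffled, []  # nothing sensible to hold out
    holdout_size = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[holdout_size:], shuffled[:holdout_size]
```

Fixing the seed keeps the holdout stable across runs, so scores from different fine-tuning experiments stay comparable.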
## Reproducibility
This bundle was generated by synth-dataset-kit from a small seed set.
Use `eval_summary.json` plus the full HTML/JSON report to reproduce the run context.
## Baseline Comparison
- baseline dataset: customer_support_seeds
- example delta: -2
- avg quality delta: +0.8750
- pass rate delta: +33.33%
- lexical diversity delta: -0.1072
- distribution divergence delta: +0.0000
- contamination hit delta: +0
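The deltas above can be turned back into approximate baseline values. The sketch assumes each delta is defined as current minus baseline, and that the current pass rate is `passed / examples` from the Summary section; neither convention is stated on the card.

```python
def baseline_from_deltas(current, deltas):
    """Recover baseline values, assuming delta = current - baseline."""
    return {name: current[name] - deltas[name] for name in current}


# Values copied from the Summary and Baseline Comparison sections.
current = {"examples": 4, "avg_quality": 7.88, "pass_rate": 4 / 4}
deltas = {"examples": -2, "avg_quality": 0.8750, "pass_rate": 0.3333}

baseline = baseline_from_deltas(current, deltas)
for name, value in baseline.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the baseline would have 6 examples, an average quality of about 7.0, and a pass rate of about 66.67%, which is consistent with 4 of 6 seed examples passing.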
## Reference Dataset Comparison
- reference dataset: customer_support_seeds
- avg user length delta: +53.0833
- avg assistant length delta: +398.7500
- diversity delta: -0.0138
- lexical diversity delta: -0.1072
- style distribution distance: 0.5000
- cluster coverage distance: 0.0000
- difficulty profile distance: 1.0000
- topic overlap ratio: 100.00%
- topic novelty ratio: 0.00%
- exact overlap ratio: 0.00%
- near overlap ratio: 0.00%
- semantic overlap ratio: n/a
- reference alignment score: 70.50/100
Provider: Enkiboy