Mermeid/Bitext-customer-support-llm-chatbot-training-dataset

Name: Mermeid/Bitext-customer-support-llm-chatbot-training-dataset
Creator: Mermeid
Published: 2026-04-29 16:25:40
License: 暂无描述

Hugging Face2026-04-29 更新2026-05-03 收录

下载链接：

https://hf-mirror.com/datasets/Mermeid/Bitext-customer-support-llm-chatbot-training-dataset

下载链接

链接失效反馈

官方服务：

资源简介：

该混合合成数据集专为微调大型语言模型（如GPT、Mistral和OpenELM）而设计，采用NLP/NLG技术和自动化数据标注工具生成。其目标是展示如何通过两步法微调LLM，轻松实现客户支持领域的垂直化/领域适应。例如，[ACME公司]可先使用本数据集训练微调模型，再用少量自有数据进一步微调，从而创建定制化LLM。数据集包含27个意图（归为10类）、26,872个问答对（每个意图约1000个）、30种实体/槽位类型及12类语言生成标签。这些标签涵盖词汇、句法、语言风格等多维度变异，能针对不同用户画像定制训练数据，提升助手准确性与鲁棒性。数据集覆盖20个垂直领域（如汽车、零售银行、医疗等）的通用意图，所有数据均经计算语言学家校验。

This hybrid synthetic dataset is designed to fine-tune Large Language Models (e.g., GPT, Mistral, OpenELM) using NLP/NLG technology and automated Data Labeling tools. It demonstrates verticalization/domain adaptation for customer support via a two-step LLM fine-tuning approach—e.g., [ACME Company] can first train a model with this dataset, then refine it with proprietary data. The dataset contains 27 intents (10 categories), 26,872 Q&A pairs (~1,000 per intent), 30 entity/slot types, and 12 language generation tags. These tags capture lexical, syntactic, and stylistic variations (e.g., colloquialism, politeness) to customize datasets for diverse user profiles, enhancing assistant accuracy. It covers common intents across 20 verticals (e.g., Automotive, Retail Banking, Healthcare), with all data curated by computational linguists.

提供机构：

Mermeid

5,000+

优质数据集

54 个

任务类型

进入经典数据集