
Customer Service QA Pairs for LLM Conversational Training

Source: Databricks, indexed 2024-05-09
Download link:
https://marketplace.databricks.com/details/2ef1f9e3-d95d-491a-9251-dea46285b410/Bitext-Innovation-International_Customer-Service-QA-Pairs-for-LLM-Conversational-Training
Resource description:
**Overview**

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM Fine-Tuning. For example, if you are [ACME Company], you can create your own customized LLM by first training a fine-tuned model using this dataset, and then further fine-tuning it with a small amount of your own data. An overview of this approach can be found at: [From General-Purpose LLMs to Verticalized Enterprise Models](https://www.bitext.com/blog/general-purpose-models-verticalized-enterprise-genai/)

**Use cases**

- Training sophisticated AI models for intent detection and response generation in customer service applications.
- Enhancing domain adaptation and fine-tuning of Large Language Models such as DBRX, GPT, Mistral, Llama3, Falcon, etc. with a diverse range of intents and contextual customer queries.

**Product details**

This rich dataset encompasses:

- **Categories:** Spanning ACCOUNT, CANCELLATION_FEE, CONTACT, DELIVERY, and more, facilitating nuanced model training.
- **Intents:** Features 27 distinct intents like create_account, check_cancellation_fee, and track_order, among others.
- **Questions/Answers:** A collection of 26,872 pairs with an average of about 1,000 per intent.
- **Entities/Slots:** Includes 30 types such as {{Order Number}}, {{Invoice Number}}, and {{Customer Support Email}}.
- **Language Generation Tags:** Contains 12 types of tags for morphological, semantic, syntactic, and register variations, aiding in the creation of dialogues that mimic real-life conversational patterns.

For an immersive look into the dataset structure and content, refer to the embedded notebook which showcases sample queries and responses along with detailed instructions for use.

**Additional Insights**

- The dataset has been meticulously curated by computational linguists, ensuring quality and relevance.
- Comprehensive tagging allows for dataset customization based on linguistic phenomena, catering to various user profiles and conversational styles.
- Visit [Bitext's Vertical-Specific Datasets](https://www.bitext.com/chatbot-verticals/) for an in-depth understanding of our vertical coverage and intents.
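The entity/slot placeholders mentioned under Product details (for example `{{Order Number}}`) are typically replaced with real values before a response is shown to a customer. The following is a minimal Python sketch of that substitution step, not the provider's own tooling. The sample sentence, the function names `fill_slots` and `list_slots`, and the example values are all hypothetical; only the double-brace slot notation comes from the description above.

```python
import re

# Hypothetical sample response; the wording is illustrative, not taken from the dataset.
TEMPLATE = (
    "Your order {{Order Number}} is on its way. "
    "Questions? Write to {{Customer Support Email}}."
)

# Matches a slot such as {{Order Number}} and captures the name inside the braces.
SLOT_PATTERN = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def list_slots(text):
    """Return the slot names found in the text, in order of appearance."""
    return SLOT_PATTERN.findall(text)


def fill_slots(text, values):
    """Replace each {{Slot Name}} with its value; leave unknown slots untouched."""
    def substitute(match):
        # Fall back to the original placeholder when no value is supplied.
        return values.get(match.group(1), match.group(0))
    return SLOT_PATTERN.sub(substitute, text)


if __name__ == "__main__":
    print(list_slots(TEMPLATE))
    print(fill_slots(TEMPLATE, {"Order Number": "A-1001"}))
```

Leaving unknown slots in place, rather than raising an error, makes it easy to spot which values are still missing when adapting the data to a specific company.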

Provider:
Bitext Innovation International
Collected summary
Dataset introduction
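Background and challenges
Background overview
This dataset is a hybrid synthetic collection of customer service question-answer pairs for fine-tuning large language models such as GPT and Mistral, with the aim of improving their domain adaptation to the customer support sector. It contains 27 intents, nearly 27,000 question-answer pairs, 30 entity types, and 12 types of language generation tags, supporting fine-grained training for intent detection and response generation.
The content above was collected and summarized by 遇见数据集.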