V1rtucious/Ecom-Chatbot-Finetuning-Dataset
收藏Hugging Face2026-03-25 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/V1rtucious/Ecom-Chatbot-Finetuning-Dataset
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
language:
- en
tags:
- e-commerce
- chatbot
- customer-support
- conversational
- fine-tuning
configs:
- config_name: default
data_files:
- split: amazon_reviews
path: data/amazon_reviews-*
- split: amazon_meta
path: data/amazon_meta-*
- split: asos
path: data/asos-*
- split: bitext_retail
path: data/bitext_retail-*
- split: bitext_customer
path: data/bitext_customer-*
- split: synthetic_train
path: data/synthetic_train-*
- split: synthetic_test
path: data/synthetic_test-*
dataset_info:
features:
- name: id
dtype: string
- name: source
dtype: string
- name: group
dtype: string
- name: system
dtype: string
- name: prompt
dtype: string
- name: response_type
dtype: string
- name: response
dtype: string
- name: language
dtype: string
- name: locale
dtype: string
- name: annotator
dtype: string
- name: domain
dtype: string
- name: intent_category
dtype: string
- name: intent
dtype: string
- name: sub_intent
dtype: string
- name: capability
dtype: string
- name: test_tier
dtype: string
- name: history
dtype: string
- name: context
dtype: string
- name: tools
dtype: string
- name: difficulty
dtype: int32
- name: quality_score
dtype: float32
splits:
- name: amazon_reviews
num_bytes: 30820225
num_examples: 23100
- name: amazon_meta
num_bytes: 12521068
num_examples: 5000
- name: asos
num_bytes: 4710702
num_examples: 2000
- name: bitext_retail
num_bytes: 5297710
num_examples: 4998
- name: bitext_customer
num_bytes: 4843246
num_examples: 5000
- name: synthetic_train
num_bytes: 9149948
num_examples: 9000
- name: synthetic_test
num_bytes: 1209007
num_examples: 1000
download_size: 22413441
dataset_size: 68551906
---
# Ecom Chatbot Fine-Tuning Dataset
A unified e-commerce chatbot fine-tuning dataset combining 5 source datasets (40,098 examples total), covering product discovery, order management, customer support, returns, and more.
## Splits
| Split | Source | Examples |
|---|---|---|
| `amazon_meta` | Amazon product metadata | 5,000 |
| `amazon_reviews` | Amazon product reviews | 23,100 |
| `asos_ecom_dataset` | ASOS fashion e-commerce | 2,000 |
| `bitext_customer_support` | Bitext customer support (placeholder-free) | 5,000 |
| `bitext_retail_ecom` | Bitext retail e-commerce (placeholder-free) | 4,998 |
## Schema
Each entry contains:
- `id` — unique identifier
- `source` — originating dataset
- `group` — train/test group (A/B)
- `difficulty` — task difficulty (1–3)
- `system` — system prompt for the assistant
- `history` — prior conversation turns (JSON string)
- `prompt` — user message
- `context` — retrieved docs, cart state, order details (JSON string)
- `tools` — available function tools (JSON string)
- `response_type` — `text` or `tool_call`
- `response` — expected assistant response
- `language` / `locale` — language metadata
- `annotator` — annotation source
- `quality_score` — annotation quality (0–1)
- `domain` — e-commerce domain
- `intent_category` / `intent` / `sub_intent` — intent labels
许可证:Apache-2.0
语言:
- 英语
标签:
- 电子商务
- 聊天机器人(chatbot)
- 客户支持
- 会话式
- 微调(fine-tuning)
配置项:
- 配置名称:default
数据文件:
- 拆分集:amazon_reviews,路径:data/amazon_reviews-*
- 拆分集:amazon_meta,路径:data/amazon_meta-*
- 拆分集:asos,路径:data/asos-*
- 拆分集:bitext_retail,路径:data/bitext_retail-*
- 拆分集:bitext_customer,路径:data/bitext_customer-*
- 拆分集:synthetic_train,路径:data/synthetic_train-*
- 拆分集:synthetic_test,路径:data/synthetic_test-*
数据集信息:
特征字段:
- 名称:id,数据类型:字符串
- 名称:source,数据类型:字符串
- 名称:group,数据类型:字符串
- 名称:system,数据类型:字符串
- 名称:prompt,数据类型:字符串
- 名称:response_type,数据类型:字符串
- 名称:response,数据类型:字符串
- 名称:language,数据类型:字符串
- 名称:locale,数据类型:字符串
- 名称:annotator,数据类型:字符串
- 名称:domain,数据类型:字符串
- 名称:intent_category,数据类型:字符串
- 名称:intent,数据类型:字符串
- 名称:sub_intent,数据类型:字符串
- 名称:capability,数据类型:字符串
- 名称:test_tier,数据类型:字符串
- 名称:history,数据类型:字符串
- 名称:context,数据类型:字符串
- 名称:tools,数据类型:字符串
- 名称:difficulty,数据类型:32位整型
- 名称:quality_score,数据类型:32位浮点型
拆分集详情:
- 拆分集名称:amazon_reviews,字节数:30820225,样本数:23100
- 拆分集名称:amazon_meta,字节数:12521068,样本数:5000
- 拆分集名称:asos,字节数:4710702,样本数:2000
- 拆分集名称:bitext_retail,字节数:5297710,样本数:4998
- 拆分集名称:bitext_customer,字节数:4843246,样本数:5000
- 拆分集名称:synthetic_train,字节数:9149948,样本数:9000
- 拆分集名称:synthetic_test,字节数:1209007,样本数:1000
下载大小:22413441,数据集总大小:68551906
# 电子商务聊天机器人(chatbot)微调(fine-tuning)数据集
本数据集为统一的电商聊天机器人微调数据集,整合了5个源数据集,总计40098条样本,覆盖商品发现、订单管理、客户支持、退换货等多个业务场景。
## 拆分集详情
| 拆分集名称 | 数据源 | 样本数量 |
|---|---|---|
| `amazon_meta` | 亚马逊商品元数据 | 5,000 |
| `amazon_reviews` | 亚马逊商品评论 | 23,100 |
| `asos_ecom_dataset` | ASOS时尚电商 | 2,000 |
| `bitext_customer_support` | Bitext客户支持(无占位符) | 5,000 |
| `bitext_retail_ecom` | Bitext零售电商(无占位符) | 4,998 |
## 数据结构规范
每条样本包含以下字段:
- `id`:唯一标识符
- `source`:所属源数据集
- `group`:训练/测试分组(A/B)
- `difficulty`:任务难度等级(1~3级)
- `system`:助手系统提示词
- `history`:历史对话轮次(JSON字符串格式)
- `prompt`:用户输入消息
- `context`:检索文档、购物车状态、订单详情(JSON字符串格式)
- `tools`:可用函数工具(JSON字符串格式)
- `response_type`:响应类型,可选`text`或`tool_call`
- `response`:预期助手响应内容
- `language` / `locale`:语言与地区元数据
- `annotator`:标注来源
- `quality_score`:标注质量得分(0~1区间)
- `domain`:电商业务领域
- `intent_category` / `intent` / `sub_intent`:层级化意图标签
提供机构:
V1rtucious



