goendalf666/sales-conversations-instruction-base
Hugging Face, updated 2023-10-04, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/goendalf666/sales-conversations-instruction-base
Resource description:
---
dataset_info:
  features:
  - name: '0'
    dtype: string
  splits:
  - name: train
    num_bytes: 28036745
    num_examples: 20940
  download_size: 4782593
  dataset_size: 28036745
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "sales-conversations-instruction"
A modification of https://huggingface.co/datasets/goendalf666/sales-conversations-2
The script in the Generation section below was used to transform the sales-conversations-2 dataset into this instruction-based dataset.
See the main model or the GitHub repository for more information:
salesGPT_v2: https://huggingface.co/goendalf666/salesGPT_v2
github: https://github.com/tom813/salesGPT_foundation
This dataset was created to train a sales agent chatbot that can convince people.
The initial idea came from "Textbooks Are All You Need": https://arxiv.org/abs/2306.11644
gpt-3.5-turbo was used for the generation.
# Structure
Each conversation is between a customer and a salesman who take turns in strict alternation: customer, salesman, customer, salesman, and so on.
The customer always starts the conversation.
Who ends the conversation is not defined.
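The structure rules above can be checked mechanically. The following is a minimal sketch, assuming each turn is a string that carries the `Customer:` or `Salesman:` label used by the generation script; the helper names `speaker_of` and `follows_structure` and the sample turns are hypothetical and not part of the dataset or its repository.

```python
from typing import List


def speaker_of(turn: str) -> str:
    """Return 'Customer' or 'Salesman' based on the label found in the turn text."""
    # Assumption: every turn carries one of the two labels used by the generation script.
    if "Customer:" in turn:
        return "Customer"
    if "Salesman:" in turn:
        return "Salesman"
    raise ValueError(f"unlabelled turn: {turn!r}")


def follows_structure(turns: List[str]) -> bool:
    """True if the customer opens and the two speakers strictly alternate."""
    if not turns:
        return False
    speakers = [speaker_of(t) for t in turns]
    if speakers[0] != "Customer":
        return False
    # Adjacent turns must come from different speakers; the last speaker may be either one.
    return all(a != b for a, b in zip(speakers, speakers[1:]))


# Made-up sample turns, for illustration only.
example = [
    "Customer: I'm not sure this plan fits my budget.",
    "Salesman: I understand. Which features matter most to you?",
    "Customer: Mostly reliability.",
]
print(follows_structure(example))  # True
```

Because the last speaker is not defined, the check deliberately accepts conversations that end on either a customer or a salesman turn.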
# Generation
Note that a textbook dataset is required for this conversation generation. The examples rely on the following textbook dataset:
https://huggingface.co/datasets/goendalf666/sales-textbook_for_convincing_and_selling
The data generation code can be found here: https://github.com/tom813/salesGPT_foundation/blob/main/data_generation/conversation2conversation_instruction.py
```
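import pandas as pd
from datasets import load_dataset, Dataset

# Load the source conversations: each row is one conversation, each column one turn.
data = load_dataset("goendalf666/sales-conversations-2", split="train")
df = data.to_pandas()
df = df.fillna('')  # missing turns become empty strings, which end the loop below

conversations = []
for _, row in df.iterrows():
    current_conversation = ""
    try:
        for turn in row:
            if "Customer:" in turn:
                current_conversation += turn + " "
            elif "Salesman:" in turn:
                # One instruction example per salesman turn: the history so far plus the reply.
                # The prompt wording is kept exactly as in the original script.
                prompt = f"""You are a in the role of a Salesman. Here is a conversation:
{current_conversation}
Answer as a Salesman to the previous Statement to convince the person to buy the product or service.
{turn}"""
                conversations.append(prompt)
                current_conversation += turn + " "
            else:
                # Empty or unlabelled cell: stop processing this conversation.
                break
    except Exception as e:
        print(e)

print(len(conversations))

# The list becomes a single-column DataFrame, then a Dataset pushed to the Hub.
df = pd.DataFrame(conversations)
ds = Dataset.from_pandas(df)
ds.push_to_hub("goendalf666/sales-conversations-instruction")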
```
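To see what the loop in the script above produces without downloading anything, here is a small in-memory sketch. It assumes the prompt layout is exactly as printed in the script (one newline between each part). The toy conversation is made up, and `build_examples` and `split_example` are hypothetical helper names that do not appear in the linked repository.

```python
from typing import List, Tuple

# Instruction sentence copied from the script above.
INSTRUCTION = ("Answer as a Salesman to the previous Statement to convince "
               "the person to buy the product or service.")


def build_examples(turns: List[str]) -> List[str]:
    """Emit one prompt per salesman turn, mirroring the loop in the script above."""
    examples, history = [], ""
    for turn in turns:
        if "Customer:" in turn:
            history += turn + " "
        elif "Salesman:" in turn:
            examples.append(
                "You are a in the role of a Salesman. Here is a conversation:\n"
                f"{history}\n{INSTRUCTION}\n{turn}"
            )
            history += turn + " "
        else:
            break  # an empty or unlabelled cell ends the conversation
    return examples


def split_example(example: str) -> Tuple[str, str]:
    """Hypothetical helper: split a row into (prompt, target reply) at the instruction sentence."""
    head, sep, reply = example.partition(INSTRUCTION + "\n")
    if not sep:
        raise ValueError("instruction sentence not found")
    return head + INSTRUCTION, reply


# Made-up conversation, for illustration only.
toy = [
    "Customer: Is the premium plan worth it?",
    "Salesman: It pays for itself if you send over 100 invoices a month.",
    "Customer: We send about 300.",
    "Salesman: Then you would save roughly two hours a week.",
]
rows = build_examples(toy)
print(len(rows))  # one row per salesman turn
prompt, reply = split_example(rows[1])
print(reply)
```

A conversation with n salesman turns yields n rows, and each later row repeats the full history of the earlier ones. Splitting at the instruction sentence is one possible way to separate the prompt from the target reply when fine-tuning; it is a design choice of this sketch, not something the dataset card prescribes.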
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by:
goendalf666
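Summary of original information
Dataset overview
Dataset information
- Feature name: 0
- Feature type: string
- Splits:
  - Name: train
  - Bytes: 28036745
  - Examples: 20940
- Download size: 4782593
- Dataset size: 28036745
Configuration
- Config name: default
- Data files:
  - Split: train
  - Path: data/train-*
Dataset structure
- Conversation roles: customer and salesman, alternating turn by turn.
- Conversation start: the customer always starts the conversation.
- Conversation end: who ends the conversation is not defined.
Data generation
- Required dataset: a textbook dataset is needed; the examples rely on https://huggingface.co/datasets/goendalf666/sales-textbook_for_convincing_and_selling
- Data generation code: https://github.com/tom813/salesGPT_foundation/blob/main/data_generation/conversation2conversation_instruction.py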



