Felladrin/pretrain-webGPT_x_dolly
Hugging Face | Updated 2024-01-20 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Felladrin/pretrain-webGPT_x_dolly
Resource description:
---
license: cc-by-sa-3.0
source_datasets:
- starfishmedical/webGPT_x_dolly
---
Conversion of the [starfishmedical/webGPT_x_dolly](https://huggingface.co/datasets/starfishmedical/webGPT_x_dolly) dataset for use in pretraining. Each instruction and its output are joined into a single `text` field.
Python code used for conversion:
```python
from datasets import load_dataset
import pandas

# Load the training split of the source dataset.
dataset = load_dataset("starfishmedical/webGPT_x_dolly", split="train")

def format(columns):
    # Join each instruction and its output with a blank line.
    question = columns["instruction"].strip()
    answer = columns["output"].strip()
    return f"{question}\n\n{answer}"

# Write every formatted example to a single-column CSV file.
pandas.DataFrame({"text": [format(columns) for columns in dataset]}).to_csv("train.csv", index=False)
```
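Because every `text` value contains a blank line, the CSV file has to quote multi-line fields correctly. The following is a minimal sketch of that behavior, not part of the original conversion: it applies the same joining logic to two hypothetical in-memory rows (invented for illustration, not actual dataset records) and round-trips them through the standard-library `csv` module instead of pandas, so it runs without downloading anything. The helper name `format_example` is also an assumption.

```python
import csv
import io


def format_example(row):
    """Join instruction and output with a blank line, mirroring the conversion code."""
    question = row["instruction"].strip()
    answer = row["output"].strip()
    return f"{question}\n\n{answer}"


# Hypothetical rows standing in for the real dataset (not actual records).
rows = [
    {"instruction": "  What is the capital of France?  ", "output": "Paris.\n"},
    {"instruction": "Name a primary color.", "output": " Red "},
]

texts = [format_example(row) for row in rows]

# Write a one-column CSV in memory; fields containing newlines are quoted.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text"])
writer.writerows([text] for text in texts)

# Read the CSV back and check that the embedded blank lines survive.
buffer.seek(0)
restored = [record["text"] for record in csv.DictReader(buffer)]
print(restored == texts)
```

If the round trip holds, each row of the resulting file can be treated as one pretraining document, with the blank line separating the question from the answer.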
Provider:
Felladrin
Summary of original information
Dataset overview
Dataset source
- Original dataset: starfishmedical/webGPT_x_dolly
Dataset purpose
- A converted dataset intended for pretraining



