Felladrin/pretrain-databricks-dolly-15k
Source: Hugging Face. Updated 2024-01-20; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Felladrin/pretrain-databricks-dolly-15k
Resource description:
---
license: cc-by-sa-3.0
source_datasets:
- databricks/databricks-dolly-15k
---
Conversion of the [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset for use in pretraining.
Python code used for the conversion:
```python
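from datasets import load_dataset
import pandas

# Load the original dataset from the Hugging Face Hub.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def format(columns):
    # Join instruction and response into one text sample, separated by a blank line.
    instruction = columns["instruction"].strip()
    answer = columns["response"].strip()
    return f"{instruction}\n\n{answer}"

# Write one formatted sample per row to a single-column CSV file.
pandas.DataFrame({"text": [format(columns) for columns in dataset]}).to_csv("train.csv", index=False)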
```
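To see what a single converted sample looks like without downloading anything, here is a minimal sketch that applies the same formatting logic to one hand-written record. The record and the name `format_record` are hypothetical and only for illustration; the field names follow the columns of the source dataset. Note that only `instruction` and `response` end up in the output, so the `context` and `category` fields are dropped.

```python
def format_record(record):
    """Same logic as the conversion above; the name is hypothetical."""
    instruction = record["instruction"].strip()
    answer = record["response"].strip()
    return f"{instruction}\n\n{answer}"


# Hand-written record for illustration; not taken from the dataset.
sample = {
    "instruction": "  What is the capital of France? ",
    "context": "",
    "response": "Paris is the capital of France.\n",
    "category": "open_qa",
}

text = format_record(sample)
print(text)
```

Surrounding whitespace is stripped from both fields, and the two parts are joined by a single blank line.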
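Provider:
Felladrin
Summary of original information
Dataset overview
Dataset source
- Original dataset:
databricks/databricks-dolly-15k
Dataset purpose
- For pretraining
Data conversion code
- Python code converts the original dataset into a format suitable for pretraining.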