Felladrin/ChatML-hercules-v2.0
收藏Hugging Face2024-02-04 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/Felladrin/ChatML-hercules-v2.0
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- en
size_categories:
- 1M<n<10M
---
[Locutusque/hercules-v2.0](https://huggingface.co/datasets/Locutusque/hercules-v2.0) in ChatML format.
Python code used for conversion:
```python
from datasets import load_dataset
import pandas
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
pretrained_model_name_or_path="Felladrin/Llama-160M-Chat-v1"
)
dataset = load_dataset("Locutusque/hercules-v2.0", split="train")
def format(columns):
messages = []
conversation = columns["conversations"]
for i in range(len(conversation)):
message = conversation[i]
content = message["value"]
role = message["from"]
if role == "human":
role = "user"
elif role == "gpt":
role = "assistant"
if role and content:
messages.append(
{
"role": role.strip(),
"content": content.strip(),
}
)
return tokenizer.apply_chat_template(messages, tokenize=False)
pandas.DataFrame({"text": [format(columns) for columns in dataset]}).to_parquet("train.parquet", index=False)
```
提供机构:
Felladrin
原始信息汇总
数据集概述
许可证
- Apache 2.0
任务类别
- 问答
- 文本生成
语言
- 英语
数据集大小
- 1M < n < 10M



