Felladrin/ChatML-HelpSteer

Name: Felladrin/ChatML-HelpSteer
Creator: Felladrin
Published: 2024-02-17 22:49:36
License: CC BY 4.0

Hugging Face | Updated 2024-02-17 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/Felladrin/ChatML-HelpSteer

Description:

---
license: cc-by-4.0
language:
- en
size_categories:
- 10K<n<100K
task_categories:
- question-answering
- text-generation
---

[nvidia/HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) in ChatML format, ready to use in [HuggingFace TRL's SFT Trainer](https://huggingface.co/docs/trl/main/en/sft_trainer).

Python code used for conversion:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# The tokenizer supplies the chat template used to render each conversation.
tokenizer = AutoTokenizer.from_pretrained("Felladrin/Llama-160M-Chat-v1")

dataset = load_dataset("nvidia/HelpSteer", split="train")

def format(columns):
    # Trim surrounding whitespace from both sides of the exchange.
    prompt = columns["prompt"].strip()
    response = columns["response"].strip()

    messages = [
        {
            "role": "user",
            "content": prompt,
        },
        {
            "role": "assistant",
            "content": response,
        },
    ]

    # Render the messages as a single string; tokenize=False keeps it as text.
    return { "text": tokenizer.apply_chat_template(messages, tokenize=False) }

# Keep the rendered text and the five rating columns, then write Parquet.
dataset.map(format).select_columns(['text', 'helpfulness', 'correctness', 'coherence', 'complexity', 'verbosity']).to_parquet("train.parquet")
```

Provider:

Felladrin

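Summary of original information

Dataset overview

License

CC BY 4.0

Language

English

Dataset size

10K < n < 100K

Task categories

Question answering
Text generation

Data format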

The dataset is provided in ChatML format and is suitable for use with HuggingFace TRL's SFT Trainer.
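To make the format concrete, here is a minimal sketch that splits a ChatML string back into role/content messages. It assumes the common ChatML turn layout (`<|im_start|>{role}`, a newline, the content, then `<|im_end|>`); the exact text stored in this dataset depends on the tokenizer's chat template and may differ in detail. The `parse_chatml` helper and the sample string are made up for illustration and are not taken from the dataset.

```python
import re

# Assumed ChatML turn layout: <|im_start|>{role}\n{content}<|im_end|>
CHATML_TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def parse_chatml(text):
    """Split a ChatML string into a list of {"role", "content"} dicts."""
    return [
        {"role": role, "content": content}
        for role, content in CHATML_TURN.findall(text)
    ]


# Made-up sample in the assumed layout (not an actual dataset row).
sample = (
    "<|im_start|>user\nWhat is 2 + 2?<|im_end|>\n"
    "<|im_start|>assistant\n2 + 2 equals 4.<|im_end|>\n"
)

messages = parse_chatml(sample)
print(messages)
```

A parser like this can help when inspecting individual rows of the `text` column, for example to check that every row holds exactly one user turn followed by one assistant turn.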

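Data processing

The dataset was converted with Python code that strips each prompt and response, wraps them as a user/assistant message pair, and renders the pair to text with the tokenizer's chat template.
The final output is a Parquet file.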
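The conversion can be sketched on a single in-memory row without downloading anything. This is an illustration under stated assumptions, not the original pipeline: `to_chatml` is a hand-written stand-in for the tokenizer's chat template and assumes the common ChatML layout, `convert_row` is a hypothetical helper, and the example row is made up.

```python
RATING_COLUMNS = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


def to_chatml(messages):
    """Render messages in the assumed ChatML layout (stand-in for the real chat template)."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


def convert_row(row):
    """Build the user/assistant pair, render it, and keep the rating columns."""
    messages = [
        {"role": "user", "content": row["prompt"].strip()},
        {"role": "assistant", "content": row["response"].strip()},
    ]
    return {"text": to_chatml(messages), **{k: row[k] for k in RATING_COLUMNS}}


# Made-up row with the same column names as the source dataset.
row = {
    "prompt": "  Name a prime number. ",
    "response": "7 is prime.\n",
    "helpfulness": 3,
    "correctness": 4,
    "coherence": 4,
    "complexity": 1,
    "verbosity": 1,
}

converted = convert_row(row)
print(converted["text"])
```

The rating columns pass through unchanged, so they remain available for filtering or weighting rows after conversion.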
