alvarobartt/airoboros2.2-pref-10k

Name: alvarobartt/airoboros2.2-pref-10k
Creator: alvarobartt
Published: 2024-03-28 08:43:01
License: 暂无描述

Hugging Face2024-03-28 更新2024-06-11 收录

下载链接：

https://hf-mirror.com/datasets/alvarobartt/airoboros2.2-pref-10k

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: source dtype: string - name: category dtype: class_label: names: '0': cot '1': counterfactual_contextual '2': greeting '3': multiple_choice '4': orca '5': general '6': theory_of_mind '7': detailed_writing '8': gtkm '9': writing '10': awareness '11': song '12': summarization '13': experience '14': agent '15': plan '16': editor '17': wordgame '18': misconception '19': coding '20': joke '21': card '22': roleplay '23': quiz '24': trivia '25': rp '26': riddle '27': stylized_response - name: chosen list: - name: content dtype: string - name: role dtype: string - name: chosen_model dtype: string - name: rejected list: - name: content dtype: string - name: role dtype: string - name: rejected_model dtype: string splits: - name: train num_bytes: 30956876.72890733 num_examples: 9500 - name: test num_bytes: 1629309.3015214384 num_examples: 500 download_size: 17953662 dataset_size: 32586186.03042877 configs: - config_name: default data_files: - split: train path: data/train-* - split: test path: data/test-* ---

提供机构：

alvarobartt

原始信息汇总

数据集概述

数据集特征

source：数据类型为字符串。
category：数据类型为分类标签，包含以下类别：
- 0: cot
- 1: counterfactual_contextual
- 2: greeting
- 3: multiple_choice
- 4: orca
- 5: general
- 6: theory_of_mind
- 7: detailed_writing
- 8: gtkm
- 9: writing
- 10: awareness
- 11: song
- 12: summarization
- 13: experience
- 14: agent
- 15: plan
- 16: editor
- 17: wordgame
- 18: misconception
- 19: coding
- 20: joke
- 21: card
- 22: roleplay
- 23: quiz
- 24: trivia
- 25: rp
- 26: riddle
- 27: stylized_response
chosen：包含以下子特征：
- content：数据类型为字符串。
- role：数据类型为字符串。
chosen_model：数据类型为字符串。
rejected：包含以下子特征：
- content：数据类型为字符串。
- role：数据类型为字符串。
rejected_model：数据类型为字符串。

数据集分割

train：包含9500个示例，占用30956876.72890733字节。
test：包含500个示例，占用1629309.3015214384字节。

数据集大小

下载大小：17953662字节。
数据集总大小：32586186.03042877字节。

配置文件

config_name：default
data_files：
- train：路径为data/train-*。
- test：路径为data/test-*。

5,000+

优质数据集

54 个

任务类型

进入经典数据集