umarigan/openhermes_tr
Hugging Face | Updated 2024-03-18 | Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/umarigan/openhermes_tr
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 330804375
    num_examples: 241853
  download_size: 157831782
  dataset_size: 330804375
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
task_categories:
- summarization
- text-generation
- text2text-generation
language:
- tr
size_categories:
- 100K<n<1M
---
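The three string features declared in the metadata above (`instruction`, `input`, `output`) follow the common instruction-tuning layout. As a minimal sketch, one record could be rendered into a single training string as shown here. The helper name `format_example`, the `###` section headers, and the sample record are all assumptions made for illustration; the card does not prescribe a prompt template.

```python
from typing import Dict


def format_example(record: Dict[str, str]) -> str:
    """Render one instruction/input/output record as a single prompt string.

    The "### ..." headers are a hypothetical template, not one defined by the card.
    """
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    output = record["output"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:  # the input field may be empty, so include it only when present
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{output}")
    return "\n\n".join(parts)


# Hypothetical record for illustration only (not taken from the dataset).
sample = {
    "instruction": "Cümleyi İngilizceye çevir.",
    "input": "Merhaba dünya",
    "output": "Hello world",
}
print(format_example(sample))

# To work with the real data (requires network access and the `datasets` package):
# from datasets import load_dataset
# ds = load_dataset("umarigan/openhermes_tr", split="train")
# print(format_example(ds[0]))
```

Skipping the input section when the field is empty keeps prompts compact for records that carry no extra context.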
The OpenHermes dataset, compiled by Teknium, consists of roughly 242,000 entries of primarily GPT-4 generated data drawn from open datasets across the AI landscape.
I translated the entire dataset into Turkish using Google Translate, for the Turkish LLM communities.
OpenHermes was trained on these entries, whose sources include:
- GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium
- WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
- Airoboros GPT-4 (v1.0), by JonDurbin
- Camel-AI's domain expert datasets, by the Camel-AI Team
- CodeAlpaca, by Sahil2801
- GPT4-LLM and Unnatural Instructions, by Microsoft
Filtering included removal of OpenAI refusals, disclaimers, "As an AI"-type examples, and more.
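The card does not publish the actual filter rules, so the following is only a sketch of what a refusal filter of this kind might look like. The pattern list, `is_refusal`, and `drop_refusals` are hypothetical names chosen for illustration, and the patterns target English text (the upstream filtering was presumably applied before translation, so Turkish phrasings would need their own patterns).

```python
import re
from typing import Dict, Iterable, List

# Illustrative patterns only; the real OpenHermes filter list is not given in the card.
REFUSAL_PATTERNS = [
    r"\bas an ai\b",
    r"\bi'm sorry, but\b",
    r"\bi cannot (?:help|assist)\b",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_refusal(text: str) -> bool:
    """Return True if the text matches any of the illustrative refusal patterns."""
    return _REFUSAL_RE.search(text) is not None


def drop_refusals(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records whose output does not look like a refusal."""
    return [r for r in records if not is_refusal(r["output"])]


# Hypothetical records for illustration only.
records = [
    {"instruction": "Add 2 and 3.", "input": "", "output": "5"},
    {
        "instruction": "Tell me a secret.",
        "input": "",
        "output": "As an AI language model, I cannot do that.",
    },
]
kept = drop_refusals(records)
print(len(kept), "of", len(records), "records kept")
```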
Provider:
umarigan
Original information summary
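Dataset overview
Dataset information
- Name: OpenHermes
- Features:
  - instruction: string
  - input: string
  - output: string
Dataset splits
- Train:
  - Size: 330,804,375 bytes
  - Examples: 241,853
Dataset size
- Download size: 157,831,782 bytes
- Total size: 330,804,375 bytes
Configuration
- Default configuration:
  - Data file path: data/train-*
Task categories
- Summarization
- Text generation
- Text-to-text generation
Language
- Turkish
Size category
- 100K<n<1M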
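The size figures in the summary above can be turned into two quick sanity numbers: the average size of one example, and how much smaller the download is than the full dataset. This is a small arithmetic sketch using only the values from the metadata. That the gap between the two sizes comes from compressed storage of the data files is an assumption, not something the card states.

```python
# Values copied from the dataset metadata.
num_bytes = 330_804_375      # total dataset size in bytes
num_examples = 241_853       # number of training examples
download_size = 157_831_782  # download size in bytes

# Average size of a single instruction/input/output record.
avg_bytes = num_bytes / num_examples

# Fraction of the full size that actually has to be downloaded.
ratio = download_size / num_bytes

print(f"{avg_bytes:.0f} bytes per example on average")
print(f"download is {ratio:.1%} of the total dataset size")
```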



