dariolopez/Llama-2-oasst1-es
Hugging Face · Updated 2023-08-29 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/dariolopez/Llama-2-oasst1-es
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 4524060
    num_examples: 3909
  download_size: 2528456
  dataset_size: 4524060
license: apache-2.0
language:
- es
size_categories:
- 1K<n<10K
---
# OpenAssistant Conversations Spanish Dataset (OASST1-es) for Llama-2
## Dataset Summary
Subset of the original [OpenAssistant Conversations Dataset (OASST)](https://huggingface.co/datasets/OpenAssistant/oasst1).
* Filtered by `lang=es`.
* Formatted according to the Llama-2 pattern: "\<s> [INST] user prompt [/INST] model output \</s>"
* Only the best-ranked output is kept (some instructions had multiple outputs ranked by humans).
* Only the first level of each conversation tree is kept.
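
The selection and formatting steps listed above can be sketched roughly as follows. This is an illustrative reconstruction, not the author's actual preprocessing script. It assumes OASST1-style records with the fields `message_id`, `parent_id`, `role`, `lang`, `text`, and `rank`, assumes that a lower `rank` means a better reply (with `0` as the best), and assumes the spacing in the pattern is literal. The helper names `format_llama2` and `build_rows` are hypothetical.

```python
from __future__ import annotations


def format_llama2(prompt: str, output: str) -> str:
    # Follows the pattern quoted in the summary; the exact spacing is an assumption.
    return f"<s> [INST] {prompt} [/INST] {output} </s>"


def build_rows(messages: list[dict], lang: str = "es") -> list[str]:
    # First level only: root prompts have no parent and come from the prompter.
    roots = {
        m["message_id"]: m
        for m in messages
        if m["parent_id"] is None and m["role"] == "prompter" and m["lang"] == lang
    }

    # For each root prompt, keep the assistant reply with the lowest rank.
    # Assumption: rank 0 is best; unranked replies (rank None) sort last.
    best: dict[str, tuple[float, str]] = {}
    for m in messages:
        parent = m["parent_id"]
        if m["role"] != "assistant" or parent not in roots:
            continue
        rank = m["rank"] if m.get("rank") is not None else float("inf")
        if parent not in best or rank < best[parent][0]:
            best[parent] = (rank, m["text"])

    return [format_llama2(roots[pid]["text"], text) for pid, (_, text) in best.items()]


# Tiny made-up sample: one Spanish tree with two ranked replies and a
# second-level follow-up, plus one English tree that the filter drops.
sample = [
    {"message_id": "p1", "parent_id": None, "role": "prompter", "lang": "es",
     "text": "¿Qué es Python?", "rank": None},
    {"message_id": "a1", "parent_id": "p1", "role": "assistant", "lang": "es",
     "text": "Un lenguaje de programación.", "rank": 1},
    {"message_id": "a2", "parent_id": "p1", "role": "assistant", "lang": "es",
     "text": "Python es un lenguaje de programación de alto nivel.", "rank": 0},
    {"message_id": "p2", "parent_id": "a2", "role": "prompter", "lang": "es",
     "text": "¿Y Java?", "rank": None},
    {"message_id": "p3", "parent_id": None, "role": "prompter", "lang": "en",
     "text": "What is Rust?", "rank": None},
    {"message_id": "a3", "parent_id": "p3", "role": "assistant", "lang": "en",
     "text": "A systems language.", "rank": 0},
]

rows = build_rows(sample)
```

In this sketch the second-level follow-up (`p2`) and the English tree never reach the output, so `rows` holds a single formatted string built from `p1` and its best-ranked reply `a2`.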
## Dataset Structure
The dataset has 3909 rows. Each row is one instruction-output pair, stored as a single string in the `text` field.
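
Because each row packs the instruction and the output into one string, a consumer that needs them separately has to split the `text` field again. The following is a minimal sketch of one way to do that, assuming every row follows the Llama-2 pattern described in the summary with a single `[INST] ... [/INST]` turn. The names `ROW_PATTERN` and `split_row` are hypothetical and not part of the dataset.

```python
from __future__ import annotations

import re

# Matches "<s> [INST] instruction [/INST] output </s>"; whitespace around the
# markers is tolerated, and DOTALL lets both parts span multiple lines.
ROW_PATTERN = re.compile(
    r"^<s>\s*\[INST\]\s*(?P<instruction>.*?)\s*\[/INST\]\s*(?P<output>.*?)\s*</s>$",
    re.DOTALL,
)


def split_row(text: str) -> tuple[str, str]:
    """Split one `text` value into an (instruction, output) tuple."""
    match = ROW_PATTERN.match(text.strip())
    if match is None:
        raise ValueError("row does not follow the expected Llama-2 pattern")
    return match.group("instruction"), match.group("output")


# Made-up example row, not taken from the dataset.
instruction, output = split_row("<s> [INST] Hola [/INST] Buenos días </s>")
```

Rows that deviate from the pattern raise a `ValueError` rather than being split silently, which makes formatting irregularities visible early.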
Provider:
dariolopez



