infCapital/viet-llama2-ft
Source: Hugging Face. Updated 2023-09-28. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/infCapital/viet-llama2-ft
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 2088825146
    num_examples: 1932833
  download_size: 874832201
  dataset_size: 2088825146
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
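The metadata above describes a single `train` split with three string columns. The following is a minimal sketch of how the split might be loaded and how its size figures relate, assuming the Hugging Face `datasets` package is installed and that the repository id matches the download link. The helper names `size_summary` and `load_train` are illustrative and not part of the dataset.

```python
# Figures copied from the dataset card metadata.
NUM_BYTES = 2_088_825_146      # dataset_size (uncompressed)
DOWNLOAD_BYTES = 874_832_201   # download_size
NUM_EXAMPLES = 1_932_833       # num_examples in the train split


def size_summary():
    """Return (average bytes per example, download size / dataset size)."""
    avg_bytes = NUM_BYTES / NUM_EXAMPLES
    ratio = DOWNLOAD_BYTES / NUM_BYTES
    return avg_bytes, ratio


def load_train(streaming=True):
    """Load the train split. Requires the `datasets` package and network access.

    Streaming avoids downloading every shard before the first record is read.
    """
    # Imported lazily so the size arithmetic above runs without the package.
    from datasets import load_dataset

    return load_dataset(
        "infCapital/viet-llama2-ft", split="train", streaming=streaming
    )


if __name__ == "__main__":
    avg_bytes, ratio = size_summary()
    print(f"average bytes per example: {avg_bytes:.1f}")
    print(f"download size / dataset size: {ratio:.3f}")
```

By these figures each example averages a little over 1 KB, and the download is roughly 42% of the uncompressed dataset size.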
# Dataset mix from:
+ databricks/databricks-dolly-15k
+ ewof/alpaca-instruct-unfiltered
+ garage/bAInd_Open-Platypus
+ gbharti/finance-alpaca
+ Honkware/oasst1-alpaca
+ medical/chat
+ pankajmathur/WizardLM_Orca
+ teknium/GPTeacher-General-Instruct
+ LIMA
+ Chain-of-Thought
+ Dynosaur/dynosaur-full
+ nam194_vietnews
+ quora_chat
+ stackoverflow_chat
# Dataset Creation:
+ The source-language datasets were translated into Vietnamese using the OpenAI GPT-3.5 API.
+ 2% of the translations failed with errors and were skipped.
+ The remaining translations were merged into a single dataset for fine-tuning.
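The creation steps above (translate, skip failures, merge) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `translate_fn` is a hypothetical callable standing in for the GPT-3.5 translation call, and the card does not say how failures were actually detected.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]
FIELDS = ("instruction", "input", "output")


def translate_record(record: Record, translate_fn: Callable[[str], str]) -> Record:
    """Translate each non-empty field; empty or missing fields become ''."""
    return {
        field: translate_fn(record[field]) if record.get(field) else ""
        for field in FIELDS
    }


def translate_and_merge(
    sources: Iterable[Iterable[Record]],
    translate_fn: Callable[[str], str],
) -> Tuple[List[Record], int]:
    """Translate every record from every source into one merged list.

    A record whose translation raises is skipped and counted,
    mirroring the "skipped on error" step described above.
    """
    merged: List[Record] = []
    skipped = 0
    for source in sources:
        for record in source:
            try:
                merged.append(translate_record(record, translate_fn))
            except Exception:
                # Hypothetical failure signal: any exception from translate_fn.
                skipped += 1
    return merged, skipped
```

Passing the translator in as a parameter keeps the sketch independent of any particular API client and makes it testable with a stub.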
# Important Notes:
+ This dataset was translated by a machine learning model and may contain errors or inaccuracies.
+ 2% of the original data could not be processed automatically and was skipped.
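Because the translations may contain errors, a consumer might want to screen records before fine-tuning. The sketch below shows one possible heuristic and is not something the dataset authors describe: it flags records with an empty instruction or output, or with no Vietnamese-looking characters at all. The function names are hypothetical. The check is crude, so legitimately untranslated content such as source code would also be flagged.

```python
import unicodedata


def looks_vietnamese(text: str) -> bool:
    """Heuristic: True if text contains 'đ'/'Đ' or any combining diacritic."""
    if "đ" in text or "Đ" in text:
        return True
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return any(unicodedata.combining(ch) for ch in decomposed)


def is_suspicious(record: dict) -> bool:
    """Flag records that are incomplete or show no sign of Vietnamese text."""
    instruction = (record.get("instruction") or "").strip()
    output = (record.get("output") or "").strip()
    if not instruction or not output:
        return True
    return not looks_vietnamese(instruction + " " + output)
```

Flagged records are better reviewed than dropped outright, since the heuristic cannot tell a failed translation from content that never needed translating.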
Provider:
infCapital
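Summary of original information
Dataset information
Features
- instruction: string
- input: string
- output: string
Splits
- train: 2088825146 bytes, 1932833 examples
Sizes
- Download size: 874832201 bytes
- Dataset size: 2088825146 bytes
Configs
- default: data file path is data/train-*
Dataset creation
- The source-language datasets were translated into Vietnamese using the OpenAI GPT-3.5 API.
- 2% of the translations contained errors and were skipped.
- The remaining translations were merged into one main dataset for fine-tuning.
Important notes
- The dataset was translated by a machine learning model and may contain errors or inaccuracies.
- 2% of the original dataset could not be processed automatically and was skipped.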



