wisenut-nlp-team/llama_jp
Source: Hugging Face. Updated 2024-05-07. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/wisenut-nlp-team/llama_jp
Resource description:
---
dataset_info:
- config_name: chat
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 39405392
    num_examples: 17120
  dataset_size: 39405392
- config_name: multiple
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 2336693449
    num_examples: 3285890
  dataset_size: 2336693449
- config_name: qa
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 1145861285
    num_examples: 142021
  dataset_size: 1145861285
- config_name: smr
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 69198207
    num_examples: 29168
  dataset_size: 69198207
configs:
- config_name: chat
  data_files:
  - split: train
    path: data/chat/*
- config_name: multiple
  data_files:
  - split: train
    path: data/multiple/*
- config_name: qa
  data_files:
  - split: train
    path: data/qa/*
- config_name: smr
  data_files:
  - split: train
    path: data/smr/*
---
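
The metadata above defines four configs, each with a single `train` split. A minimal sketch of loading one of them with the Hugging Face `datasets` library follows. The helper name `load_config` is hypothetical, the repository id is taken from the download link, and the sketch assumes the repository is still publicly reachable.

```python
# Config names and repository id as they appear in the card metadata.
CONFIGS = ("chat", "multiple", "qa", "smr")
REPO_ID = "wisenut-nlp-team/llama_jp"


def load_config(name: str, streaming: bool = False):
    """Load the train split of one config (hypothetical helper).

    Set streaming=True for large configs such as "multiple" to avoid
    downloading the whole split before iterating.
    """
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    # Imported lazily so the validation above works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train", streaming=streaming)
```

Typical usage would be `ds = load_config("chat")`, after which each record should expose the `instruction`, `input`, and `output` string fields listed in the metadata.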
## chat
- jmultiwoz (chat-pred)
  - length: 3.54k
- real-persona-chat (chat-pred)
  - length: 13.58k
## multiple
- Bactrian-X
  - length: 67k
- databricks-dolly-15k-ja
  - length: 15k
- guanaco_ja
  - length: 100.63k
- llm-japanese-dataset-vanilla
  - length: 2.52M
- OpenOrcaJapanese
  - length: 573.62k
## qa
- AutoGeneratedJapaneseQA (open-qa)
  - length: 93k
- JAQKET (closed-qa)
  - length: 13.33k
- JaQuAD (closed-qa)
  - length: 35.69k
## smr
- dialogsum-ja (chat-smr)
  - length: 20.28k
- xlsum (doc-smr)
  - length: 8.89k
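
The card does not say what `length` measures. One plausible reading, treated here as an assumption, is the number of examples contributed by each source. The sketch below checks that reading by summing the rounded lengths per config and comparing them with the `num_examples` values from the metadata. The names `SOURCE_LENGTHS`, `parse_length`, and `listed_total` are illustrative only.

```python
# Rounded per-source lengths as listed in the card, grouped by config.
SOURCE_LENGTHS = {
    "chat": {"jmultiwoz": "3.54k", "real-persona-chat": "13.58k"},
    "multiple": {
        "Bactrian-X": "67k",
        "databricks-dolly-15k-ja": "15k",
        "guanaco_ja": "100.63k",
        "llm-japanese-dataset-vanilla": "2.52M",
        "OpenOrcaJapanese": "573.62k",
    },
    "qa": {
        "AutoGeneratedJapaneseQA": "93k",
        "JAQKET": "13.33k",
        "JaQuAD": "35.69k",
    },
    "smr": {"dialogsum-ja": "20.28k", "xlsum": "8.89k"},
}

# Exact train-split sizes from the dataset_info metadata.
NUM_EXAMPLES = {"chat": 17120, "multiple": 3285890, "qa": 142021, "smr": 29168}

UNITS = {"k": 1_000, "M": 1_000_000}


def parse_length(text: str) -> int:
    """Convert a rounded count such as '3.54k' or '2.52M' to an integer."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return round(float(text[:-1]) * UNITS[text[-1]])
    return round(float(text))


def listed_total(config: str) -> int:
    """Sum the rounded source lengths listed for one config."""
    return sum(parse_length(v) for v in SOURCE_LENGTHS[config].values())


for config, exact in NUM_EXAMPLES.items():
    approx = listed_total(config)
    print(f"{config}: listed {approx:,} vs num_examples {exact:,} "
          f"(diff {exact - approx:+,})")
```

Under this reading the listed lengths match the metadata to within rounding for `chat`, `qa`, and `smr`, and to within about half a percent for `multiple`.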
Provider:
wisenut-nlp-team
Summary of original information
Dataset overview
Config name: chat
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train:
    - Bytes: 39405392
    - Examples: 17120
- Dataset size: 39405392 bytes
Config name: multiple
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train:
    - Bytes: 2336693449
    - Examples: 3285890
- Dataset size: 2336693449 bytes
Config name: qa
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train:
    - Bytes: 1145861285
    - Examples: 142021
- Dataset size: 1145861285 bytes
Config name: smr
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train:
    - Bytes: 69198207
    - Examples: 29168
- Dataset size: 69198207 bytes
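
The byte and example counts above can be combined into overall totals and a rough average record size per config, which may help when planning storage or sequence-length budgets. The sketch below uses only the figures reported in the metadata. The names `TRAIN_SPLITS` and `avg_bytes` are illustrative, and the averages say nothing about how size is distributed across individual records.

```python
# (num_bytes, num_examples) for each config's train split, from the metadata.
TRAIN_SPLITS = {
    "chat": (39405392, 17120),
    "multiple": (2336693449, 3285890),
    "qa": (1145861285, 142021),
    "smr": (69198207, 29168),
}

total_bytes = sum(b for b, _ in TRAIN_SPLITS.values())
total_examples = sum(n for _, n in TRAIN_SPLITS.values())

# Mean size of one record in bytes, per config.
avg_bytes = {name: b / n for name, (b, n) in TRAIN_SPLITS.items()}

for name, avg in sorted(avg_bytes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: about {avg:,.0f} bytes per example")
print(f"total: {total_examples:,} examples, {total_bytes / 1e9:.2f} GB")
```

By this measure `qa` records are the largest on average and `multiple` records the smallest, even though `multiple` accounts for most of the total bytes.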



