wisenut-nlp-team/llama_jp

Name: wisenut-nlp-team/llama_jp
Creator: wisenut-nlp-team
Published: 2024-05-07 04:16:54
License: No description available

Source: Hugging Face. Updated 2024-05-07; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/wisenut-nlp-team/llama_jp

Description:

---
dataset_info:
- config_name: chat
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 39405392
    num_examples: 17120
  dataset_size: 39405392
- config_name: multiple
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 2336693449
    num_examples: 3285890
  dataset_size: 2336693449
- config_name: qa
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 1145861285
    num_examples: 142021
  dataset_size: 1145861285
- config_name: smr
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 69198207
    num_examples: 29168
  dataset_size: 69198207
configs:
- config_name: chat
  data_files:
  - split: train
    path: data/chat/*
- config_name: multiple
  data_files:
  - split: train
    path: data/multiple/*
- config_name: qa
  data_files:
  - split: train
    path: data/qa/*
- config_name: smr
  data_files:
  - split: train
    path: data/smr/*
---

## chat

- jmultiwoz (chat-pred)
  - length: 3.54k
- real-persona-chat (chat-pred)
  - length: 13.58k

## multiple

- Bactrian-X
  - length: 67k
- databricks-dolly-15k-ja
  - length: 15k
- guanaco_ja
  - length: 100.63k
- llm-japanese-dataset-vanilla
  - length: 2.52M
- OpenOrcaJapanese
  - length: 573.62k

## qa

- AutoGeneratedJapaneseQA (open-qa)
  - length: 93k
- JAQKET (closed-qa)
  - length: 13.33k
- JaQuAD (closed-qa)
  - length: 35.69k

## smr

- dialogsum-ja (chat-smr)
  - length: 20.28k
- xlsum (doc-smr)
  - length: 8.89k
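The card defines four configs that share the same three string fields, so one of them can probably be loaded as sketched below. This is a minimal sketch, not the authors' documented usage: it assumes the Hugging Face `datasets` library is installed and the repository is reachable. `load_config` and `to_prompt` are hypothetical helper names, and the prompt template (instruction, blank line, input) is an assumption that the card does not specify.

```python
CONFIGS = ("chat", "multiple", "qa", "smr")  # config names listed in the card


def load_config(name, streaming=True):
    """Load the train split of one config (needs network access and `datasets`)."""
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    # Imported lazily so the rest of this sketch works without the library.
    from datasets import load_dataset

    return load_dataset(
        "wisenut-nlp-team/llama_jp", name, split="train", streaming=streaming
    )


def to_prompt(record):
    """Join instruction and optional input into one string (assumed template)."""
    instruction = record["instruction"].strip()
    extra = (record.get("input") or "").strip()
    return f"{instruction}\n\n{extra}" if extra else instruction
```

Streaming is the default in this sketch because the `multiple` config is listed at roughly 2.3 GB; passing `streaming=False` would download the full split instead.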

Provider:

wisenut-nlp-team

Summary of original information

Dataset overview

Config name: chat

Features:
- instruction: string
- input: string
- output: string
Splits:
- train:
  - Bytes: 39405392
  - Examples: 17120
Dataset size: 39405392 bytes

Config name: multiple

Features:
- instruction: string
- input: string
- output: string
Splits:
- train:
  - Bytes: 2336693449
  - Examples: 3285890
Dataset size: 2336693449 bytes

Config name: qa

Features:
- instruction: string
- input: string
- output: string
Splits:
- train:
  - Bytes: 1145861285
  - Examples: 142021
Dataset size: 1145861285 bytes

Config name: smr

Features:
- instruction: string
- input: string
- output: string
Splits:
- train:
  - Bytes: 69198207
  - Examples: 29168
Dataset size: 69198207 bytes
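The per-config figures above can be combined into totals and a rough average record size. The following is a small sketch of that arithmetic using only the byte and example counts listed in this page; `SPLITS` and `summarize` are illustrative names, not part of the dataset.

```python
# (num_bytes, num_examples) for each config's train split, as listed above.
SPLITS = {
    "chat": (39405392, 17120),
    "multiple": (2336693449, 3285890),
    "qa": (1145861285, 142021),
    "smr": (69198207, 29168),
}


def summarize(splits):
    """Return total bytes, total examples, and average bytes per example."""
    total_bytes = sum(num_bytes for num_bytes, _ in splits.values())
    total_examples = sum(num_examples for _, num_examples in splits.values())
    avg_bytes = {
        name: num_bytes / num_examples
        for name, (num_bytes, num_examples) in splits.items()
    }
    return total_bytes, total_examples, avg_bytes


total_bytes, total_examples, avg_bytes = summarize(SPLITS)
print(f"total: {total_bytes} bytes across {total_examples} examples")
for name, value in sorted(avg_bytes.items(), key=lambda item: item[1]):
    print(f"{name}: about {value:.0f} bytes per example")
```

On these numbers the `qa` records are by far the largest on average and the `multiple` records the smallest, which may matter when choosing a maximum sequence length.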
