
wisenut-nlp-team/llama_jp

Source: Hugging Face. Updated: 2024-05-07. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/wisenut-nlp-team/llama_jp
Resource description:
---
dataset_info:
- config_name: chat
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 39405392
    num_examples: 17120
  dataset_size: 39405392
- config_name: multiple
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 2336693449
    num_examples: 3285890
  dataset_size: 2336693449
- config_name: qa
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 1145861285
    num_examples: 142021
  dataset_size: 1145861285
- config_name: smr
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 69198207
    num_examples: 29168
  dataset_size: 69198207
configs:
- config_name: chat
  data_files:
  - split: train
    path: data/chat/*
- config_name: multiple
  data_files:
  - split: train
    path: data/multiple/*
- config_name: qa
  data_files:
  - split: train
    path: data/qa/*
- config_name: smr
  data_files:
  - split: train
    path: data/smr/*
---

## chat

- jmultiwoz (chat-pred)
  - length: 3.54k
- real-persona-chat (chat-pred)
  - length: 13.58k

## multiple

- Bactrian-X
  - length: 67k
- databricks-dolly-15k-ja
  - length: 15k
- guanaco_ja
  - length: 100.63k
- llm-japanese-dataset-vanilla
  - length: 2.52M
- OpenOrcaJapanese
  - length: 573.62k

## qa

- AutoGeneratedJapaneseQA (open-qa)
  - length: 93k
- JAQKET (closed-qa)
  - length: 13.33k
- JaQuAD (closed-qa)
  - length: 35.69k

## smr

- dialogsum-ja (chat-smr)
  - length: 20.28k
- xlsum (doc-smr)
  - length: 8.89k
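The card declares four configs, each with a single `train` split and three string fields (`instruction`, `input`, `output`). As a minimal sketch, one config could be loaded with the Hugging Face `datasets` library, assuming that library is installed and the repository is reachable. The `load_config` and `format_record` helpers and the `### Instruction:` prompt layout are hypothetical illustrations and are not part of the dataset card.

```python
from typing import Dict

# Config names as declared in the dataset card.
CONFIG_NAMES = ("chat", "multiple", "qa", "smr")


def load_config(name: str):
    """Load the train split of one config (needs network access and `datasets`)."""
    if name not in CONFIG_NAMES:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIG_NAMES}")
    # Imported lazily so the rest of this sketch works without the library.
    from datasets import load_dataset

    return load_dataset("wisenut-nlp-team/llama_jp", name, split="train")


def format_record(record: Dict[str, str]) -> str:
    """Join the three fields into one prompt string.

    The section headers used here are an assumption for illustration;
    the card does not prescribe any prompt template.
    """
    parts = [f"### Instruction:\n{record['instruction']}"]
    if record.get("input"):  # skip the input section when it is empty
        parts.append(f"### Input:\n{record['input']}")
    parts.append(f"### Response:\n{record['output']}")
    return "\n\n".join(parts)


# Example usage (downloads data, so it is left commented out):
# chat = load_config("chat")
# print(format_record(chat[0]))
```

The `chat` config is the smallest (about 39 MB), so it is probably the cheapest one to try first; `multiple` is over 2 GB.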
Provider:
wisenut-nlp-team
Summary of original information

Dataset overview

Config name: chat

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train:
      • Bytes: 39405392
      • Examples: 17120
  • Dataset size: 39405392 bytes

Config name: multiple

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train:
      • Bytes: 2336693449
      • Examples: 3285890
  • Dataset size: 2336693449 bytes

Config name: qa

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train:
      • Bytes: 1145861285
      • Examples: 142021
  • Dataset size: 1145861285 bytes

Config name: smr

  • Features:
    • instruction: string
    • input: string
    • output: string
  • Splits:
    • train:
      • Bytes: 69198207
      • Examples: 29168
  • Dataset size: 69198207 bytes
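
The per-config figures above can be combined into overall totals and a rough average record size. The following is a small sketch that uses only the byte and example counts listed in this overview; the `SPLITS` table and the `summarize` helper are names introduced here for illustration.

```python
from typing import Dict, Tuple

# (num_bytes, num_examples) for the train split of each config, from the overview.
SPLITS: Dict[str, Tuple[int, int]] = {
    "chat": (39405392, 17120),
    "multiple": (2336693449, 3285890),
    "qa": (1145861285, 142021),
    "smr": (69198207, 29168),
}


def summarize(splits: Dict[str, Tuple[int, int]]):
    """Return total bytes, total examples, and average bytes per example per config."""
    total_bytes = sum(num_bytes for num_bytes, _ in splits.values())
    total_examples = sum(num_examples for _, num_examples in splits.values())
    avg_bytes = {name: b / n for name, (b, n) in splits.items()}
    return total_bytes, total_examples, avg_bytes


total_bytes, total_examples, avg_bytes = summarize(SPLITS)
print(f"total: {total_examples} examples, {total_bytes / 1e9:.2f} GB")
for name, avg in sorted(avg_bytes.items(), key=lambda item: item[1]):
    print(f"{name}: {avg:.0f} bytes per example on average")
```

On these figures `qa` has by far the largest records on average and `multiple` the smallest, which may matter when choosing a maximum sequence length for each config.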