five

hqfx/llama3_generate

收藏
Hugging Face2024-05-07 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/hqfx/llama3_generate
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: features: - name: functions dtype: string - name: conversation list: - name: content dtype: string - name: role dtype: string splits: - name: hqfx_tulu_v2.code_alpaca num_bytes: 1119070 num_examples: 2001 - name: hqfx_tulu_v2.oasst1 num_bytes: 581675 num_examples: 733 - name: hqfx_tulu_v2.wizardlm num_bytes: 5080987 num_examples: 2981 - name: bz_arc13_alpaca_gpt4_chinese.train num_bytes: 3208554 num_examples: 4996 - name: hqfx_tulu_v2.sharegpt num_bytes: 10657388 num_examples: 7430 - name: hqfx_tulu_v2.science.qasper_truncated_4000 num_bytes: 3527939 num_examples: 221 - name: bz_arc13_wild_chat_en_zh_dedup_v2.Chinese num_bytes: 13439068 num_examples: 8712 - name: hqfx_tulu_v2.science.scitldr_aic num_bytes: 1347771 num_examples: 195 - name: hqfx_tulu_v2.lima num_bytes: 139437 num_examples: 101 - name: hqfx_tulu_v2.cot num_bytes: 5745023 num_examples: 4974 - name: hqfx_tulu_v2.science.scierc_relation num_bytes: 73168 num_examples: 34 - name: hqfx_tulu_v2.science.scierc_ner num_bytes: 60962 num_examples: 34 - name: hqfx_tulu_v2.science.evidence_inference num_bytes: 740828 num_examples: 167 - name: hqfx_tulu_v2.flan_v2 num_bytes: 11076334 num_examples: 4912 - name: hqfx_tulu_v2.science.scifact_json num_bytes: 240174 num_examples: 91 download_size: 30974924 dataset_size: 57038378 configs: - config_name: default data_files: - split: hqfx_tulu_v2.code_alpaca path: data/hqfx_tulu_v2.code_alpaca-* - split: hqfx_tulu_v2.oasst1 path: data/hqfx_tulu_v2.oasst1-* - split: hqfx_tulu_v2.wizardlm path: data/hqfx_tulu_v2.wizardlm-* - split: bz_arc13_alpaca_gpt4_chinese.train path: data/bz_arc13_alpaca_gpt4_chinese.train-* - split: hqfx_tulu_v2.sharegpt path: data/hqfx_tulu_v2.sharegpt-* - split: hqfx_tulu_v2.science.qasper_truncated_4000 path: data/hqfx_tulu_v2.science.qasper_truncated_4000-* - split: bz_arc13_wild_chat_en_zh_dedup_v2.Chinese path: data/bz_arc13_wild_chat_en_zh_dedup_v2.Chinese-* - split: hqfx_tulu_v2.science.scitldr_aic path: data/hqfx_tulu_v2.science.scitldr_aic-* - split: hqfx_tulu_v2.lima path: data/hqfx_tulu_v2.lima-* - split: hqfx_tulu_v2.cot path: data/hqfx_tulu_v2.cot-* - split: hqfx_tulu_v2.science.scierc_relation path: data/hqfx_tulu_v2.science.scierc_relation-* - split: hqfx_tulu_v2.science.scierc_ner path: data/hqfx_tulu_v2.science.scierc_ner-* - split: hqfx_tulu_v2.science.evidence_inference path: data/hqfx_tulu_v2.science.evidence_inference-* - split: hqfx_tulu_v2.flan_v2 path: data/hqfx_tulu_v2.flan_v2-* - split: hqfx_tulu_v2.science.scifact_json path: data/hqfx_tulu_v2.science.scifact_json-* ---
提供机构:
hqfx
原始信息汇总

数据集概述

数据集特征

  • functions: 数据类型为字符串。
  • conversation: 包含以下子特征:
    • content: 数据类型为字符串。
    • role: 数据类型为字符串。

数据集划分

  • hqfx_tulu_v2.code_alpaca: 字节数为1119070,样本数为2001。
  • hqfx_tulu_v2.oasst1: 字节数为581675,样本数为733。
  • hqfx_tulu_v2.wizardlm: 字节数为5080987,样本数为2981。
  • bz_arc13_alpaca_gpt4_chinese.train: 字节数为3208554,样本数为4996。
  • hqfx_tulu_v2.sharegpt: 字节数为10657388,样本数为7430。
  • hqfx_tulu_v2.science.qasper_truncated_4000: 字节数为3527939,样本数为221。
  • bz_arc13_wild_chat_en_zh_dedup_v2.Chinese: 字节数为13439068,样本数为8712。
  • hqfx_tulu_v2.science.scitldr_aic: 字节数为1347771,样本数为195。
  • hqfx_tulu_v2.lima: 字节数为139437,样本数为101。
  • hqfx_tulu_v2.cot: 字节数为5745023,样本数为4974。
  • hqfx_tulu_v2.science.scierc_relation: 字节数为73168,样本数为34。
  • hqfx_tulu_v2.science.scierc_ner: 字节数为60962,样本数为34。
  • hqfx_tulu_v2.science.evidence_inference: 字节数为740828,样本数为167。
  • hqfx_tulu_v2.flan_v2: 字节数为11076334,样本数为4912。
  • hqfx_tulu_v2.science.scifact_json: 字节数为240174,样本数为91。

数据集大小

  • 下载大小: 30974924字节
  • 数据集大小: 57038378字节

配置信息

  • 配置名称: default
    • 数据文件路径:
      • hqfx_tulu_v2.code_alpaca: data/hqfx_tulu_v2.code_alpaca-*
      • hqfx_tulu_v2.oasst1: data/hqfx_tulu_v2.oasst1-*
      • hqfx_tulu_v2.wizardlm: data/hqfx_tulu_v2.wizardlm-*
      • bz_arc13_alpaca_gpt4_chinese.train: data/bz_arc13_alpaca_gpt4_chinese.train-*
      • hqfx_tulu_v2.sharegpt: data/hqfx_tulu_v2.sharegpt-*
      • hqfx_tulu_v2.science.qasper_truncated_4000: data/hqfx_tulu_v2.science.qasper_truncated_4000-*
      • bz_arc13_wild_chat_en_zh_dedup_v2.Chinese: data/bz_arc13_wild_chat_en_zh_dedup_v2.Chinese-*
      • hqfx_tulu_v2.science.scitldr_aic: data/hqfx_tulu_v2.science.scitldr_aic-*
      • hqfx_tulu_v2.lima: data/hqfx_tulu_v2.lima-*
      • hqfx_tulu_v2.cot: data/hqfx_tulu_v2.cot-*
      • hqfx_tulu_v2.science.scierc_relation: data/hqfx_tulu_v2.science.scierc_relation-*
      • hqfx_tulu_v2.science.scierc_ner: data/hqfx_tulu_v2.science.scierc_ner-*
      • hqfx_tulu_v2.science.evidence_inference: data/hqfx_tulu_v2.science.evidence_inference-*
      • hqfx_tulu_v2.flan_v2: data/hqfx_tulu_v2.flan_v2-*
      • hqfx_tulu_v2.science.scifact_json: data/hqfx_tulu_v2.science.scifact_json-*
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作