andrewatef/PText

Hugging Face, updated 2024-01-29, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/andrewatef/PText
Resource description:
---
dataset_info:
- config_name: articles
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 12376328.0
    num_examples: 2040
  download_size: 5623581
  dataset_size: 12376328.0
- config_name: articles2
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: reading_time_minutes
    dtype: int64
  - name: tags
    dtype: string
  - name: body_markdown
    dtype: string
  splits:
  - name: train
    num_bytes: 2567410.0
    num_examples: 1090
  download_size: 1362235
  dataset_size: 2567410.0
- config_name: llama
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 291896975.0
    num_examples: 1257591
  download_size: 153320452
  dataset_size: 291896975.0
- config_name: llama2
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 170086868.0
    num_examples: 516177
  download_size: 83326571
  dataset_size: 170086868.0
- config_name: llama3
  features:
  - name: Instruction
    dtype: string
  - name: Response
    dtype: string
  splits:
  - name: train
    num_bytes: 142729487.0
    num_examples: 516177
  download_size: 101890981
  dataset_size: 142729487.0
- config_name: llama4
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 157182443.0
    num_examples: 516177
  download_size: 82734120
  dataset_size: 157182443.0
- config_name: llama5
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 53373019.0
    num_examples: 172059
  download_size: 27923481
  dataset_size: 53373019.0
- config_name: llama6
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 51480370.0
    num_examples: 172059
  download_size: 33775616
  dataset_size: 51480370.0
- config_name: llama7
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 3759851.0
    num_examples: 13530
  download_size: 2287275
  dataset_size: 3759851.0
- config_name: llama8
  features:
  - name: input
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 101496004.9890677
    num_examples: 120441
  - name: test
    num_bytes: 43498649.0109323
    num_examples: 51618
  download_size: 74071830
  dataset_size: 144994654.0
- config_name: phi2
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 275548292.0
    num_examples: 1257591
  download_size: 151999212
  dataset_size: 275548292.0
- config_name: summary
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 1252702430.0
    num_examples: 287113
  download_size: 771120161
  dataset_size: 1252702430.0
- config_name: summary2
  features:
  - name: document
    dtype: string
  - name: summary
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 1117818826.0
    num_examples: 44972
  download_size: 648248844
  dataset_size: 1117818826.0
configs:
- config_name: articles
  data_files:
  - split: train
    path: articles/train-*
- config_name: articles2
  data_files:
  - split: train
    path: articles2/train-*
- config_name: llama
  data_files:
  - split: train
    path: llama/train-*
- config_name: llama2
  data_files:
  - split: train
    path: llama2/train-*
- config_name: llama3
  data_files:
  - split: train
    path: llama3/train-*
- config_name: llama4
  data_files:
  - split: train
    path: llama4/train-*
- config_name: llama5
  data_files:
  - split: train
    path: llama5/train-*
- config_name: llama6
  data_files:
  - split: train
    path: llama6/train-*
- config_name: llama7
  data_files:
  - split: train
    path: llama7/train-*
- config_name: llama8
  data_files:
  - split: train
    path: llama8/train-*
  - split: test
    path: llama8/test-*
- config_name: phi2
  data_files:
  - split: train
    path: phi2/train-*
- config_name: summary
  data_files:
  - split: train
    path: summary/train-*
- config_name: summary2
  data_files:
  - split: train
    path: summary2/train-*
---
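
The metadata above names thirteen configurations and the splits each one provides. A minimal sketch of loading one of them by name follows, assuming the Hugging Face `datasets` library is installed and that the repository `andrewatef/PText` is still reachable on the Hub. The helper `load_config` and the `CONFIG_SPLITS` table are hypothetical names introduced here for illustration; they are not part of the dataset.

```python
REPO_ID = "andrewatef/PText"

# Config name -> available splits, copied from the `configs` section of the card.
CONFIG_SPLITS = {
    "articles": ("train",),
    "articles2": ("train",),
    "llama": ("train",),
    "llama2": ("train",),
    "llama3": ("train",),
    "llama4": ("train",),
    "llama5": ("train",),
    "llama6": ("train",),
    "llama7": ("train",),
    "llama8": ("train", "test"),
    "phi2": ("train",),
    "summary": ("train",),
    "summary2": ("train",),
}


def load_config(config_name: str, split: str = "train"):
    """Load one split of one configuration (hypothetical helper).

    The name and split are checked against the card's metadata first,
    so a typo fails fast instead of triggering a network request.
    """
    if config_name not in CONFIG_SPLITS:
        raise ValueError(f"unknown config: {config_name!r}")
    if split not in CONFIG_SPLITS[config_name]:
        raise ValueError(f"config {config_name!r} has no split {split!r}")
    # Imported lazily so the table above is usable without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)
```

For example, `load_config("llama8", "test")` would request the only test split the card lists, while `load_config("articles", "test")` raises `ValueError` without touching the network.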
Provider:
andrewatef
Summary of original information

Dataset overview

Dataset configurations

Configuration name: articles

  • Features:
    • input: string
    • output: string
    • url: string
  • Splits:
    • train:
      • Bytes: 12376328.0
      • Examples: 2040
  • Download size: 5623581 bytes
  • Dataset size: 12376328.0 bytes

Configuration name: articles2

  • Features:
    • title: string
    • description: string
    • reading_time_minutes: integer
    • tags: string
    • body_markdown: string
  • Splits:
    • train:
      • Bytes: 2567410.0
      • Examples: 1090
  • Download size: 1362235 bytes
  • Dataset size: 2567410.0 bytes

Configuration name: llama

  • Features:
    • text: string
  • Splits:
    • train:
      • Bytes: 291896975.0
      • Examples: 1257591
  • Download size: 153320452 bytes
  • Dataset size: 291896975.0 bytes

Configuration name: llama2

  • Features:
    • text: string
  • Splits:
    • train:
      • Bytes: 170086868.0
      • Examples: 516177
  • Download size: 83326571 bytes
  • Dataset size: 170086868.0 bytes

Configuration name: llama3

  • Features:
    • Instruction: string
    • Response: string
  • Splits:
    • train:
      • Bytes: 142729487.0
      • Examples: 516177
  • Download size: 101890981 bytes
  • Dataset size: 142729487.0 bytes

Configuration name: llama4

  • Features:
    • text: string
  • Splits:
    • train:
      • Bytes: 157182443.0
      • Examples: 516177
  • Download size: 82734120 bytes
  • Dataset size: 157182443.0 bytes

Configuration name: llama5

  • Features:
    • text: string
  • Splits:
    • train:
      • Bytes: 53373019.0
      • Examples: 172059
  • Download size: 27923481 bytes
  • Dataset size: 53373019.0 bytes

Configuration name: llama6

  • Features:
    • input: string
    • output: string
    • instruction: string
  • Splits:
    • train:
      • Bytes: 51480370.0
      • Examples: 172059
  • Download size: 33775616 bytes
  • Dataset size: 51480370.0 bytes
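
Several configurations (llama6, llama7, summary) share the three string columns `instruction`, `input`, and `output`. The card does not say how these columns are meant to be combined, so the following is only a sketch of one common way to flatten such a record into a single training string. The section headers (`### Instruction:` and so on) and the function name `format_record` are assumptions made for illustration, not something the dataset prescribes.

```python
def format_record(record: dict) -> str:
    """Join instruction / input / output into one prompt string.

    The "### ..." headers are an assumed template; the dataset card
    does not define one. An empty or missing `input` is skipped.
    """
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    response = record["output"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)


# Hypothetical record with the same column names as llama6; not real data.
example = {
    "instruction": "Summarize the text.",
    "input": "A long article body.",
    "output": "A short summary.",
}
prompt = format_record(example)
```

Keeping the template in one small function makes it easy to swap in whatever format a particular fine-tuning recipe expects.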

Configuration name: llama7

  • Features:
    • input: string
    • output: string
    • instruction: string
  • Splits:
    • train:
      • Bytes: 3759851.0
      • Examples: 13530
  • Download size: 2287275 bytes
  • Dataset size: 3759851.0 bytes

Configuration name: llama8

  • Features:
    • input: string
    • chosen: string
    • rejected: string
    • instruction: string
  • Splits:
    • train:
      • Bytes: 101496004.9890677
      • Examples: 120441
    • test:
      • Bytes: 43498649.0109323
      • Examples: 51618
  • Download size: 74071830 bytes
  • Dataset size: 144994654.0 bytes
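
llama8 is the only configuration with two splits, and its byte counts are fractional. The short check below works through the arithmetic using only the figures listed above. It suggests the split is roughly 70/30 and that the per-split byte counts are proportional to the example counts; that reading is an inference from the numbers, not something the card states. The combined example count, 172059, also matches the count listed for llama5 and llama6, although the card does not say llama8 was derived from them.

```python
# Figures copied from the llama8 entry of the card.
TRAIN_EXAMPLES = 120441
TEST_EXAMPLES = 51618
TRAIN_BYTES = 101496004.9890677
TEST_BYTES = 43498649.0109323
DATASET_SIZE = 144994654.0

total_examples = TRAIN_EXAMPLES + TEST_EXAMPLES
train_fraction = TRAIN_EXAMPLES / total_examples

# If bytes were apportioned by example count, the train split would get
# this many bytes; compare it with the listed value.
proportional_train_bytes = DATASET_SIZE * train_fraction

print(f"total examples:  {total_examples}")
print(f"train fraction:  {train_fraction:.4f}")
print(f"listed train bytes:       {TRAIN_BYTES:.2f}")
print(f"proportional train bytes: {proportional_train_bytes:.2f}")
```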

Configuration name: phi2

  • Features:
    • text: string
  • Splits:
    • train:
      • Bytes: 275548292.0
      • Examples: 1257591
  • Download size: 151999212 bytes
  • Dataset size: 275548292.0 bytes

Configuration name: summary

  • Features:
    • input: string
    • output: string
    • instruction: string
  • Splits:
    • train:
      • Bytes: 1252702430.0
      • Examples: 287113
  • Download size: 771120161 bytes
  • Dataset size: 1252702430.0 bytes

Configuration name: summary2

  • Features:
    • document: string
    • summary: string
    • input: string
    • output: string
    • instruction: string
  • Splits:
    • train:
      • Bytes: 1117818826.0
      • Examples: 44972
  • Download size: 648248844 bytes
  • Dataset size: 1117818826.0 bytes
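
Each configuration lists both a download size and a dataset size. On the Hugging Face Hub these usually correspond to the size of the files fetched and the size of the data once loaded, though the card itself does not define them. The sketch below, using only the figures listed above, compares the two per configuration. The names `CONFIG_SIZES` and `download_ratio` are hypothetical and introduced here for illustration.

```python
# Config name -> (download size, dataset size) in bytes, copied from the card.
CONFIG_SIZES = {
    "articles": (5623581, 12376328),
    "articles2": (1362235, 2567410),
    "llama": (153320452, 291896975),
    "llama2": (83326571, 170086868),
    "llama3": (101890981, 142729487),
    "llama4": (82734120, 157182443),
    "llama5": (27923481, 53373019),
    "llama6": (33775616, 51480370),
    "llama7": (2287275, 3759851),
    "llama8": (74071830, 144994654),
    "phi2": (151999212, 275548292),
    "summary": (771120161, 1252702430),
    "summary2": (648248844, 1117818826),
}


def download_ratio(config_name: str) -> float:
    """Return download size divided by dataset size for one config."""
    download, dataset = CONFIG_SIZES[config_name]
    return download / dataset


# The config that takes the most space once loaded.
largest = max(CONFIG_SIZES, key=lambda name: CONFIG_SIZES[name][1])

for name in CONFIG_SIZES:
    download, dataset = CONFIG_SIZES[name]
    print(f"{name:10s} download={download / 1e6:9.1f} MB  "
          f"dataset={dataset / 1e6:9.1f} MB  ratio={download_ratio(name):.2f}")
```

A table like this can help decide which configuration to fetch first when bandwidth or disk space is limited.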