
Amirjalaly/articles_fegh

Hugging Face · Updated 2024-02-26 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Amirjalaly/articles_fegh
Resource description:
---
dataset_info:
  features:
  - name: source_domain
    dtype: string
  - name: language_score
    dtype: string
  - name: original_length
    dtype: int64
  - name: title
    dtype: string
  - name: nlines
    dtype: int64
  - name: url
    dtype: string
  - name: language
    dtype: string
  - name: type
    dtype: string
  - name: date_download
    dtype: string
  - name: perplexity
    dtype: string
  - name: original_nlines
    dtype: int64
  - name: length
    dtype: int64
  - name: raw_content
    dtype: string
  splits:
  - name: article1_fegh
    num_bytes: 141074171
    num_examples: 10000
  - name: article2_fegh
    num_bytes: 137000022
    num_examples: 10000
  - name: article3_fegh
    num_bytes: 158128322
    num_examples: 10000
  - name: article4_fegh
    num_bytes: 184263990
    num_examples: 10000
  - name: article5_fegh
    num_bytes: 110405776
    num_examples: 10000
  - name: article6_fegh
    num_bytes: 121282566
    num_examples: 10000
  - name: article7_fegh
    num_bytes: 154946135
    num_examples: 10000
  - name: article8_fegh
    num_bytes: 99884898
    num_examples: 10000
  - name: article9_fegh
    num_bytes: 127426232
    num_examples: 10000
  - name: article10_fegh
    num_bytes: 131785193
    num_examples: 10000
  - name: article11_fegh
    num_bytes: 159712660
    num_examples: 10000
  - name: article12_fegh
    num_bytes: 192085112
    num_examples: 10000
  - name: article13_fegh
    num_bytes: 150810425
    num_examples: 10000
  - name: article14_fegh
    num_bytes: 221236879
    num_examples: 10000
  - name: article15_fegh
    num_bytes: 142548442
    num_examples: 10000
  - name: article16_fegh
    num_bytes: 187090171
    num_examples: 10000
  - name: article17_fegh
    num_bytes: 237446954
    num_examples: 10000
  - name: article18_fegh
    num_bytes: 120121732
    num_examples: 10000
  - name: article19_fegh
    num_bytes: 128613349
    num_examples: 10000
  - name: article20_fegh
    num_bytes: 102115059
    num_examples: 10000
  - name: article21_fegh
    num_bytes: 164741962
    num_examples: 10000
  - name: article22_fegh
    num_bytes: 151829912
    num_examples: 10000
  - name: article23_fegh
    num_bytes: 140526953
    num_examples: 10000
  - name: article24_fegh
    num_bytes: 169421145
    num_examples: 10000
  - name: article25_fegh
    num_bytes: 108889265
    num_examples: 10000
  - name: article26_fegh
    num_bytes: 107474567
    num_examples: 10000
  - name: article27_fegh
    num_bytes: 173530923
    num_examples: 10000
  - name: article28_fegh
    num_bytes: 113164137
    num_examples: 8273
  download_size: 835265147
  dataset_size: 4137556952
configs:
- config_name: default
  data_files:
  - split: article1_fegh
    path: data/article1_fegh-*
  - split: article2_fegh
    path: data/article2_fegh-*
  - split: article3_fegh
    path: data/article3_fegh-*
  - split: article4_fegh
    path: data/article4_fegh-*
  - split: article5_fegh
    path: data/article5_fegh-*
  - split: article6_fegh
    path: data/article6_fegh-*
  - split: article7_fegh
    path: data/article7_fegh-*
  - split: article8_fegh
    path: data/article8_fegh-*
  - split: article9_fegh
    path: data/article9_fegh-*
  - split: article10_fegh
    path: data/article10_fegh-*
  - split: article11_fegh
    path: data/article11_fegh-*
  - split: article12_fegh
    path: data/article12_fegh-*
  - split: article13_fegh
    path: data/article13_fegh-*
  - split: article14_fegh
    path: data/article14_fegh-*
  - split: article15_fegh
    path: data/article15_fegh-*
  - split: article16_fegh
    path: data/article16_fegh-*
  - split: article17_fegh
    path: data/article17_fegh-*
  - split: article18_fegh
    path: data/article18_fegh-*
  - split: article19_fegh
    path: data/article19_fegh-*
  - split: article20_fegh
    path: data/article20_fegh-*
  - split: article21_fegh
    path: data/article21_fegh-*
  - split: article22_fegh
    path: data/article22_fegh-*
  - split: article23_fegh
    path: data/article23_fegh-*
  - split: article24_fegh
    path: data/article24_fegh-*
  - split: article25_fegh
    path: data/article25_fegh-*
  - split: article26_fegh
    path: data/article26_fegh-*
  - split: article27_fegh
    path: data/article27_fegh-*
  - split: article28_fegh
    path: data/article28_fegh-*
---

The dataset is divided into 28 splits, article1_fegh through article28_fegh, each with a stated byte size and example count: the first 27 splits contain 10,000 examples each and the last contains 8,273. Each example has 13 features: source domain, language score, original length, title, number of lines, URL, language, type, download date, perplexity, original number of lines, length, and raw content. The total size of the dataset is 4,137,556,952 bytes, with a download size of 835,265,147 bytes.
Provider:
Amirjalaly
Summary of original information

Dataset features

  • source_domain: string
  • language_score: string
  • original_length: 64-bit integer
  • title: string
  • nlines: 64-bit integer
  • url: string
  • language: string
  • type: string
  • date_download: string
  • perplexity: string
  • original_nlines: 64-bit integer
  • length: 64-bit integer
  • raw_content: string
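
The feature list above can be mirrored as a typed record when rows are processed in Python. This is only a sketch: the names `ArticleRecord`, `parse_score`, and `missing_fields` are hypothetical, and treating `language_score` and `perplexity` as numeric text is an assumption suggested by their names, not something the card states.

```python
from typing import Optional, TypedDict


class ArticleRecord(TypedDict):
    """One row; field names and types are taken from the feature list."""
    source_domain: str
    language_score: str   # declared as a string in the card
    original_length: int
    title: str
    nlines: int
    url: str
    language: str
    type: str
    date_download: str
    perplexity: str       # declared as a string in the card
    original_nlines: int
    length: int
    raw_content: str


def parse_score(value: str) -> Optional[float]:
    """Convert a string-typed score to float; return None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def missing_fields(row: dict) -> list:
    """Return the feature names absent from a row, in sorted order."""
    return sorted(set(ArticleRecord.__annotations__) - set(row))
```

Keeping the two score fields as strings in the record and converting them on demand avoids guessing at their format before any rows have been inspected.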

Dataset splits

  • article1_fegh: 141,074,171 bytes, 10,000 examples
  • article2_fegh: 137,000,022 bytes, 10,000 examples
  • article3_fegh: 158,128,322 bytes, 10,000 examples
  • article4_fegh: 184,263,990 bytes, 10,000 examples
  • article5_fegh: 110,405,776 bytes, 10,000 examples
  • article6_fegh: 121,282,566 bytes, 10,000 examples
  • article7_fegh: 154,946,135 bytes, 10,000 examples
  • article8_fegh: 99,884,898 bytes, 10,000 examples
  • article9_fegh: 127,426,232 bytes, 10,000 examples
  • article10_fegh: 131,785,193 bytes, 10,000 examples
  • article11_fegh: 159,712,660 bytes, 10,000 examples
  • article12_fegh: 192,085,112 bytes, 10,000 examples
  • article13_fegh: 150,810,425 bytes, 10,000 examples
  • article14_fegh: 221,236,879 bytes, 10,000 examples
  • article15_fegh: 142,548,442 bytes, 10,000 examples
  • article16_fegh: 187,090,171 bytes, 10,000 examples
  • article17_fegh: 237,446,954 bytes, 10,000 examples
  • article18_fegh: 120,121,732 bytes, 10,000 examples
  • article19_fegh: 128,613,349 bytes, 10,000 examples
  • article20_fegh: 102,115,059 bytes, 10,000 examples
  • article21_fegh: 164,741,962 bytes, 10,000 examples
  • article22_fegh: 151,829,912 bytes, 10,000 examples
  • article23_fegh: 140,526,953 bytes, 10,000 examples
  • article24_fegh: 169,421,145 bytes, 10,000 examples
  • article25_fegh: 108,889,265 bytes, 10,000 examples
  • article26_fegh: 107,474,567 bytes, 10,000 examples
  • article27_fegh: 173,530,923 bytes, 10,000 examples
  • article28_fegh: 113,164,137 bytes, 8,273 examples
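
The per-split figures above can be cross-checked with a little arithmetic. The sketch below copies the byte counts from the list, totals them, and computes the average bytes per example for each split; the variable names are illustrative only.

```python
# num_bytes per split, in order article1_fegh .. article28_fegh (copied from the list above)
NUM_BYTES = [
    141074171, 137000022, 158128322, 184263990, 110405776, 121282566, 154946135,
    99884898, 127426232, 131785193, 159712660, 192085112, 150810425, 221236879,
    142548442, 187090171, 237446954, 120121732, 128613349, 102115059, 164741962,
    151829912, 140526953, 169421145, 108889265, 107474567, 173530923, 113164137,
]
# The first 27 splits hold 10,000 examples each; the last holds 8,273.
NUM_EXAMPLES = [10_000] * 27 + [8_273]

splits = {
    f"article{i}_fegh": (num_bytes, num_examples)
    for i, (num_bytes, num_examples) in enumerate(zip(NUM_BYTES, NUM_EXAMPLES), start=1)
}

total_bytes = sum(b for b, _ in splits.values())
total_examples = sum(n for _, n in splits.values())

# Average size of one example in each split, in bytes.
avg_bytes = {name: b / n for name, (b, n) in splits.items()}
largest = max(avg_bytes, key=avg_bytes.get)
smallest = min(avg_bytes, key=avg_bytes.get)

print(f"{total_examples:,} examples, {total_bytes:,} bytes")
print(f"largest average example: {largest} ({avg_bytes[largest]:.0f} bytes)")
print(f"smallest average example: {smallest} ({avg_bytes[smallest]:.0f} bytes)")
```

The summed byte counts agree with the dataset size reported by the card, 4,137,556,952 bytes.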

Dataset size

  • Download size: 835,265,147 bytes
  • Dataset size: 4,137,556,952 bytes
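
To put the two sizes in perspective, the following sketch converts them to gibibytes and computes their ratio. The gap presumably reflects compression of the hosted files, but the card does not say so; the helper name `to_gib` is hypothetical.

```python
DOWNLOAD_SIZE = 835_265_147    # bytes, from the card
DATASET_SIZE = 4_137_556_952   # bytes, from the card


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# How many times larger the dataset is than the download.
ratio = DATASET_SIZE / DOWNLOAD_SIZE

print(f"download: {to_gib(DOWNLOAD_SIZE):.2f} GiB")
print(f"dataset:  {to_gib(DATASET_SIZE):.2f} GiB")
print(f"ratio:    {ratio:.2f}x")
```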

Configuration

  • config_name: default
    • data_files:
      • article1_fegh: data/article1_fegh-*
      • article2_fegh: data/article2_fegh-*
      • article3_fegh: data/article3_fegh-*
      • article4_fegh: data/article4_fegh-*
      • article5_fegh: data/article5_fegh-*
      • article6_fegh: data/article6_fegh-*
      • article7_fegh: data/article7_fegh-*
      • article8_fegh: data/article8_fegh-*
      • article9_fegh: data/article9_fegh-*
      • article10_fegh: data/article10_fegh-*
      • article11_fegh: data/article11_fegh-*
      • article12_fegh: data/article12_fegh-*
      • article13_fegh: data/article13_fegh-*
      • article14_fegh: data/article14_fegh-*
      • article15_fegh: data/article15_fegh-*
      • article16_fegh: data/article16_fegh-*
      • article17_fegh: data/article17_fegh-*
      • article18_fegh: data/article18_fegh-*
      • article19_fegh: data/article19_fegh-*
      • article20_fegh: data/article20_fegh-*
      • article21_fegh: data/article21_fegh-*
      • article22_fegh: data/article22_fegh-*
      • article23_fegh: data/article23_fegh-*
      • article24_fegh: data/article24_fegh-*
      • article25_fegh: data/article25_fegh-*
      • article26_fegh: data/article26_fegh-*
      • article27_fegh: data/article27_fegh-*
      • article28_fegh: data/article28_fegh-*
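
The split-to-path mapping above follows one regular pattern, so it can be generated rather than typed out. The sketch below rebuilds it and shows one possible way to load a single split with the Hugging Face `datasets` library. The helper names are hypothetical, and `load_split` assumes the `datasets` package is installed and the repository is reachable from the Hugging Face Hub.

```python
def split_names(count: int = 28) -> list:
    """Return article1_fegh .. article{count}_fegh in order."""
    return [f"article{i}_fegh" for i in range(1, count + 1)]


def data_files(count: int = 28) -> dict:
    """Map each split name to its glob pattern, as in the default config."""
    return {name: f"data/{name}-*" for name in split_names(count)}


def load_split(name: str, streaming: bool = True):
    """Load one split by name (needs network access).

    Streaming is on by default here so that no full split has to be
    downloaded before the first rows can be inspected.
    """
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset("Amirjalaly/articles_fegh", split=name, streaming=streaming)
```

For example, `next(iter(load_split("article1_fegh")))` would be expected to yield one row as a dictionary keyed by the feature names.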