Amirjalaly/articles_fegh
Hugging Face · Updated 2024-02-26 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Amirjalaly/articles_fegh
Resource description:
---
dataset_info:
  features:
  - name: source_domain
    dtype: string
  - name: language_score
    dtype: string
  - name: original_length
    dtype: int64
  - name: title
    dtype: string
  - name: nlines
    dtype: int64
  - name: url
    dtype: string
  - name: language
    dtype: string
  - name: type
    dtype: string
  - name: date_download
    dtype: string
  - name: perplexity
    dtype: string
  - name: original_nlines
    dtype: int64
  - name: length
    dtype: int64
  - name: raw_content
    dtype: string
  splits:
  - name: article1_fegh
    num_bytes: 141074171
    num_examples: 10000
  - name: article2_fegh
    num_bytes: 137000022
    num_examples: 10000
  - name: article3_fegh
    num_bytes: 158128322
    num_examples: 10000
  - name: article4_fegh
    num_bytes: 184263990
    num_examples: 10000
  - name: article5_fegh
    num_bytes: 110405776
    num_examples: 10000
  - name: article6_fegh
    num_bytes: 121282566
    num_examples: 10000
  - name: article7_fegh
    num_bytes: 154946135
    num_examples: 10000
  - name: article8_fegh
    num_bytes: 99884898
    num_examples: 10000
  - name: article9_fegh
    num_bytes: 127426232
    num_examples: 10000
  - name: article10_fegh
    num_bytes: 131785193
    num_examples: 10000
  - name: article11_fegh
    num_bytes: 159712660
    num_examples: 10000
  - name: article12_fegh
    num_bytes: 192085112
    num_examples: 10000
  - name: article13_fegh
    num_bytes: 150810425
    num_examples: 10000
  - name: article14_fegh
    num_bytes: 221236879
    num_examples: 10000
  - name: article15_fegh
    num_bytes: 142548442
    num_examples: 10000
  - name: article16_fegh
    num_bytes: 187090171
    num_examples: 10000
  - name: article17_fegh
    num_bytes: 237446954
    num_examples: 10000
  - name: article18_fegh
    num_bytes: 120121732
    num_examples: 10000
  - name: article19_fegh
    num_bytes: 128613349
    num_examples: 10000
  - name: article20_fegh
    num_bytes: 102115059
    num_examples: 10000
  - name: article21_fegh
    num_bytes: 164741962
    num_examples: 10000
  - name: article22_fegh
    num_bytes: 151829912
    num_examples: 10000
  - name: article23_fegh
    num_bytes: 140526953
    num_examples: 10000
  - name: article24_fegh
    num_bytes: 169421145
    num_examples: 10000
  - name: article25_fegh
    num_bytes: 108889265
    num_examples: 10000
  - name: article26_fegh
    num_bytes: 107474567
    num_examples: 10000
  - name: article27_fegh
    num_bytes: 173530923
    num_examples: 10000
  - name: article28_fegh
    num_bytes: 113164137
    num_examples: 8273
  download_size: 835265147
  dataset_size: 4137556952
configs:
- config_name: default
  data_files:
  - split: article1_fegh
    path: data/article1_fegh-*
  - split: article2_fegh
    path: data/article2_fegh-*
  - split: article3_fegh
    path: data/article3_fegh-*
  - split: article4_fegh
    path: data/article4_fegh-*
  - split: article5_fegh
    path: data/article5_fegh-*
  - split: article6_fegh
    path: data/article6_fegh-*
  - split: article7_fegh
    path: data/article7_fegh-*
  - split: article8_fegh
    path: data/article8_fegh-*
  - split: article9_fegh
    path: data/article9_fegh-*
  - split: article10_fegh
    path: data/article10_fegh-*
  - split: article11_fegh
    path: data/article11_fegh-*
  - split: article12_fegh
    path: data/article12_fegh-*
  - split: article13_fegh
    path: data/article13_fegh-*
  - split: article14_fegh
    path: data/article14_fegh-*
  - split: article15_fegh
    path: data/article15_fegh-*
  - split: article16_fegh
    path: data/article16_fegh-*
  - split: article17_fegh
    path: data/article17_fegh-*
  - split: article18_fegh
    path: data/article18_fegh-*
  - split: article19_fegh
    path: data/article19_fegh-*
  - split: article20_fegh
    path: data/article20_fegh-*
  - split: article21_fegh
    path: data/article21_fegh-*
  - split: article22_fegh
    path: data/article22_fegh-*
  - split: article23_fegh
    path: data/article23_fegh-*
  - split: article24_fegh
    path: data/article24_fegh-*
  - split: article25_fegh
    path: data/article25_fegh-*
  - split: article26_fegh
    path: data/article26_fegh-*
  - split: article27_fegh
    path: data/article27_fegh-*
  - split: article28_fegh
    path: data/article28_fegh-*
---
The dataset is divided into 28 splits, article1_fegh through article28_fegh, each listed with its byte size and number of examples. The first 27 splits contain 10,000 examples each and article28_fegh contains 8,273, for 278,273 examples in total. Each example has 13 features: source domain, language score, original length, title, number of lines, URL, language, type, download date, perplexity, original number of lines, length, and raw content. The total size of the dataset is 4,137,556,952 bytes, with a download size of 835,265,147 bytes.
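Because the data is published as 28 named splits rather than a single `train` split, a loader has to name the split it wants. The sketch below builds the split names from the card and loads one of them with the Hugging Face `datasets` library. The helper names `split_names` and `load_split` are illustrative, not part of the dataset, and the repository ID is taken from the page title; loading requires network access and the `datasets` package.

```python
def split_names(n_splits: int = 28) -> list[str]:
    """Return the split names article1_fegh ... article{n}_fegh listed in the card."""
    return [f"article{i}_fegh" for i in range(1, n_splits + 1)]


def load_split(index: int):
    """Load one split by its 1-based index (needs network access)."""
    # Imported lazily so that split_names() works without the library installed.
    from datasets import load_dataset

    return load_dataset("Amirjalaly/articles_fegh", split=f"article{index}_fegh")
```

For example, `load_split(1)` would fetch `article1_fegh`, and iterating over `split_names()` is one way to process the splits one at a time instead of holding all of them in memory.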
Provided by:
Amirjalaly
Summary of original information
Dataset features
- source_domain: string
- language_score: string
- original_length: 64-bit integer
- title: string
- nlines: 64-bit integer
- url: string
- language: string
- type: string
- date_download: string
- perplexity: string
- original_nlines: 64-bit integer
- length: 64-bit integer
- raw_content: string
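Note that `language_score` and `perplexity` are typed as strings in the feature list, so they need converting before any numeric filtering. The following is a minimal sketch, assuming the strings hold plain decimal numbers (the card does not say how they are formatted); `parse_scores` is a hypothetical helper and the sample record is made up for illustration.

```python
def parse_scores(record: dict) -> dict:
    """Return a copy of `record` with language_score and perplexity as floats.

    Missing or unparseable values become None instead of raising.
    """
    out = dict(record)
    for key in ("language_score", "perplexity"):
        try:
            out[key] = float(record[key])
        except (KeyError, TypeError, ValueError):
            out[key] = None
    return out


# Made-up record, not taken from the dataset.
sample = {"title": "example", "language_score": "0.97", "perplexity": "312.5"}
parsed = parse_scores(sample)
```

With the `datasets` library, a function like this could be applied to every row through `Dataset.map`.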
Dataset splits
- article1_fegh: 141,074,171 bytes, 10,000 examples
- article2_fegh: 137,000,022 bytes, 10,000 examples
- article3_fegh: 158,128,322 bytes, 10,000 examples
- article4_fegh: 184,263,990 bytes, 10,000 examples
- article5_fegh: 110,405,776 bytes, 10,000 examples
- article6_fegh: 121,282,566 bytes, 10,000 examples
- article7_fegh: 154,946,135 bytes, 10,000 examples
- article8_fegh: 99,884,898 bytes, 10,000 examples
- article9_fegh: 127,426,232 bytes, 10,000 examples
- article10_fegh: 131,785,193 bytes, 10,000 examples
- article11_fegh: 159,712,660 bytes, 10,000 examples
- article12_fegh: 192,085,112 bytes, 10,000 examples
- article13_fegh: 150,810,425 bytes, 10,000 examples
- article14_fegh: 221,236,879 bytes, 10,000 examples
- article15_fegh: 142,548,442 bytes, 10,000 examples
- article16_fegh: 187,090,171 bytes, 10,000 examples
- article17_fegh: 237,446,954 bytes, 10,000 examples
- article18_fegh: 120,121,732 bytes, 10,000 examples
- article19_fegh: 128,613,349 bytes, 10,000 examples
- article20_fegh: 102,115,059 bytes, 10,000 examples
- article21_fegh: 164,741,962 bytes, 10,000 examples
- article22_fegh: 151,829,912 bytes, 10,000 examples
- article23_fegh: 140,526,953 bytes, 10,000 examples
- article24_fegh: 169,421,145 bytes, 10,000 examples
- article25_fegh: 108,889,265 bytes, 10,000 examples
- article26_fegh: 107,474,567 bytes, 10,000 examples
- article27_fegh: 173,530,923 bytes, 10,000 examples
- article28_fegh: 113,164,137 bytes, 8,273 examples
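The per-split figures above can be checked against the totals in the card's metadata. The short script below sums the listed byte sizes and example counts and derives a mean size per example; the variable names are illustrative, and the numbers are copied from the split list.

```python
# Byte sizes of article1_fegh ... article28_fegh, in order.
SPLIT_BYTES = [
    141_074_171, 137_000_022, 158_128_322, 184_263_990, 110_405_776,
    121_282_566, 154_946_135, 99_884_898, 127_426_232, 131_785_193,
    159_712_660, 192_085_112, 150_810_425, 221_236_879, 142_548_442,
    187_090_171, 237_446_954, 120_121_732, 128_613_349, 102_115_059,
    164_741_962, 151_829_912, 140_526_953, 169_421_145, 108_889_265,
    107_474_567, 173_530_923, 113_164_137,
]
# 27 full splits of 10,000 examples plus the shorter final split.
SPLIT_EXAMPLES = [10_000] * 27 + [8_273]

total_bytes = sum(SPLIT_BYTES)
total_examples = sum(SPLIT_EXAMPLES)
mean_bytes_per_example = total_bytes / total_examples

print(total_bytes, total_examples, round(mean_bytes_per_example))
```

The byte total equals the `dataset_size` of 4,137,556,952 reported in the metadata, and the example counts add up to 278,273, so an average example is on the order of 15 KB.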
Dataset size
- Download size: 835,265,147 bytes
- Dataset size: 4,137,556,952 bytes
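The gap between the two figures is easier to judge as a ratio. The sketch below converts both sizes to GiB and computes the download-to-dataset ratio. It assumes, as is usual for Hugging Face dataset cards, that the download size refers to the stored files and the dataset size to the data once loaded; the card itself does not spell this out.

```python
DOWNLOAD_SIZE = 835_265_147    # bytes, from the card
DATASET_SIZE = 4_137_556_952   # bytes, from the card


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"download: {to_gib(DOWNLOAD_SIZE):.2f} GiB")
print(f"dataset:  {to_gib(DATASET_SIZE):.2f} GiB")
print(f"ratio:    {ratio:.1%}")
```

Under that assumption, the download is roughly one fifth of the loaded size, which is worth knowing when budgeting disk space for a local copy.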
Configuration
- config_name: default
- data_files:
  - article1_fegh: data/article1_fegh-*
  - article2_fegh: data/article2_fegh-*
  - article3_fegh: data/article3_fegh-*
  - article4_fegh: data/article4_fegh-*
  - article5_fegh: data/article5_fegh-*
  - article6_fegh: data/article6_fegh-*
  - article7_fegh: data/article7_fegh-*
  - article8_fegh: data/article8_fegh-*
  - article9_fegh: data/article9_fegh-*
  - article10_fegh: data/article10_fegh-*
  - article11_fegh: data/article11_fegh-*
  - article12_fegh: data/article12_fegh-*
  - article13_fegh: data/article13_fegh-*
  - article14_fegh: data/article14_fegh-*
  - article15_fegh: data/article15_fegh-*
  - article16_fegh: data/article16_fegh-*
  - article17_fegh: data/article17_fegh-*
  - article18_fegh: data/article18_fegh-*
  - article19_fegh: data/article19_fegh-*
  - article20_fegh: data/article20_fegh-*
  - article21_fegh: data/article21_fegh-*
  - article22_fegh: data/article22_fegh-*
  - article23_fegh: data/article23_fegh-*
  - article24_fegh: data/article24_fegh-*
  - article25_fegh: data/article25_fegh-*
  - article26_fegh: data/article26_fegh-*
  - article27_fegh: data/article27_fegh-*
  - article28_fegh: data/article28_fegh-*
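Each split is tied to its files by a glob pattern of the form `data/articleN_fegh-*`. The sketch below shows how such patterns separate files by split, using Python's `fnmatch` as a stand-in for however the Hub actually resolves them. The helper `split_for_path` and the shard file names in the example are hypothetical; the real file names are not given on this page.

```python
from fnmatch import fnmatchcase


def split_for_path(path: str, n_splits: int = 28):
    """Return the split whose pattern matches `path`, or None if none does."""
    for i in range(1, n_splits + 1):
        # The "_fegh-" suffix keeps article1 from also matching article10..19.
        if fnmatchcase(path, f"data/article{i}_fegh-*"):
            return f"article{i}_fegh"
    return None


# Hypothetical shard names, for illustration only.
for name in (
    "data/article1_fegh-00000-of-00001.parquet",
    "data/article10_fegh-00000-of-00001.parquet",
    "README.md",
):
    print(name, "->", split_for_path(name))
```

Because the split number is followed immediately by `_fegh-` in every pattern, a file for `article10_fegh` cannot be picked up by the `article1_fegh` pattern.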



