WikiQuality/unique_words_hi
Source: Hugging Face. Updated 2024-09-03; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/WikiQuality/unique_words_hi
Resource description:
---
dataset_info:
- config_name: am
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 720421.4949770134
num_examples: 404
- name: test
num_bytes: 39230.87348884727
num_examples: 22
download_size: 4583551
dataset_size: 759652.3684658607
- config_name: ary
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 2012453.1502145922
num_examples: 1376
- name: test
num_bytes: 106765.31974248927
num_examples: 73
download_size: 2616804
dataset_size: 2119218.4699570816
- config_name: bm
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 27316.020671834627
num_examples: 40
- name: test
num_bytes: 2048.701550387597
num_examples: 3
download_size: 157121
dataset_size: 29364.722222222223
- config_name: ee
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 58804.97458563536
num_examples: 67
- name: test
num_bytes: 3510.7447513812153
num_examples: 4
download_size: 239041
dataset_size: 62315.71933701658
- config_name: fon
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 96569.83686786296
num_examples: 113
- name: test
num_bytes: 5127.601957585644
num_examples: 6
download_size: 161016
dataset_size: 101697.43882544861
- config_name: ha
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 8000142.076756886
num_examples: 3730
- name: test
num_bytes: 422527.61102442537
num_examples: 197
download_size: 19840513
dataset_size: 8422669.687781312
- config_name: ig
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 8606368.581420058
num_examples: 3027
- name: test
num_bytes: 454912.11530466116
num_examples: 160
download_size: 15853531
dataset_size: 9061280.69672472
- config_name: lg
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 565534.5700032185
num_examples: 291
- name: test
num_bytes: 31094.68426134535
num_examples: 16
download_size: 1671200
dataset_size: 596629.2542645638
- config_name: ln
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 120300.38545910372
num_examples: 210
- name: test
num_bytes: 6874.307740520213
num_examples: 12
download_size: 492568
dataset_size: 127174.69319962393
- config_name: ny
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 144807.3516819572
num_examples: 93
- name: test
num_bytes: 7785.341488277268
num_examples: 5
download_size: 432087
dataset_size: 152592.69317023447
- config_name: om
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 149623.33407325193
num_examples: 82
- name: test
num_bytes: 9123.374028856826
num_examples: 5
download_size: 874671
dataset_size: 158746.70810210877
- config_name: pcm
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 314917.86130136985
num_examples: 211
- name: test
num_bytes: 17910.020547945205
num_examples: 12
download_size: 556349
dataset_size: 332827.8818493151
- config_name: rn
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 29617.125
num_examples: 43
- name: test
num_bytes: 2066.311046511628
num_examples: 3
download_size: 152026
dataset_size: 31683.436046511626
- config_name: rw
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 1283346.1431797051
num_examples: 908
- name: test
num_bytes: 67842.08686412538
num_examples: 48
download_size: 2970448
dataset_size: 1351188.2300438306
- config_name: sn
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 1561078.3257964116
num_examples: 1879
- name: test
num_bytes: 82249.47006590992
num_examples: 99
download_size: 2588336
dataset_size: 1643327.7958623215
- config_name: so
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 860363.8802139037
num_examples: 599
- name: test
num_bytes: 45962.67807486631
num_examples: 32
download_size: 3436594
dataset_size: 906326.55828877
- config_name: sw
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 5185902.153885985
num_examples: 5013
- name: test
num_bytes: 273105.5592710752
num_examples: 264
download_size: 17252138
dataset_size: 5459007.71315706
- config_name: ti
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 13099.91825613079
num_examples: 7
- name: test
num_bytes: 1871.41689373297
num_examples: 1
download_size: 163220
dataset_size: 14971.33514986376
- config_name: tn
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 284903.82930107525
num_examples: 154
- name: test
num_bytes: 16650.22379032258
num_examples: 9
download_size: 745359
dataset_size: 301554.05309139786
- config_name: ts
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 88057.75886524822
num_examples: 80
- name: test
num_bytes: 5503.609929078014
num_examples: 5
download_size: 230279
dataset_size: 93561.36879432623
- config_name: tw
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 951432.9154441184
num_examples: 496
- name: test
num_bytes: 51791.711122966124
num_examples: 27
download_size: 1947763
dataset_size: 1003224.6265670846
- config_name: wo
features:
- name: url
dtype: string
- name: id
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 58912.59222702036
num_examples: 32
- name: test
num_bytes: 3682.0370141887724
num_examples: 2
download_size: 881268
dataset_size: 62594.62924120913
- config_name: yo
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 916153.1494145199
num_examples: 837
- name: test
num_bytes: 49255.54566744731
num_examples: 45
download_size: 3546158
dataset_size: 965408.6950819672
- config_name: zu
features:
- name: id
dtype: string
- name: url
dtype: string
- name: title
dtype: string
- name: text
dtype: string
splits:
- name: train
num_bytes: 221260.42545454545
num_examples: 334
- name: test
num_bytes: 11924.214545454546
num_examples: 18
download_size: 1682008
dataset_size: 233184.63999999998
configs:
- config_name: am
data_files:
- split: train
path: am/train-*
- split: test
path: am/test-*
- config_name: ary
data_files:
- split: train
path: ary/train-*
- split: test
path: ary/test-*
- config_name: bm
data_files:
- split: train
path: bm/train-*
- split: test
path: bm/test-*
- config_name: ee
data_files:
- split: train
path: ee/train-*
- split: test
path: ee/test-*
- config_name: fon
data_files:
- split: train
path: fon/train-*
- split: test
path: fon/test-*
- config_name: ha
data_files:
- split: train
path: ha/train-*
- split: test
path: ha/test-*
- config_name: ig
data_files:
- split: train
path: ig/train-*
- split: test
path: ig/test-*
- config_name: lg
data_files:
- split: train
path: lg/train-*
- split: test
path: lg/test-*
- config_name: ln
data_files:
- split: train
path: ln/train-*
- split: test
path: ln/test-*
- config_name: ny
data_files:
- split: train
path: ny/train-*
- split: test
path: ny/test-*
- config_name: om
data_files:
- split: train
path: om/train-*
- split: test
path: om/test-*
- config_name: pcm
data_files:
- split: train
path: pcm/train-*
- split: test
path: pcm/test-*
- config_name: rn
data_files:
- split: train
path: rn/train-*
- split: test
path: rn/test-*
- config_name: rw
data_files:
- split: train
path: rw/train-*
- split: test
path: rw/test-*
- config_name: sn
data_files:
- split: train
path: sn/train-*
- split: test
path: sn/test-*
- config_name: so
data_files:
- split: train
path: so/train-*
- split: test
path: so/test-*
- config_name: sw
data_files:
- split: train
path: sw/train-*
- split: test
path: sw/test-*
- config_name: ti
data_files:
- split: train
path: ti/train-*
- split: test
path: ti/test-*
- config_name: tn
data_files:
- split: train
path: tn/train-*
- split: test
path: tn/test-*
- config_name: ts
data_files:
- split: train
path: ts/train-*
- split: test
path: ts/test-*
- config_name: tw
data_files:
- split: train
path: tw/train-*
- split: test
path: tw/test-*
- config_name: wo
data_files:
- split: train
path: wo/train-*
- split: test
path: wo/test-*
- config_name: yo
data_files:
- split: train
path: yo/train-*
- split: test
path: yo/test-*
- config_name: zu
data_files:
- split: train
path: zu/train-*
- split: test
path: zu/test-*
---
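The metadata above declares one configuration per language code, each with a `train` and a `test` split stored under `<config>/<split>-*`. The following is a minimal sketch of loading one configuration. It assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID shown in the page title. The helper names `data_file_pattern` and `load_config` are hypothetical and not part of the dataset or the library.

```python
REPO_ID = "WikiQuality/unique_words_hi"  # repository ID taken from the page title


def data_file_pattern(config: str, split: str) -> str:
    """Return the glob that the `configs` section declares for a config/split pair."""
    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split!r}")
    return f"{config}/{split}-*"


def load_config(config: str, split: str = "train"):
    """Load one language configuration (hypothetical helper).

    Needs network access and `pip install datasets`.
    """
    # Imported lazily so that data_file_pattern() stays usable offline.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)
```

Calling `load_config("ha", "test")` would request the `ha` test split, for which the metadata above lists 197 examples. The hosted files may have changed since this page was indexed.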
Provider:
WikiQuality
Original information summary
Dataset overview
Dataset configurations
Config name: ha
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Bytes: 7470726.9048663
    - Examples: 3563
  - test:
    - Bytes: 394189.3511408544
    - Examples: 188
- Download size (bytes): 19075652
- Dataset size (bytes): 7864916.256007154
Config name: ig
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Bytes: 8181391.122507122
    - Examples: 2872
  - test:
    - Bytes: 432998.415954416
    - Examples: 152
- Download size (bytes): 15213552
- Dataset size (bytes): 8614389.538461538
Config name: pcm
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Bytes: 294285.1666666667
    - Examples: 197
  - test:
    - Bytes: 16432.166666666668
    - Examples: 11
- Download size (bytes): 522593
- Dataset size (bytes): 310717.3333333334
Config name: sw
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Bytes: 5083209.065258449
    - Examples: 5583
  - test:
    - Bytes: 267681.07920221816
    - Examples: 294
- Download size (bytes): 17763796
- Dataset size (bytes): 5350890.144460667
Config name: yo
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
- Splits:
  - train:
    - Bytes: 349682.1884960159
    - Examples: 742
  - test:
    - Bytes: 18850.79183266932
    - Examples: 40
- Download size (bytes): 2994685
- Dataset size (bytes): 368532.98032868525



