ellamind/wikipedia-2023-11-retrieval-multilingual-corpus
Source: Hugging Face. Updated 2024-05-22; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/ellamind/wikipedia-2023-11-retrieval-multilingual-corpus
Resource description:
---
dataset_info:
- config_name: bg
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 9681710
    num_examples: 13500
  download_size: 4633852
  dataset_size: 9681710
- config_name: bn
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 14694766
    num_examples: 13500
  download_size: 5529387
  dataset_size: 14694766
- config_name: cs
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 6094140
    num_examples: 13500
  download_size: 3950951
  dataset_size: 6094140
- config_name: da
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5321047
    num_examples: 13500
  download_size: 3212721
  dataset_size: 5321047
- config_name: de
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 6062342
    num_examples: 13500
  download_size: 3637796
  dataset_size: 6062342
- config_name: en
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 6677737
    num_examples: 13500
  download_size: 3998998
  dataset_size: 6677737
- config_name: fa
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 9038000
    num_examples: 13500
  download_size: 4263332
  dataset_size: 9038000
- config_name: fi
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5885015
    num_examples: 13500
  download_size: 3532409
  dataset_size: 5885015
- config_name: hi
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 14879843
    num_examples: 13500
  download_size: 5629118
  dataset_size: 14879843
- config_name: it
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5905698
    num_examples: 13500
  download_size: 3604209
  dataset_size: 5905698
- config_name: nl
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5647998
    num_examples: 13500
  download_size: 3295722
  dataset_size: 5647998
- config_name: 'no'
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5616224
    num_examples: 13500
  download_size: 3406110
  dataset_size: 5616224
- config_name: pt
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 6080375
    num_examples: 13500
  download_size: 3690233
  dataset_size: 6080375
- config_name: ro
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5482624
    num_examples: 13500
  download_size: 3346015
  dataset_size: 5482624
- config_name: sr
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 9445283
    num_examples: 13500
  download_size: 4775552
  dataset_size: 9445283
- config_name: sv
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5741282
    num_examples: 13500
  download_size: 3424342
  dataset_size: 5741282
configs:
- config_name: bg
  data_files:
  - split: test
    path: bg/test-*
- config_name: bn
  data_files:
  - split: test
    path: bn/test-*
- config_name: cs
  data_files:
  - split: test
    path: cs/test-*
- config_name: da
  data_files:
  - split: test
    path: da/test-*
- config_name: de
  data_files:
  - split: test
    path: de/test-*
- config_name: en
  data_files:
  - split: test
    path: en/test-*
- config_name: fa
  data_files:
  - split: test
    path: fa/test-*
- config_name: fi
  data_files:
  - split: test
    path: fi/test-*
- config_name: hi
  data_files:
  - split: test
    path: hi/test-*
- config_name: it
  data_files:
  - split: test
    path: it/test-*
- config_name: nl
  data_files:
  - split: test
    path: nl/test-*
- config_name: 'no'
  data_files:
  - split: test
    path: no/test-*
- config_name: pt
  data_files:
  - split: test
    path: pt/test-*
- config_name: ro
  data_files:
  - split: test
    path: ro/test-*
- config_name: sr
  data_files:
  - split: test
    path: sr/test-*
- config_name: sv
  data_files:
  - split: test
    path: sv/test-*
---
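
The metadata above declares one config per language code, each with a single `test` split. As a minimal sketch, assuming the Hugging Face `datasets` library is installed and the Hub is reachable, one config could be loaded as shown below. The helper name `load_corpus` and the constants are illustrative and not part of the dataset card. The Norwegian config is quoted as `'no'` in the YAML because an unquoted `no` would be read as a boolean by YAML 1.1 parsers; from Python it is passed as the plain string `"no"`.

```python
# Config names as declared in the dataset card metadata.
CONFIGS = ["bg", "bn", "cs", "da", "de", "en", "fa", "fi",
           "hi", "it", "nl", "no", "pt", "ro", "sr", "sv"]

DATASET_ID = "ellamind/wikipedia-2023-11-retrieval-multilingual-corpus"


def load_corpus(lang: str):
    """Load the `test` split of one language config.

    Hypothetical helper: requires `pip install datasets` and network access.
    """
    if lang not in CONFIGS:
        raise ValueError(f"unknown config {lang!r}; expected one of {CONFIGS}")
    # Imported lazily so the validation above works without the library.
    from datasets import load_dataset
    return load_dataset(DATASET_ID, lang, split="test")


# Example usage (downloads data, so it is left commented out):
# corpus = load_corpus("de")
# print(corpus.column_names)   # the card lists _id, title, text
```

Each record should expose the three string fields named in the metadata: `_id`, `title`, and `text`.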
Provider:
ellamind
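Summary of original information
Dataset overview
Dataset configuration information
| Config name | Features | Split | Size (bytes) | Examples | Download size (bytes) | Dataset size (bytes) |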
|---|---|---|---|---|---|---|
| bg | _id: string, title: string, text: string | test | 9681710 | 13500 | 4633852 | 9681710 |
| bn | _id: string, title: string, text: string | test | 14694766 | 13500 | 5529387 | 14694766 |
| cs | _id: string, title: string, text: string | test | 6094140 | 13500 | 3950951 | 6094140 |
| da | _id: string, title: string, text: string | test | 5321047 | 13500 | 3212721 | 5321047 |
| de | _id: string, title: string, text: string | test | 6062342 | 13500 | 3637796 | 6062342 |
| en | _id: string, title: string, text: string | test | 6677737 | 13500 | 3998998 | 6677737 |
| fa | _id: string, title: string, text: string | test | 9038000 | 13500 | 4263332 | 9038000 |
| fi | _id: string, title: string, text: string | test | 5885015 | 13500 | 3532409 | 5885015 |
| hi | _id: string, title: string, text: string | test | 14879843 | 13500 | 5629118 | 14879843 |
| it | _id: string, title: string, text: string | test | 5905698 | 13500 | 3604209 | 5905698 |
| nl | _id: string, title: string, text: string | test | 5647998 | 13500 | 3295722 | 5647998 |
| no | _id: string, title: string, text: string | test | 5616224 | 13500 | 3406110 | 5616224 |
| pt | _id: string, title: string, text: string | test | 6080375 | 13500 | 3690233 | 6080375 |
| ro | _id: string, title: string, text: string | test | 5482624 | 13500 | 3346015 | 5482624 |
| sr | _id: string, title: string, text: string | test | 9445283 | 13500 | 4775552 | 9445283 |
| sv | _id: string, title: string, text: string | test | 5741282 | 13500 | 3424342 | 5741282 |
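
The figures in the table above lend themselves to a quick comparison across languages. The sketch below copies `num_bytes` and `download_size` for each config and derives the average size per example and the ratio of download size to dataset size. It assumes, as is usual for Hugging Face dataset metadata, that `num_bytes` is the uncompressed in-memory size and `download_size` the size of the stored files; the function names are illustrative.

```python
# (num_bytes, download_size) per config, copied from the table above.
SIZES = {
    "bg": (9681710, 4633852),
    "bn": (14694766, 5529387),
    "cs": (6094140, 3950951),
    "da": (5321047, 3212721),
    "de": (6062342, 3637796),
    "en": (6677737, 3998998),
    "fa": (9038000, 4263332),
    "fi": (5885015, 3532409),
    "hi": (14879843, 5629118),
    "it": (5905698, 3604209),
    "nl": (5647998, 3295722),
    "no": (5616224, 3406110),
    "pt": (6080375, 3690233),
    "ro": (5482624, 3346015),
    "sr": (9445283, 4775552),
    "sv": (5741282, 3424342),
}

NUM_EXAMPLES = 13500  # identical for every config


def avg_bytes_per_example(config: str) -> float:
    """Average uncompressed bytes per record for one config."""
    num_bytes, _ = SIZES[config]
    return num_bytes / NUM_EXAMPLES


def download_ratio(config: str) -> float:
    """download_size divided by dataset size (lower means better compression)."""
    num_bytes, download_size = SIZES[config]
    return download_size / num_bytes


for name in sorted(SIZES, key=avg_bytes_per_example, reverse=True):
    print(f"{name}: {avg_bytes_per_example(name):8.1f} B/example, "
          f"ratio {download_ratio(name):.2f}")
```

Configs in non-Latin scripts (for example `hi`, `bn`, `bg`) come out larger per example, which is consistent with multi-byte UTF-8 encoding rather than necessarily longer passages.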
Data file paths
| Config name | Split | Path |
|---|---|---|
| bg | test | bg/test-* |
| bn | test | bn/test-* |
| cs | test | cs/test-* |
| da | test | da/test-* |
| de | test | de/test-* |
| en | test | en/test-* |
| fa | test | fa/test-* |
| fi | test | fi/test-* |
| hi | test | hi/test-* |
| it | test | it/test-* |
| nl | test | nl/test-* |
| no | test | no/test-* |
| pt | test | pt/test-* |
| ro | test | ro/test-* |
| sr | test | sr/test-* |
| sv | test | sv/test-* |
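
Each path in the table above is a glob pattern, so a config covers every file under its language directory whose name starts with `test-`. The sketch below illustrates that matching with Python's `fnmatch`, which only approximates how the Hub resolves patterns. The shard file name used in the example is hypothetical; the source does not list actual file names.

```python
from fnmatch import fnmatch
from typing import Optional

CONFIGS = ["bg", "bn", "cs", "da", "de", "en", "fa", "fi",
           "hi", "it", "nl", "no", "pt", "ro", "sr", "sv"]

# Pattern per config, following the "<config>/test-*" scheme in the table.
PATTERNS = {name: f"{name}/test-*" for name in CONFIGS}


def config_for_file(path: str) -> Optional[str]:
    """Return the config whose pattern matches `path`, or None if none does."""
    for name, pattern in PATTERNS.items():
        if fnmatch(path, pattern):
            return name
    return None


# Hypothetical shard name, used only to illustrate the matching.
print(config_for_file("de/test-00000-of-00001.parquet"))
print(config_for_file("de/train-00000-of-00001.parquet"))
```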



