ellamind/wikipedia-2023-11-retrieval-multilingual-corpus

Source: Hugging Face. Updated 2024-05-22; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/ellamind/wikipedia-2023-11-retrieval-multilingual-corpus
Description:
---
dataset_info:
- config_name: bg
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 9681710
    num_examples: 13500
  download_size: 4633852
  dataset_size: 9681710
- config_name: bn
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 14694766
    num_examples: 13500
  download_size: 5529387
  dataset_size: 14694766
- config_name: cs
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 6094140
    num_examples: 13500
  download_size: 3950951
  dataset_size: 6094140
- config_name: da
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5321047
    num_examples: 13500
  download_size: 3212721
  dataset_size: 5321047
- config_name: de
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 6062342
    num_examples: 13500
  download_size: 3637796
  dataset_size: 6062342
- config_name: en
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 6677737
    num_examples: 13500
  download_size: 3998998
  dataset_size: 6677737
- config_name: fa
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 9038000
    num_examples: 13500
  download_size: 4263332
  dataset_size: 9038000
- config_name: fi
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5885015
    num_examples: 13500
  download_size: 3532409
  dataset_size: 5885015
- config_name: hi
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 14879843
    num_examples: 13500
  download_size: 5629118
  dataset_size: 14879843
- config_name: it
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5905698
    num_examples: 13500
  download_size: 3604209
  dataset_size: 5905698
- config_name: nl
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5647998
    num_examples: 13500
  download_size: 3295722
  dataset_size: 5647998
- config_name: 'no'
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5616224
    num_examples: 13500
  download_size: 3406110
  dataset_size: 5616224
- config_name: pt
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 6080375
    num_examples: 13500
  download_size: 3690233
  dataset_size: 6080375
- config_name: ro
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5482624
    num_examples: 13500
  download_size: 3346015
  dataset_size: 5482624
- config_name: sr
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 9445283
    num_examples: 13500
  download_size: 4775552
  dataset_size: 9445283
- config_name: sv
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: test
    num_bytes: 5741282
    num_examples: 13500
  download_size: 3424342
  dataset_size: 5741282
configs:
- config_name: bg
  data_files:
  - split: test
    path: bg/test-*
- config_name: bn
  data_files:
  - split: test
    path: bn/test-*
- config_name: cs
  data_files:
  - split: test
    path: cs/test-*
- config_name: da
  data_files:
  - split: test
    path: da/test-*
- config_name: de
  data_files:
  - split: test
    path: de/test-*
- config_name: en
  data_files:
  - split: test
    path: en/test-*
- config_name: fa
  data_files:
  - split: test
    path: fa/test-*
- config_name: fi
  data_files:
  - split: test
    path: fi/test-*
- config_name: hi
  data_files:
  - split: test
    path: hi/test-*
- config_name: it
  data_files:
  - split: test
    path: it/test-*
- config_name: nl
  data_files:
  - split: test
    path: nl/test-*
- config_name: 'no'
  data_files:
  - split: test
    path: no/test-*
- config_name: pt
  data_files:
  - split: test
    path: pt/test-*
- config_name: ro
  data_files:
  - split: test
    path: ro/test-*
- config_name: sr
  data_files:
  - split: test
    path: sr/test-*
- config_name: sv
  data_files:
  - split: test
    path: sv/test-*
---
Provider:
ellamind
Summary of Original Information

Dataset Overview

Dataset Configuration

| Config | Features | Split | Split size (bytes) | Examples | Download size (bytes) | Dataset size (bytes) |
|---|---|---|---|---|---|---|
| bg | _id: string, title: string, text: string | test | 9681710 | 13500 | 4633852 | 9681710 |
| bn | _id: string, title: string, text: string | test | 14694766 | 13500 | 5529387 | 14694766 |
| cs | _id: string, title: string, text: string | test | 6094140 | 13500 | 3950951 | 6094140 |
| da | _id: string, title: string, text: string | test | 5321047 | 13500 | 3212721 | 5321047 |
| de | _id: string, title: string, text: string | test | 6062342 | 13500 | 3637796 | 6062342 |
| en | _id: string, title: string, text: string | test | 6677737 | 13500 | 3998998 | 6677737 |
| fa | _id: string, title: string, text: string | test | 9038000 | 13500 | 4263332 | 9038000 |
| fi | _id: string, title: string, text: string | test | 5885015 | 13500 | 3532409 | 5885015 |
| hi | _id: string, title: string, text: string | test | 14879843 | 13500 | 5629118 | 14879843 |
| it | _id: string, title: string, text: string | test | 5905698 | 13500 | 3604209 | 5905698 |
| nl | _id: string, title: string, text: string | test | 5647998 | 13500 | 3295722 | 5647998 |
| no | _id: string, title: string, text: string | test | 5616224 | 13500 | 3406110 | 5616224 |
| pt | _id: string, title: string, text: string | test | 6080375 | 13500 | 3690233 | 6080375 |
| ro | _id: string, title: string, text: string | test | 5482624 | 13500 | 3346015 | 5482624 |
| sr | _id: string, title: string, text: string | test | 9445283 | 13500 | 4775552 | 9445283 |
| sv | _id: string, title: string, text: string | test | 5741282 | 13500 | 3424342 | 5741282 |
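
The per-config sizes listed above can be totalled with a few lines of Python. This is a minimal sketch that hard-codes the split size and download size of each config as given in the listing; the names `SIZES`, `EXAMPLES_PER_CONFIG`, and `summarize` are illustrative and not part of the dataset.

```python
# (split size in bytes, download size in bytes) per config, copied from the listing.
SIZES = {
    "bg": (9681710, 4633852),
    "bn": (14694766, 5529387),
    "cs": (6094140, 3950951),
    "da": (5321047, 3212721),
    "de": (6062342, 3637796),
    "en": (6677737, 3998998),
    "fa": (9038000, 4263332),
    "fi": (5885015, 3532409),
    "hi": (14879843, 5629118),
    "it": (5905698, 3604209),
    "nl": (5647998, 3295722),
    "no": (5616224, 3406110),
    "pt": (6080375, 3690233),
    "ro": (5482624, 3346015),
    "sr": (9445283, 4775552),
    "sv": (5741282, 3424342),
}

# Every config lists the same number of test examples.
EXAMPLES_PER_CONFIG = 13500


def summarize(sizes):
    """Aggregate example counts and byte sizes across all configs."""
    total_bytes = sum(num_bytes for num_bytes, _ in sizes.values())
    total_download = sum(download for _, download in sizes.values())
    return {
        "configs": len(sizes),
        "total_examples": EXAMPLES_PER_CONFIG * len(sizes),
        "total_bytes": total_bytes,
        "total_download": total_download,
        # Fraction of the uncompressed size that is actually downloaded.
        "download_ratio": total_download / total_bytes,
        "largest": max(sizes, key=lambda k: sizes[k][0]),
        "smallest": min(sizes, key=lambda k: sizes[k][0]),
    }


summary = summarize(SIZES)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Since each of the 16 configs holds 13500 passages, differences in byte size reflect text length and script encoding rather than corpus size.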

Data File Paths

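| Config | Split | Path |
|---|---|---|
| bg | test | bg/test-* |
| bn | test | bn/test-* |
| cs | test | cs/test-* |
| da | test | da/test-* |
| de | test | de/test-* |
| en | test | en/test-* |
| fa | test | fa/test-* |
| fi | test | fi/test-* |
| hi | test | hi/test-* |
| it | test | it/test-* |
| nl | test | nl/test-* |
| no | test | no/test-* |
| pt | test | pt/test-* |
| ro | test | ro/test-* |
| sr | test | sr/test-* |
| sv | test | sv/test-* |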
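
Each config name above doubles as the second argument to `load_dataset` in the Hugging Face `datasets` library. The following is a minimal sketch, assuming `datasets` is installed and the Hub is reachable; `REPO_ID`, `CONFIGS`, `data_glob`, and `load_corpus` are hypothetical helper names introduced here for illustration.

```python
REPO_ID = "ellamind/wikipedia-2023-11-retrieval-multilingual-corpus"

# Config names as listed; each one has a single "test" split.
CONFIGS = [
    "bg", "bn", "cs", "da", "de", "en", "fa", "fi",
    "hi", "it", "nl", "no", "pt", "ro", "sr", "sv",
]


def data_glob(lang):
    """Return the data file pattern listed for a config, e.g. 'en/test-*'."""
    if lang not in CONFIGS:
        raise ValueError(f"unknown config: {lang!r}")
    return f"{lang}/test-*"


def load_corpus(lang):
    """Load the test split of one language config from the Hub."""
    if lang not in CONFIGS:
        raise ValueError(f"unknown config: {lang!r}")
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, lang, split="test")


# Example usage (requires network access):
# corpus = load_corpus("en")
# row = corpus[0]
# print(row["_id"], row["title"], row["text"][:80])
```

Pass the config name as a string (`"no"`, not a bare YAML `no`), which is why the metadata quotes it.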