manu/french-30b_separate
Source: Hugging Face. Updated 2023-10-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/manu/french-30b_separate
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: WmtEnFrTest
    path: data/WmtEnFrTest-*
  - split: EnglishFrenchWebpagesScrapedTranslatedTest
    path: data/EnglishFrenchWebpagesScrapedTranslatedTest-*
  - split: FrenchLibrispeechTextOnlyTest
    path: data/FrenchLibrispeechTextOnlyTest-*
  - split: FrenchPodcastsTest
    path: data/FrenchPodcastsTest-*
  - split: FrenchOpenSubtitlesTest
    path: data/FrenchOpenSubtitlesTest-*
  - split: OriginalSongsLyricsWithFrenchTranslationTest
    path: data/OriginalSongsLyricsWithFrenchTranslationTest-*
  - split: ProjectgutenbergFrTest
    path: data/ProjectgutenbergFrTest-*
  - split: BnfGallicaTest
    path: data/BnfGallicaTest-*
  - split: ThesesFr20132023Test
    path: data/ThesesFr20132023Test-*
  - split: LegiOpendataTest
    path: data/LegiOpendataTest-*
  - split: BaloOpendataTest
    path: data/BaloOpendataTest-*
  - split: JadeOpendataTest
    path: data/JadeOpendataTest-*
  - split: DoleOpendataTest
    path: data/DoleOpendataTest-*
  - split: SardeOpendataTest
    path: data/SardeOpendataTest-*
  - split: QrOpendataTest
    path: data/QrOpendataTest-*
  - split: JorfOpendataTest
    path: data/JorfOpendataTest-*
  - split: IncaOpendataTest
    path: data/IncaOpendataTest-*
  - split: AccoOpendataTest
    path: data/AccoOpendataTest-*
  - split: KaliOpendataTest
    path: data/KaliOpendataTest-*
  - split: DebatsOpendataTest
    path: data/DebatsOpendataTest-*
  - split: CnilOpendataTest
    path: data/CnilOpendataTest-*
  - split: CappOpendataTest
    path: data/CappOpendataTest-*
  - split: CassOpendataTest
    path: data/CassOpendataTest-*
  - split: ConstitOpendataTest
    path: data/ConstitOpendataTest-*
  - split: IlluinLayoutDatasetTextOnlyTest
    path: data/IlluinLayoutDatasetTextOnlyTest-*
  - split: WikisourceFrTest
    path: data/WikisourceFrTest-*
  - split: Wikipedia20220301.frTest
    path: data/Wikipedia20220301.frTest-*
  - split: Oscar2301FrTest
    path: data/Oscar2301FrTest-*
dataset_info:
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: dataset_id
    dtype: string
  splits:
  - name: WmtEnFrTest
    num_bytes: 933080
    num_examples: 3003
  - name: EnglishFrenchWebpagesScrapedTranslatedTest
    num_bytes: 3557903
    num_examples: 8580
  - name: FrenchLibrispeechTextOnlyTest
    num_bytes: 698968
    num_examples: 2582
  - name: FrenchPodcastsTest
    num_bytes: 505018
    num_examples: 100
  - name: FrenchOpenSubtitlesTest
    num_bytes: 3048714
    num_examples: 100
  - name: OriginalSongsLyricsWithFrenchTranslationTest
    num_bytes: 2156145
    num_examples: 756
  - name: ProjectgutenbergFrTest
    num_bytes: 39019119
    num_examples: 100
  - name: BnfGallicaTest
    num_bytes: 43160730
    num_examples: 100
  - name: ThesesFr20132023Test
    num_bytes: 3957037
    num_examples: 959
  - name: LegiOpendataTest
    num_bytes: 16589963
    num_examples: 10000
  - name: BaloOpendataTest
    num_bytes: 11094568
    num_examples: 1355
  - name: JadeOpendataTest
    num_bytes: 56977150
    num_examples: 5586
  - name: DoleOpendataTest
    num_bytes: 2065780
    num_examples: 100
  - name: SardeOpendataTest
    num_bytes: 1044391
    num_examples: 2244
  - name: QrOpendataTest
    num_bytes: 18924359
    num_examples: 100
  - name: JorfOpendataTest
    num_bytes: 11892298
    num_examples: 10000
  - name: IncaOpendataTest
    num_bytes: 27827026
    num_examples: 3737
  - name: AccoOpendataTest
    num_bytes: 36928857
    num_examples: 2541
  - name: KaliOpendataTest
    num_bytes: 7740933
    num_examples: 4306
  - name: DebatsOpendataTest
    num_bytes: 38200789
    num_examples: 100
  - name: CnilOpendataTest
    num_bytes: 1495015
    num_examples: 181
  - name: CappOpendataTest
    num_bytes: 9680857
    num_examples: 727
  - name: CassOpendataTest
    num_bytes: 8283986
    num_examples: 1422
  - name: ConstitOpendataTest
    num_bytes: 1340350
    num_examples: 100
  - name: IlluinLayoutDatasetTextOnlyTest
    num_bytes: 11714355
    num_examples: 4885
  - name: WikisourceFrTest
    num_bytes: 44358940
    num_examples: 10000
  - name: Wikipedia20220301.frTest
    num_bytes: 28814742
    num_examples: 10000
  - name: Oscar2301FrTest
    num_bytes: 51030875
    num_examples: 9834
  download_size: 0
  dataset_size: 483041948
---
# Dataset Card for "french-30b_separate"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
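
The card itself gives no usage instructions, so the following is only a minimal sketch of how one of the splits declared in the front matter might be loaded with the Hugging Face `datasets` library. The helper name `load_split`, the `KNOWN_SPLITS` constant, and the choice to stream by default are illustrative assumptions, not part of the original card; the repository ID and split names are copied from the metadata above.

```python
REPO_ID = "manu/french-30b_separate"

# Split names copied from the `configs` section of the card.
KNOWN_SPLITS = frozenset({
    "WmtEnFrTest", "EnglishFrenchWebpagesScrapedTranslatedTest",
    "FrenchLibrispeechTextOnlyTest", "FrenchPodcastsTest",
    "FrenchOpenSubtitlesTest", "OriginalSongsLyricsWithFrenchTranslationTest",
    "ProjectgutenbergFrTest", "BnfGallicaTest", "ThesesFr20132023Test",
    "LegiOpendataTest", "BaloOpendataTest", "JadeOpendataTest",
    "DoleOpendataTest", "SardeOpendataTest", "QrOpendataTest",
    "JorfOpendataTest", "IncaOpendataTest", "AccoOpendataTest",
    "KaliOpendataTest", "DebatsOpendataTest", "CnilOpendataTest",
    "CappOpendataTest", "CassOpendataTest", "ConstitOpendataTest",
    "IlluinLayoutDatasetTextOnlyTest", "WikisourceFrTest",
    "Wikipedia20220301.frTest", "Oscar2301FrTest",
})


def load_split(split: str, streaming: bool = True):
    """Load one split of the dataset (hypothetical helper).

    Rejects split names that the card does not declare, then delegates to
    `datasets.load_dataset`. Requires network access and the `datasets`
    package; the import is deferred so this module can be imported without it.
    """
    if split not in KNOWN_SPLITS:
        raise ValueError(f"unknown split {split!r} for {REPO_ID}")
    from datasets import load_dataset  # deferred: third-party dependency

    return load_dataset(REPO_ID, split=split, streaming=streaming)
```

Calling `load_split("WmtEnFrTest")` would return a streaming dataset whose records are expected to carry the three string fields listed under `features` (`id`, `text`, `dataset_id`). Streaming is used here only to avoid downloading every shard up front; pass `streaming=False` for a regular in-memory dataset.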
Provider:
manu
Summary of original information
Dataset overview
Dataset configuration
- Configuration name: default
- Data files:
  - WmtEnFrTest: path data/WmtEnFrTest-*
  - EnglishFrenchWebpagesScrapedTranslatedTest: path data/EnglishFrenchWebpagesScrapedTranslatedTest-*
  - FrenchLibrispeechTextOnlyTest: path data/FrenchLibrispeechTextOnlyTest-*
  - FrenchPodcastsTest: path data/FrenchPodcastsTest-*
  - FrenchOpenSubtitlesTest: path data/FrenchOpenSubtitlesTest-*
  - OriginalSongsLyricsWithFrenchTranslationTest: path data/OriginalSongsLyricsWithFrenchTranslationTest-*
  - ProjectgutenbergFrTest: path data/ProjectgutenbergFrTest-*
  - BnfGallicaTest: path data/BnfGallicaTest-*
  - ThesesFr20132023Test: path data/ThesesFr20132023Test-*
  - LegiOpendataTest: path data/LegiOpendataTest-*
  - BaloOpendataTest: path data/BaloOpendataTest-*
  - JadeOpendataTest: path data/JadeOpendataTest-*
  - DoleOpendataTest: path data/DoleOpendataTest-*
  - SardeOpendataTest: path data/SardeOpendataTest-*
  - QrOpendataTest: path data/QrOpendataTest-*
  - JorfOpendataTest: path data/JorfOpendataTest-*
  - IncaOpendataTest: path data/IncaOpendataTest-*
  - AccoOpendataTest: path data/AccoOpendataTest-*
  - KaliOpendataTest: path data/KaliOpendataTest-*
  - DebatsOpendataTest: path data/DebatsOpendataTest-*
  - CnilOpendataTest: path data/CnilOpendataTest-*
  - CappOpendataTest: path data/CappOpendataTest-*
  - CassOpendataTest: path data/CassOpendataTest-*
  - ConstitOpendataTest: path data/ConstitOpendataTest-*
  - IlluinLayoutDatasetTextOnlyTest: path data/IlluinLayoutDatasetTextOnlyTest-*
  - WikisourceFrTest: path data/WikisourceFrTest-*
  - Wikipedia20220301.frTest: path data/Wikipedia20220301.frTest-*
  - Oscar2301FrTest: path data/Oscar2301FrTest-*
Dataset information
- Features:
  - id: type string
  - text: type string
  - dataset_id: type string
- Splits:
  - WmtEnFrTest: 933080 bytes, 3003 examples
  - EnglishFrenchWebpagesScrapedTranslatedTest: 3557903 bytes, 8580 examples
  - FrenchLibrispeechTextOnlyTest: 698968 bytes, 2582 examples
  - FrenchPodcastsTest: 505018 bytes, 100 examples
  - FrenchOpenSubtitlesTest: 3048714 bytes, 100 examples
  - OriginalSongsLyricsWithFrenchTranslationTest: 2156145 bytes, 756 examples
  - ProjectgutenbergFrTest: 39019119 bytes, 100 examples
  - BnfGallicaTest: 43160730 bytes, 100 examples
  - ThesesFr20132023Test: 3957037 bytes, 959 examples
  - LegiOpendataTest: 16589963 bytes, 10000 examples
  - BaloOpendataTest: 11094568 bytes, 1355 examples
  - JadeOpendataTest: 56977150 bytes, 5586 examples
  - DoleOpendataTest: 2065780 bytes, 100 examples
  - SardeOpendataTest: 1044391 bytes, 2244 examples
  - QrOpendataTest: 18924359 bytes, 100 examples
  - JorfOpendataTest: 11892298 bytes, 10000 examples
  - IncaOpendataTest: 27827026 bytes, 3737 examples
  - AccoOpendataTest: 36928857 bytes, 2541 examples
  - KaliOpendataTest: 7740933 bytes, 4306 examples
  - DebatsOpendataTest: 38200789 bytes, 100 examples
  - CnilOpendataTest: 1495015 bytes, 181 examples
  - CappOpendataTest: 9680857 bytes, 727 examples
  - CassOpendataTest: 8283986 bytes, 1422 examples
  - ConstitOpendataTest: 1340350 bytes, 100 examples
  - IlluinLayoutDatasetTextOnlyTest: 11714355 bytes, 4885 examples
  - WikisourceFrTest: 44358940 bytes, 10000 examples
  - Wikipedia20220301.frTest: 28814742 bytes, 10000 examples
  - Oscar2301FrTest: 51030875 bytes, 9834 examples
- Download size: 0
- Dataset size: 483041948 bytes
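
As a consistency check on the figures above, the per-split byte counts can be summed and compared with the declared dataset size. The sketch below copies the numbers from the card; the names `SPLIT_STATS`, `DECLARED_DATASET_SIZE`, and `mean_bytes_per_example` are introduced here purely for illustration.

```python
# (num_bytes, num_examples) per split, copied from the card metadata.
SPLIT_STATS = {
    "WmtEnFrTest": (933080, 3003),
    "EnglishFrenchWebpagesScrapedTranslatedTest": (3557903, 8580),
    "FrenchLibrispeechTextOnlyTest": (698968, 2582),
    "FrenchPodcastsTest": (505018, 100),
    "FrenchOpenSubtitlesTest": (3048714, 100),
    "OriginalSongsLyricsWithFrenchTranslationTest": (2156145, 756),
    "ProjectgutenbergFrTest": (39019119, 100),
    "BnfGallicaTest": (43160730, 100),
    "ThesesFr20132023Test": (3957037, 959),
    "LegiOpendataTest": (16589963, 10000),
    "BaloOpendataTest": (11094568, 1355),
    "JadeOpendataTest": (56977150, 5586),
    "DoleOpendataTest": (2065780, 100),
    "SardeOpendataTest": (1044391, 2244),
    "QrOpendataTest": (18924359, 100),
    "JorfOpendataTest": (11892298, 10000),
    "IncaOpendataTest": (27827026, 3737),
    "AccoOpendataTest": (36928857, 2541),
    "KaliOpendataTest": (7740933, 4306),
    "DebatsOpendataTest": (38200789, 100),
    "CnilOpendataTest": (1495015, 181),
    "CappOpendataTest": (9680857, 727),
    "CassOpendataTest": (8283986, 1422),
    "ConstitOpendataTest": (1340350, 100),
    "IlluinLayoutDatasetTextOnlyTest": (11714355, 4885),
    "WikisourceFrTest": (44358940, 10000),
    "Wikipedia20220301.frTest": (28814742, 10000),
    "Oscar2301FrTest": (51030875, 9834),
}

DECLARED_DATASET_SIZE = 483041948  # `dataset_size` from the card

total_bytes = sum(num_bytes for num_bytes, _ in SPLIT_STATS.values())
total_examples = sum(num_examples for _, num_examples in SPLIT_STATS.values())


def mean_bytes_per_example(split: str) -> float:
    """Average record size of a split, in bytes."""
    num_bytes, num_examples = SPLIT_STATS[split]
    return num_bytes / num_examples


print(total_bytes == DECLARED_DATASET_SIZE)  # → True
print(total_examples)  # → 93498
```

The split sizes add up exactly to the declared 483041948 bytes, across 93498 examples in 28 splits. The per-split averages differ widely: the 100-example book and debate splits such as `BnfGallicaTest` hold very long documents, while `WmtEnFrTest` holds short sentence-level records.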



