SilencioNetwork/global-french-speech
Source: Hugging Face | Updated 2026-04-07 | Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/SilencioNetwork/global-french-speech
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- automatic-speech-recognition
- audio-classification
- text-to-speech
language:
- fr
tags:
- french
- global-french
- francophone
- french-accents
- african-french
- canadian-french
- european-french
- multilingual
- speech-data
- asr
- tts
- crowdsourced
- real-world-audio
- native-speakers
pretty_name: "Global French Speech Dataset"
dataset_info:
  features:
  - name: file_name
    dtype: string
  - name: id
    dtype: int64
  - name: gender
    dtype: string
  - name: ethnicity
    dtype: string
  - name: occupation
    dtype: string
  - name: birth_place
    dtype: string
  - name: mother_tongue
    dtype: string
  - name: dialect
    dtype: string
  - name: year_of_birth
    dtype: int64
  - name: years_at_birth_place
    dtype: int64
  - name: languages_data
    dtype: string
  - name: os
    dtype: string
  - name: device
    dtype: string
  - name: browser
    dtype: string
  - name: duration
    dtype: float64
  - name: emotions
    dtype: string
  - name: language
    dtype: string
  - name: location
    dtype: string
  - name: noise_sources
    dtype: string
  - name: script_id
    dtype: int64
  - name: type_of_script
    dtype: string
  - name: script
    dtype: string
  - name: transcript
    dtype: string
  - name: speaker_id
    dtype: string
configs:
- config_name: french_canada
  data_files:
  - split: free_speech
    path: french_canada/free_speech/**
- config_name: french_global
  data_files:
  - split: free_speech
    path: french_global/free_speech/**
size_categories:
- n<1K
---
# 🌍 Global French Speech Dataset
## 🎯 Overview
The **Global French Speech Dataset** is a sample of high-quality, real-world speech recordings from native French speakers. It contains **50 audio files**, 25 from France and 25 from Canada, drawn from a larger commercial catalog that covers **30+ Francophone countries and regions**.
French is spoken by **300+ million people globally** across Europe, Africa, the Americas, and Oceania. This sample covers **2 major French variants**, Metropolitan French and Canadian French; off-the-shelf (OTS) inventory is available for 30+ regions.
### Key Features
✅ **2 major French variants with samples** - Metropolitan French (France) and Canadian French
✅ **50 audio recordings** - Native French speakers
✅ **30+ Francophone regions available OTS** - Africa, Europe, Americas
✅ **Rich demographic metadata** - Gender, age, occupation, location, dialect
✅ **Real-world acoustic conditions** - Natural environments
✅ **156,000+ OTS recordings** - 1,600+ hours available commercially
### 🗂️ This is a Sample Dataset
**These 50 recordings represent a sample of Silencio's capabilities.** Full off-the-shelf inventory available:
| Country/Region | OTS Recordings | OTS Hours | In This Sample? |
|----------------|----------------|-----------|-----------------|
| **France** | 33,309 | 517 hours | ✅ 25 files |
| **Senegal** | 11,272 | 356 hours | ❌ Contact |
| **Benin** | 16,385 | 276 hours | ❌ Contact |
| **Switzerland** | 1,732 | 138 hours | ❌ Contact |
| **Burkina Faso** | 6,490 | 100 hours | ❌ Contact |
| **Tunisia** | 1,503 | 86 hours | ❌ Contact |
| **Algeria** | 3,761 | 80 hours | ❌ Contact |
| **Andorra** | 892 | 68 hours | ❌ Contact |
| **Madagascar** | 4,690 | 66 hours | ❌ Contact |
| **Cameroon** | 6,206 | 66 hours | ❌ Contact |
| **Togo** | 8,723 | 62 hours | ❌ Contact |
| **Morocco** | 4,358 | 59 hours | ❌ Contact |
| **Nigeria** | 26,648 | 34 hours | ❌ Contact |
| **Canada** | 1,715 | 16 hours | ✅ 25 files |
| **20+ more regions** | 30,000+ | 200+ hours | ❌ Contact |
| **TOTAL** | **156,000+** | **1,600+ hours** | **50 files (~32 min)** |
**Sample = 0.03% of available inventory** *(updated: March 30, 2026)*
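The 0.03% figure can be checked from the totals in the table. The sketch below is a back-of-the-envelope check only: it treats the "156,000+" and "1,600+ hours" lower bounds as exact values, which is an assumption, since the catalog figures are approximate.

```python
# Totals quoted in the table above. The "+" lower bounds are treated as exact
# values here (an assumption made only for this rough check).
SAMPLE_FILES = 50
SAMPLE_MINUTES = 32
CATALOG_RECORDINGS = 156_000
CATALOG_HOURS = 1_600


def share_percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of the catalog by number of recordings and by audio duration.
by_recordings = share_percent(SAMPLE_FILES, CATALOG_RECORDINGS)
by_duration = share_percent(SAMPLE_MINUTES / 60, CATALOG_HOURS)

print(f"share by recordings: {by_recordings:.3f}%")
print(f"share by duration:   {by_duration:.3f}%")
```

Both ratios round to 0.03%, so the stated share holds whether it is measured in files or in hours.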
### Geographic Coverage
**Europe:**
- France (Metropolitan French) - ✅ Samples available
- Switzerland, Belgium, Andorra, Monaco - OTS available
**Africa (Francophone):**
- West Africa: Senegal, Benin, Burkina Faso, Togo, Mali, Niger, Guinea, Côte d'Ivoire
- Central Africa: Cameroon, Congo, Gabon, Central African Republic
- North Africa: Algeria, Tunisia, Morocco
- East Africa: Madagascar, Comoros, Rwanda, Burundi
**Americas:**
- Canada (Quebec, New Brunswick) - ✅ Samples available
- Haiti, French Guiana, Caribbean territories
**Silencio's Complete OTS Catalog:**
- 📊 **156,000+ French recordings** from 30+ countries
- ⏱️ **1,600+ hours** of French speech data
- 🌍 **Every continent with French speakers represented**
- ✅ **Immediate commercial licensing** available
**Contact**: sofia@silencioai.com for full catalog and pricing.
## 📊 Dataset Statistics
| Metric | Value |
|--------|-------|
| **Total Audio Files** | 50 |
| **French Variants** | 2 |
| **Total Speakers** | 40+ unique speakers |
| **Audio Format** | WAV (16-bit PCM) |
| **Sample Rate** | 16 kHz / 44.1 kHz |
| **Total Duration** | ~32 minutes |
| **Geographic Coverage** | France, Canada |
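Because the table lists two sample rates, it may be worth inspecting each file before feeding it to a model that expects a single rate. The sketch below uses only Python's standard `wave` module, which reads PCM WAV headers. The helper names and the 16 kHz default target are illustrative assumptions, not part of the dataset.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format fields from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def needs_resampling(path: str, target_hz: int = 16_000) -> bool:
    """True when the file's sample rate differs from the (assumed) target rate."""
    return describe_wav(path)["sample_rate_hz"] != target_hz
```

Running `describe_wav` over every file in a split gives a quick count of how many recordings use each rate.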
### Variant Distribution
| Variant | Files | Region | Notes |
|---------|-------|--------|-------|
| Metropolitan French | 25 | France | Standard French, European accent |
| Canadian French | 25 | Canada | Quebec/Canadian accent, distinct from European |
| **Total** | **50** | **2 countries** | **Native speakers** |
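The figures in these tables can be recomputed from the metadata columns. This is a minimal sketch, assuming that `duration` is stored in seconds (the card does not state the unit) and that each row is available as a plain dict keyed by the column names in the card header. The `summarize` helper is hypothetical.

```python
from collections import defaultdict


def summarize(rows, group_key="dialect"):
    """Count files, unique speakers and total minutes for each value of group_key."""
    stats = defaultdict(lambda: {"files": 0, "speakers": set(), "seconds": 0.0})
    for row in rows:
        entry = stats[row[group_key]]
        entry["files"] += 1
        entry["speakers"].add(row["speaker_id"])
        entry["seconds"] += float(row["duration"])  # assumed to be seconds
    return {
        group: {
            "files": entry["files"],
            "speakers": len(entry["speakers"]),
            "minutes": round(entry["seconds"] / 60, 2),
        }
        for group, entry in stats.items()
    }
```

Grouping by another column, for example `summarize(rows, group_key="gender")`, gives the same breakdown for any other demographic field.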
## 🎯 Use Cases
- **Accent-Robust ASR**: Train French speech recognition across regional variants
- **Dialect Identification**: Distinguish European vs Canadian French with this sample; African French requires recordings from the commercial catalog
- **TTS Development**: Multi-accent French text-to-speech
- **Linguistic Research**: Study phonetic variation across the Francophone world
- **Model Evaluation**: Test fairness across French-speaking demographics
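For the model-evaluation use case, one common approach is to compare word error rate (WER) across demographic groups. The sketch below is one possible way to do that, not a method prescribed by the dataset. It assumes the `transcript` column serves as the reference text and that `hypotheses` holds your own ASR outputs in the same order as the rows. Both function names are hypothetical.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1], len(ref)


def wer_by_group(rows, hypotheses, group_key="gender"):
    """Pool errors and reference words per group, then divide (corpus-level WER)."""
    errors, words = defaultdict(int), defaultdict(int)
    for row, hypothesis in zip(rows, hypotheses):
        row_errors, row_words = word_errors(row["transcript"], hypothesis)
        errors[row[group_key]] += row_errors
        words[row[group_key]] += row_words
    return {group: errors[group] / words[group] for group in words if words[group]}
```

With only 50 recordings, per-group WER estimates will be noisy, so large gaps between groups are better treated as a prompt for further testing than as a conclusion.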
## 📁 Dataset Structure
```
global-french-speech/
├── french_global/               # Metropolitan French (France)
│   └── free_speech/
│       ├── data/
│       │   └── audio_*.wav
│       └── metadata.csv
└── french_canada/               # Canadian French (Quebec)
    └── free_speech/
        ├── data/
        │   └── audio_*.wav
        └── metadata.csv
```
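With a local copy of the repository, the metadata can also be read without the `datasets` library. This is a minimal sketch that assumes `file_name` in `metadata.csv` is a path relative to the directory holding the CSV (for example `data/audio_1.wav`); the card does not confirm this. The `read_metadata` helper and the `audio_path` key are hypothetical.

```python
import csv
from pathlib import Path


def read_metadata(split_dir):
    """Yield one dict per row of metadata.csv, adding a resolved `audio_path`.

    Assumes `file_name` is relative to the directory containing metadata.csv.
    """
    split_dir = Path(split_dir)
    with open(split_dir / "metadata.csv", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            row["audio_path"] = split_dir / row["file_name"]
            yield row
```

For example, `list(read_metadata("global-french-speech/french_canada/free_speech"))` would return the rows for the Canadian French split. Note that `csv.DictReader` returns every value as a string, so numeric columns such as `duration` need explicit conversion.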
## 🚀 Getting Started
```python
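from datasets import get_dataset_config_names, load_dataset

REPO_ID = "SilencioNetwork/global-french-speech"

# The card defines two configs and no default, so a config name is generally
# required. Load every config into a dict keyed by config name.
dataset = {
    name: load_dataset(REPO_ID, name=name)
    for name in get_dataset_config_names(REPO_ID)
}

# Load a specific variant
french_france = load_dataset(REPO_ID, name="french_global")
canadian_french = load_dataset(REPO_ID, name="french_canada")

# Each config exposes a single split named "free_speech"
print(french_france["free_speech"])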
```
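Once a config is loaded, rows can be narrowed by the metadata columns listed in the card header. The sketch below is illustrative: `filter_rows` and `load_rows` are hypothetical helpers, `duration` is assumed to be in seconds, and the value strings stored in columns such as `gender` are not documented in this card, so inspect a few rows before filtering on them.

```python
def filter_rows(rows, max_seconds=None, **equals):
    """Keep rows whose fields equal the given values and whose duration fits."""
    kept = []
    for row in rows:
        if any(row.get(field) != value for field, value in equals.items()):
            continue
        if max_seconds is not None and float(row["duration"]) > max_seconds:
            continue
        kept.append(row)
    return kept


def load_rows(config_name, split="free_speech"):
    """Download one config and return its rows.

    Needs the `datasets` package and network access; imported lazily so the
    filtering helper above can be used on any iterable of dicts.
    """
    from datasets import load_dataset

    return load_dataset(
        "SilencioNetwork/global-french-speech", name=config_name, split=split
    )
```

A call such as `filter_rows(load_rows("french_canada"), max_seconds=30.0)` would then keep only recordings of up to 30 seconds from the Canadian French config.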
## ⚖️ License & Usage
**License**: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
✅ Research, academic, educational, non-commercial use
❌ Commercial products/services (requires licensing)
**Commercial licensing**: sofia@silencioai.com
## 🏢 About Silencio
**Silencio** provides scaled sourcing of real-world speech data. With **2M+ contributors across 180+ countries**, we specialize in global language coverage including comprehensive Francophone dialect diversity.
**Learn more**: [silencioai.com](https://silencioai.com)
## 📚 Citation
```bibtex
@dataset{silencio_global_french_2026,
  title={Global French Speech Dataset},
  author={Silencio Network},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/SilencioNetwork/global-french-speech}
}
```
## 🤝 Contact
**Email**: sofia@silencioai.com
**HuggingFace**: [Discussion Forum](https://huggingface.co/datasets/SilencioNetwork/global-french-speech/discussions)
---
**Built by [Silencio](https://silencioai.com) | Licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)**
**Tags**: #French #Francophone #GlobalFrench #CanadianFrench #AfricanFrench #ASR #TTS #VoiceAI
Provider: SilencioNetwork



