
SilencioNetwork/global-french-speech

Source: Hugging Face | Updated: 2026-04-07 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/SilencioNetwork/global-french-speech
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- automatic-speech-recognition
- audio-classification
- text-to-speech
language:
- fr
tags:
- french
- global-french
- francophone
- french-accents
- african-french
- canadian-french
- european-french
- multilingual
- speech-data
- asr
- tts
- crowdsourced
- real-world-audio
- native-speakers
pretty_name: "Global French Speech Dataset"
dataset_info:
  features:
  - name: file_name
    dtype: string
  - name: id
    dtype: int64
  - name: gender
    dtype: string
  - name: ethnicity
    dtype: string
  - name: occupation
    dtype: string
  - name: birth_place
    dtype: string
  - name: mother_tongue
    dtype: string
  - name: dialect
    dtype: string
  - name: year_of_birth
    dtype: int64
  - name: years_at_birth_place
    dtype: int64
  - name: languages_data
    dtype: string
  - name: os
    dtype: string
  - name: device
    dtype: string
  - name: browser
    dtype: string
  - name: duration
    dtype: float64
  - name: emotions
    dtype: string
  - name: language
    dtype: string
  - name: location
    dtype: string
  - name: noise_sources
    dtype: string
  - name: script_id
    dtype: int64
  - name: type_of_script
    dtype: string
  - name: script
    dtype: string
  - name: transcript
    dtype: string
  - name: speaker_id
    dtype: string
configs:
- config_name: french_canada
  data_files:
  - split: free_speech
    path: french_canada/free_speech/**
- config_name: french_global
  data_files:
  - split: free_speech
    path: french_global/free_speech/**
size_categories:
- n<1K
---

# 🌍 Global French Speech Dataset

<div align="center">

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/datasets/SilencioNetwork/global-french-speech)
[![Countries](https://img.shields.io/badge/Countries-30+-blue)](#geographic-coverage)

</div>

## 🎯 Overview

The **Global French Speech Dataset** provides high-quality, real-world speech recordings from native French speakers. This sample includes **50 audio files** from France and Canada, covering **2 major French variants**. Silencio's off-the-shelf (OTS) inventory extends to **30+ Francophone countries and regions**.

French is spoken by **300+ million people globally** across Europe, Africa, the Americas, and Oceania.

### Key Features

- ✅ **2 major French variants with samples** - Metropolitan French (France) and Canadian French
- ✅ **50 audio recordings** - Native French speakers
- ✅ **30+ Francophone regions available OTS** - Africa, Europe, Americas
- ✅ **Rich demographic metadata** - Gender, age, occupation, location, dialect
- ✅ **Real-world acoustic conditions** - Natural environments
- ✅ **156,000+ OTS recordings** - 1,600+ hours available commercially

### 🗂️ This is a Sample Dataset

**These 50 recordings represent a sample of Silencio's capabilities.** Full off-the-shelf inventory available:

| Country/Region | OTS Recordings | OTS Hours | In This Sample? |
|----------------|----------------|-----------|-----------------|
| **France** | 33,309 | 517 hours | ✅ 25 files |
| **Senegal** | 11,272 | 356 hours | ❌ Contact |
| **Benin** | 16,385 | 276 hours | ❌ Contact |
| **Switzerland** | 1,732 | 138 hours | ❌ Contact |
| **Burkina Faso** | 6,490 | 100 hours | ❌ Contact |
| **Tunisia** | 1,503 | 86 hours | ❌ Contact |
| **Algeria** | 3,761 | 80 hours | ❌ Contact |
| **Andorra** | 892 | 68 hours | ❌ Contact |
| **Madagascar** | 4,690 | 66 hours | ❌ Contact |
| **Cameroon** | 6,206 | 66 hours | ❌ Contact |
| **Togo** | 8,723 | 62 hours | ❌ Contact |
| **Morocco** | 4,358 | 59 hours | ❌ Contact |
| **Nigeria** | 26,648 | 34 hours | ❌ Contact |
| **Canada** | 1,715 | 16 hours | ✅ 25 files |
| **20+ more regions** | 30,000+ | 200+ hours | ❌ Contact |
| **TOTAL** | **156,000+** | **1,600+ hours** | **50 files (~32 min)** |

**Sample = 0.03% of available inventory** *(updated: March 30, 2026)*

### Geographic Coverage

**Europe:**
- France (Metropolitan French) - ✅ Samples available
- Switzerland, Belgium, Andorra, Monaco - OTS available

**Africa (Francophone):**
- West Africa: Senegal, Benin, Burkina Faso, Togo, Mali, Niger, Guinea, Côte d'Ivoire
- Central Africa: Cameroon, Congo, Gabon, Central African Republic
- North Africa: Algeria, Tunisia, Morocco
- East Africa: Madagascar, Comoros, Rwanda, Burundi

**Americas:**
- Canada (Quebec, New Brunswick) - ✅ Samples available
- Haiti, French Guiana, Caribbean territories

**Silencio's Complete OTS Catalog:**
- 📊 **156,000+ French recordings** from 30+ countries
- ⏱️ **1,600+ hours** of French speech data
- 🌍 **Every continent with French speakers represented**
- ✅ **Immediate commercial licensing** available

**Contact**: sofia@silencioai.com for full catalog and pricing.

## 📊 Dataset Statistics

| Metric | Value |
|--------|-------|
| **Total Audio Files** | 50 |
| **French Variants** | 2 |
| **Total Speakers** | 40+ unique speakers |
| **Audio Format** | WAV (16-bit PCM) |
| **Sample Rate** | 16 kHz / 44.1 kHz |
| **Total Duration** | ~32 minutes |
| **Geographic Coverage** | France, Canada |

### Variant Distribution

| Variant | Files | Region | Notes |
|---------|-------|--------|-------|
| Metropolitan French | 25 | France | Standard French, European accent |
| Canadian French | 25 | Canada | Quebec/Canadian accent, distinct from European |
| **Total** | **50** | **2 countries** | **Native speakers** |

## 🎯 Use Cases

- **Accent-Robust ASR**: Train French speech recognition across regional variants
- **Dialect Identification**: Distinguish European vs Canadian vs African French
- **TTS Development**: Multi-accent French text-to-speech
- **Linguistic Research**: Study phonetic variation in the Francophone world
- **Model Evaluation**: Test fairness across French-speaking demographics

## 📁 Dataset Structure

```
global-french-speech/
├── french_global/           # Metropolitan French (France)
│   └── free_speech/
│       ├── data/
│       │   └── audio_*.wav
│       └── metadata.csv
└── french_canada/           # Canadian French (Quebec)
    └── free_speech/
        ├── data/
        │   └── audio_*.wav
        └── metadata.csv
```

## 🚀 Getting Started

```python
from datasets import load_dataset

# The card declares two named configs and no default, so calling
# load_dataset without `name` is expected to fail; load each variant explicitly.

# Metropolitan French (France)
french_france = load_dataset(
    "SilencioNetwork/global-french-speech",
    name="french_global",
)

# Canadian French
canadian_french = load_dataset(
    "SilencioNetwork/global-french-speech",
    name="french_canada",
)

# Each config exposes a single split named "free_speech" (see `configs` above)
canada_split = canadian_french["free_speech"]
```

## ⚖️ License & Usage

**License**: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)

- ✅ Research, academic, educational, non-commercial use
- ❌ Commercial products/services (requires licensing)

**Commercial licensing**: sofia@silencioai.com

## 🏢 About Silencio

**Silencio** provides scaled sourcing of real-world speech data. With **2M+ contributors across 180+ countries**, we specialize in global language coverage, including comprehensive Francophone dialect diversity.

**Learn more**: [silencioai.com](https://silencioai.com)

## 📚 Citation

```bibtex
@dataset{silencio_global_french_2026,
  title={Global French Speech Dataset},
  author={Silencio Network},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/SilencioNetwork/global-french-speech}
}
```

## 🤝 Contact

**Email**: sofia@silencioai.com
**HuggingFace**: [Discussion Forum](https://huggingface.co/datasets/SilencioNetwork/global-french-speech/discussions)

---

**Built by [Silencio](https://silencioai.com) | Licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)**

**Tags**: #French #Francophone #GlobalFrench #CanadianFrench #AfricanFrench #ASR #TTS #VoiceAI
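
As a follow-up to the Dataset Structure section, the per-variant figures in the statistics tables could be checked against a `metadata.csv` file without touching the audio. The sketch below is a minimal illustration, not Silencio's tooling. It assumes the CSV has a header row whose column names match the features listed in the card (`dialect`, `duration`, `speaker_id`) and that `duration` is in seconds; neither is confirmed by the card. The `summarize_metadata` helper and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize_metadata(csv_text):
    """Return file count, total minutes, and unique speakers per dialect.

    Assumes a header row with `dialect`, `duration` (seconds), and
    `speaker_id` columns, as suggested by the card's feature list.
    """
    summary = defaultdict(lambda: {"files": 0, "seconds": 0.0, "speakers": set()})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = summary[row["dialect"]]
        entry["files"] += 1
        entry["seconds"] += float(row["duration"])
        entry["speakers"].add(row["speaker_id"])
    return {
        dialect: {
            "files": entry["files"],
            "minutes": round(entry["seconds"] / 60, 2),
            "speakers": len(entry["speakers"]),
        }
        for dialect, entry in summary.items()
    }


# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """file_name,speaker_id,dialect,duration
data/audio_1.wav,spk_a,Canadian French,30.0
data/audio_2.wav,spk_a,Canadian French,45.0
data/audio_3.wav,spk_b,Metropolitan French,60.0
"""

if __name__ == "__main__":
    for dialect, stats in summarize_metadata(SAMPLE_CSV).items():
        print(dialect, stats)
```

To run this on real data, read the text of `french_canada/free_speech/metadata.csv` or `french_global/free_speech/metadata.csv` and pass it to the function in place of `SAMPLE_CSV`. If the actual column names or duration unit differ, adjust the keys accordingly.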
Provider:
SilencioNetwork