syvai/danish-asr-unified
Source: Hugging Face, updated 2026-04-08, indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/syvai/danish-asr-unified
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: audio
    dtype: audio
  - name: source
    dtype: string
  splits:
  - name: train
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*.parquet
license: cc-by-4.0
language:
- da
task_categories:
- automatic-speech-recognition
pretty_name: Danish ASR Unified Dataset
---
# Danish ASR Unified Dataset
Unified Danish speech recognition dataset combining 7 sources (~3.5M samples, ~16k hours):
| Source | Samples | Description |
|---|---|---|
| VoxPopuli | 1,775,578 | European Parliament recordings |
| ftspeech | 995,677 | Danish Parliament (Folketinget) |
| CoRal-v3 read_aloud | 299,255 | Read-aloud Danish speech |
| nst-da | 182,605 | NST Danish speech |
| CoRal-v3 conversation | 147,249 | Conversational Danish speech |
| nota | 98,600 | Danish broadcast media |
| Common Voice 17 | 3,484 | Crowd-sourced Danish speech |
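
As a quick sanity check on the table above, the per-source counts can be summed and converted to shares. This is only an illustrative sketch: the dictionary copies the numbers from the table, and the mean clip length relies on the approximate "~16k hours" total, so it should be read as a rough estimate.

```python
# Sample counts copied from the table above.
SAMPLES = {
    "VoxPopuli": 1_775_578,
    "ftspeech": 995_677,
    "CoRal-v3 read_aloud": 299_255,
    "nst-da": 182_605,
    "CoRal-v3 conversation": 147_249,
    "nota": 98_600,
    "Common Voice 17": 3_484,
}

APPROX_TOTAL_HOURS = 16_000  # "~16k hours" from the card; approximate

total = sum(SAMPLES.values())
shares = {name: count / total for name, count in SAMPLES.items()}

# Rough mean clip length in seconds, limited by the approximate hour total.
avg_seconds = APPROX_TOTAL_HOURS * 3600 / total

print(f"total samples: {total:,}")
for name, share in shares.items():
    print(f"{name:<24}{share:6.1%}")
print(f"rough mean clip length: {avg_seconds:.1f} s")
```

The counts sum to 3,502,448, consistent with the stated ~3.5M samples. The two parliamentary sources (VoxPopuli and ftspeech) together account for roughly 79% of the samples, which may matter when sampling for domain balance.
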
All audio is 16 kHz mono OGG Vorbis.
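
A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and that the repository ID matches the card title, `syvai/danish-asr-unified`. Streaming is used so the full corpus is not downloaded up front. The helpers `has_source` and `stream_subset` are hypothetical names introduced for illustration, and the exact strings stored in the `source` column are not documented in this card, so they may differ from the names in the table.

```python
from typing import Any, Iterable, Mapping

REPO_ID = "syvai/danish-asr-unified"  # repository name taken from the card title


def has_source(example: Mapping[str, Any], wanted: Iterable[str]) -> bool:
    """Return True if the example's `source` field is one of `wanted`."""
    return example["source"] in set(wanted)


def stream_subset(wanted: Iterable[str]):
    """Stream the train split, keeping only rows from the given sources.

    Requires `pip install datasets` and network access. The labels passed
    in `wanted` are assumptions until checked against real rows.
    """
    from datasets import load_dataset  # third-party; imported lazily

    wanted = set(wanted)
    ds = load_dataset(REPO_ID, split="train", streaming=True)
    return ds.filter(lambda ex: has_source(ex, wanted))
```

Calling `stream_subset(["nst-da"])` and iterating over the result would yield rows with the `text`, `audio`, and `source` fields, provided `"nst-da"` is in fact the label used in the data. Inspecting a few unfiltered rows first is a reasonable way to confirm the actual `source` values.
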
Provider:
syvai



