i4ds/spc_r_segmented

Name: i4ds/spc_r_segmented
Creator: i4ds
Published: 2026-02-25 15:35:45
License: cc-by-4.0

Source: Hugging Face. Updated: 2026-02-25. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/i4ds/spc_r_segmented


Resource description:

---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  - name: text
    dtype: string
language:
- de
task_categories:
- automatic-speech-recognition
source_datasets:
- i4ds/spc_r
license: cc-by-4.0
---

# i4ds/spc_r_segmented

Diarized and segmented speech dataset derived from [i4ds/spc_r](https://huggingface.co/datasets/i4ds/spc_r).

## Description

Each row is a merged speech segment belonging to a single speaker. The source audio and SRT subtitles from `i4ds/spc_r` were processed with the following pipeline:

1. **Diarization** -- [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) assigned speaker labels to each SRT segment based on temporal overlap.
2. **Merging** -- Consecutive SRT segments from the same speaker were merged when the silence gap between them was below a threshold (default 1.0s) and the resulting duration stayed within bounds (default 10--20s).
3. **Slicing** -- The merged time ranges were used to slice the original audio waveform. Each segment is encoded as FLAC.

## Columns

| Column     | Type    | Description                                  |
|------------|---------|----------------------------------------------|
| `id`       | string  | Unique identifier (`row{NNNNN}_seg{NNN}`)    |
| `duration` | float64 | Segment duration in seconds                  |
| `audio`    | audio   | FLAC audio for the segment                   |
| `text`     | string  | Merged transcript text from the SRT segments |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("i4ds/spc_r_segmented")
print(ds["train"][0])
```
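
The diarization and merging steps described in the pipeline can be illustrated with a minimal sketch. This is not the authors' code: the `Segment` class, the function names `assign_speaker` and `merge_segments`, the greedy merge strategy, and the sample speaker labels and transcripts are all assumptions made for illustration. The sketch enforces only the 1.0s gap threshold and the 20s upper duration bound; the card does not say how the 10s lower bound is applied, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds from the start of the source audio
    end: float     # seconds from the start of the source audio
    speaker: str   # diarization label (hypothetical format)
    text: str      # subtitle text


def assign_speaker(start, end, turns):
    """Return the speaker whose turns overlap [start, end] the most.

    `turns` is a list of (turn_start, turn_end, speaker) tuples.
    Returns None when no turn overlaps the span.
    """
    overlap = {}
    for t_start, t_end, speaker in turns:
        shared = min(end, t_end) - max(start, t_start)
        if shared > 0:
            overlap[speaker] = overlap.get(speaker, 0.0) + shared
    return max(overlap, key=overlap.get) if overlap else None


def merge_segments(segments, max_gap=1.0, max_duration=20.0):
    """Greedily merge consecutive same-speaker segments.

    Two neighbours are merged when the silence between them is shorter
    than `max_gap` and the merged span does not exceed `max_duration`.
    """
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.speaker == seg.speaker
            and seg.start - prev.end < max_gap
            and seg.end - prev.start <= max_duration
        ):
            merged[-1] = Segment(
                prev.start, seg.end, prev.speaker, f"{prev.text} {seg.text}"
            )
        else:
            merged.append(seg)
    return merged


# Illustrative input only: made-up diarization turns and SRT entries.
turns = [(0.0, 9.0, "SPEAKER_00"), (9.0, 15.0, "SPEAKER_01")]
srt = [
    (0.0, 4.0, "Guten Tag."),
    (4.5, 8.5, "Wie geht es?"),
    (9.2, 14.0, "Danke, gut."),
]
segments = [Segment(s, e, assign_speaker(s, e, turns), t) for s, e, t in srt]
merged = merge_segments(segments)
for seg in merged:
    print(seg)
```

In this example the first two subtitle entries share a speaker and are 0.5s apart, so they collapse into one segment, while the third belongs to a different speaker and stays separate. The resulting `(start, end)` ranges are what a slicing step would then cut out of the waveform.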

Provided by:

i4ds
