atc-voxtral-with-confidence-filter

Source: Hugging Face. Updated: 2026-03-20. Indexed: 2026-03-21.
Download link:
https://huggingface.co/datasets/Trelis/atc-voxtral-with-confidence-filter
Overview:
atc-voxtral-with-confidence-filter is a speech dataset prepared by Trelis Studio. It contains 13 training samples with a total duration of 3.5 minutes. Its main fields are: 16 kHz audio clips, plain-text transcriptions, transcriptions with Whisper timestamp tokens, the plain text of the previous segment (for conditioning on prior context), the start and end times of each clip in the original audio, speech duration (excluding silence), word-level timestamps (in JSON format), the original audio filename, and the ISO 639-1 language code used for the Whisper language token.

The dataset supports two training methods: the default 2-bucket method and a 4-bucket method that adds previous-text conditioning. In the 2-bucket method, 50% of samples use the plain-text transcription and 50% use the timestamped transcription. The 4-bucket method splits these further into four categories: plain text, plain text with previous-text conditioning, timestamped text, and timestamped text with previous-text conditioning. The dataset is intended for speech recognition and transcription tasks, with a particular focus on timestamp tokens and continuity across segments.
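The two training methods can be illustrated with a minimal sketch of how a training loop might pick a target per sample. The field names `text`, `text_ts`, and `prev_text`, the helper names `pick_bucket` and `build_example`, and the equal 25% shares in the 4-bucket case are all assumptions made for illustration; the listing describes the fields but does not name them or give the 4-bucket proportions.

```python
import random

# Hypothetical field names: the listing describes these fields but does not name them.
PLAIN = "text"           # plain-text transcription
TIMESTAMPED = "text_ts"  # transcription with Whisper timestamp tokens
PREVIOUS = "prev_text"   # plain text of the previous segment


def pick_bucket(rng, use_conditioning=False):
    """Return (target_field, use_previous_text) for one training example."""
    if not use_conditioning:
        # 2-bucket method: 50% plain text, 50% timestamped text.
        field = PLAIN if rng.random() < 0.5 else TIMESTAMPED
        return field, False
    # 4-bucket method: equal shares are an assumption, not stated in the source.
    buckets = [
        (PLAIN, False),
        (PLAIN, True),
        (TIMESTAMPED, False),
        (TIMESTAMPED, True),
    ]
    return rng.choice(buckets)


def build_example(sample, rng, use_conditioning=False):
    """Build a prompt/target pair from one dataset row (a plain dict here)."""
    field, with_prev = pick_bucket(rng, use_conditioning)
    # The previous segment's text is used as the prompt only in conditioned buckets.
    prompt = sample.get(PREVIOUS, "") if with_prev else ""
    return {"prompt": prompt, "target": sample[field]}
```

Calling `build_example(row, random.Random(0), use_conditioning=True)` on each row would spread samples across the four categories described above; with `use_conditioning=False` the prompt is always empty and only the target format varies.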
Provider:
Trelis
Created:
2026-03-20