scotus-sim/scotus-clarence_thomas-audio

Name: scotus-sim/scotus-clarence_thomas-audio
Creator: scotus-sim
Published: 2026-04-19 00:41:21
License: No description available

Hugging Face · updated 2026-04-19 · indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/scotus-sim/scotus-clarence_thomas-audio

Resource summary:

---
license: cc-by-nc-4.0
tags:
- audio
- oyez
- supreme-court
- speech
size_categories:
- 1K<n<10K
---

# SCOTUS-sim audio: clarence_thomas

Per-utterance audio clips from Oyez oral-argument mp3s, sliced at the `start_time` / `stop_time` timestamps stored in the companion `scotus-sim/scotus-clarence_thomas-training` dataset.

## Alignment

`clip_NNNNN.wav` in the tarball corresponds **exactly** to `audio_segments.jsonl[NNNNN]` in the training companion dataset. In `metadata.jsonl` each row carries the same 0-padded index in `idx`. This supersedes the v1 tarball, which had systematic audio↔segment index misalignment (Apr 2026).

## Stats

- clips: **2558**
- total duration: **5.58 hours**
- unique cases: **376**

## Files

- `clarence_thomas.tar.gz`: all clips; members named `clarence_thomas_NNNNN.wav`, 24 kHz mono PCM 16-bit.
- `metadata.jsonl`: one row per clip with `text`, `speaker`, `duration`, `case_id`, `audio_url`, `idx`.

## License

Audio is derived from Oyez.org (CC-BY-NC 4.0). Derivative TTS training artifacts inherit the NC term.
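A minimal Python sketch of the alignment described above, assuming `clarence_thomas.tar.gz` and `metadata.jsonl` have been downloaded locally. It pairs each WAV member with the `metadata.jsonl` row whose `idx` matches the member's numeric suffix, checks the 24 kHz / mono / 16-bit format stated under Files, and can read the matching line of the companion `audio_segments.jsonl`. The helper names (`load_metadata`, `iter_aligned_clips`, `read_segment`) are hypothetical. Because the card names members both `clip_NNNNN.wav` and `clarence_thomas_NNNNN.wav`, the sketch accepts any name ending in `_NNNNN.wav`; it also assumes `idx` parses as an integer and that `audio_segments.jsonl[NNNNN]` means the 0-based NNNNN-th line.

```python
import io
import json
import re
import tarfile
import wave

# The card names members both clip_NNNNN.wav and clarence_thomas_NNNNN.wav,
# so accept any name ending in _<digits>.wav (assumption).
CLIP_RE = re.compile(r"_(\d+)\.wav$")

# Format stated on the card: 24 kHz, mono, 16-bit PCM (2 bytes per sample).
EXPECTED_FORMAT = (24_000, 1, 2)  # (frame rate, channels, sample width)


def load_metadata(path):
    """Map each row's zero-padded 'idx' (parsed as int) to the full row."""
    rows = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                row = json.loads(line)
                rows[int(row["idx"])] = row
    return rows


def iter_aligned_clips(tar_path, metadata_path):
    """Yield (idx, metadata_row, duration_s) for each WAV member of the tarball."""
    meta = load_metadata(metadata_path)
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar:
            match = CLIP_RE.search(member.name)
            if not member.isfile() or match is None:
                continue  # skip directories and non-clip files
            idx = int(match.group(1))
            if idx not in meta:
                raise KeyError(f"no metadata row for {member.name}")
            data = tar.extractfile(member).read()
            with wave.open(io.BytesIO(data), "rb") as w:
                fmt = (w.getframerate(), w.getnchannels(), w.getsampwidth())
                if fmt != EXPECTED_FORMAT:
                    raise ValueError(f"{member.name}: unexpected format {fmt}")
                duration = w.getnframes() / w.getframerate()
            yield idx, meta[idx], duration


def read_segment(segments_path, idx):
    """Return line idx (0-based, assumed) of the companion audio_segments.jsonl."""
    with open(segments_path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i == idx:
                return json.loads(line)
    raise IndexError(f"audio_segments.jsonl has no line {idx}")
```

For example, `sum(d for _, _, d in iter_aligned_clips("clarence_thomas.tar.gz", "metadata.jsonl"))` should come out near 5.58 × 3600 ≈ 20,088 seconds if the stats above hold. Clips are read straight from the gzip stream in one pass, so nothing is extracted to disk; `read_segment` rescans the file on every call, which suits spot checks, while bulk use would load the lines into a list once.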

Provided by:

scotus-sim
