khursanirevo/dialogue-episodes
Source: Hugging Face. Updated 2026-04-03. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/khursanirevo/dialogue-episodes
Resource description:
---
license: cc-by-nc-4.0
language:
- en
- ms
- zh
- ru
- id
- ar
- ja
- ko
multilinguality:
- multilingual
task_categories:
- automatic-speech-recognition
- translation
---
# Multi-Language Dialogue Episodes Dataset
## Dataset Description
This dataset contains dialogue episodes with multi-language transcripts and separated speaker audio.
### Features
- **24,076 dialogue segments** from 138 videos
- **Multi-language transcripts** in 9 languages
- **Separated speakers** - 2-channel audio
- **Multi-channel audio** - stereo WAV files (24 kHz)
### Usage
```python
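from datasets import load_dataset
from IPython.display import Audio

# Download the dataset from the Hugging Face Hub and take the first training example
ds = load_dataset("khursanirevo/dialogue-episodes")
sample = ds["train"][0]

# Play the decoded audio (in a Jupyter/IPython notebook)
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])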
```
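Since the card describes the audio as 2-channel with separated speakers, a common next step is to pull each channel out as its own mono signal. The snippet below is a minimal sketch of that, not the dataset's own tooling. It assumes each channel carries one speaker and uses a synthetic array in place of a decoded sample. `split_speakers` and `fake_stereo` are hypothetical names introduced for illustration. Depending on how the audio column is decoded, the array may already be downmixed to mono, in which case this split does not apply.

```python
import numpy as np

SAMPLING_RATE = 24_000  # 24 kHz, as stated in the dataset card


def split_speakers(stereo: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-channel array into one mono array per channel.

    Accepts either (channels, samples) or (samples, channels) layout.
    Assumption: each channel holds a single speaker.
    """
    if stereo.ndim != 2 or 2 not in stereo.shape:
        raise ValueError(f"expected a 2-channel array, got shape {stereo.shape}")
    # Put the channel axis first so both layouts are handled the same way.
    if stereo.shape[0] != 2:
        stereo = stereo.T
    return stereo[0], stereo[1]


# Synthetic stand-in for a decoded sample: 1 second of stereo audio.
t = np.arange(SAMPLING_RATE) / SAMPLING_RATE
fake_stereo = np.stack([np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 440 * t)])

speaker_a, speaker_b = split_speakers(fake_stereo)
duration_s = speaker_a.shape[0] / SAMPLING_RATE
print(speaker_a.shape, speaker_b.shape, duration_s)
```

With a real example, the same function could be applied to `sample["audio"]["array"]`, provided that array actually has two channels.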
## License
CC-BY-NC-4.0
Provided by:
khursanirevo



