Itbanque/ScreenTalk-XS
Source: Hugging Face; updated 2025-04-01; indexed 2025-11-01
Download link:
https://hf-mirror.com/datasets/Itbanque/ScreenTalk-XS
Resource description:
---
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
language:
- en
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: duration
    dtype: float64
  - name: sentence
    dtype: string
  - name: uid
    dtype: string
  - name: group_id
    dtype: string
  splits:
  - name: train
    num_bytes: 2266558522.0
    num_examples: 8000
  - name: valid
    num_bytes: 260170178.0
    num_examples: 1000
  - name: test
    num_bytes: 283817142.0
    num_examples: 1000
  download_size: 2784217234
  dataset_size: 2810545842.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: valid
    path: data/valid-*
  - split: test
    path: data/test-*
---
# 🎬 ScreenTalk-XS: Sample Speech Dataset from Screen Content 🖥️



### 📢 **What is ScreenTalk-XS?**
**ScreenTalk-XS** is a **high-quality transcribed speech dataset** containing **10k speech samples** from diverse screen content.
It is designed for **automatic speech recognition (ASR), natural language processing (NLP), and conversational AI research**.

✅ **This dataset is freely available for research and educational use.**

🔹 If you need a **larger dataset with more diverse speech samples**, check out:
👉 [ScreenTalk (Full Dataset)](https://huggingface.co/datasets/DataLabX/ScreenTalk)

---
## 📜 **Dataset Details**
| Feature | Description |
|---------|-------------|
| **Total Clips** | 10k transcribed speech samples |
| **Languages** | English |
| **Format** | `.wav` (audio) + `.csv` (transcriptions) |
| **Use Case** | ASR, Speech-to-Text, NLP, Conversational AI |

🆓 **This dataset is free to use for research purposes.**

* 📊 train set duration: 18.55 hours
* 📊 valid set duration: 2.12 hours
* 📊 test set duration: 2.32 hours
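
The split durations above can be combined with the example counts from the card metadata (8000 / 1000 / 1000) to estimate the total audio length and the mean clip length per split. The following is a minimal sketch; the `SPLITS` table and the `summarize` helper are illustrative names, and because the reported hours are rounded, the results are approximate (roughly 23 hours in total and about 8 seconds per clip).

```python
# Figures copied from the dataset card: examples per split and reported hours.
SPLITS = {
    "train": {"examples": 8000, "hours": 18.55},
    "valid": {"examples": 1000, "hours": 2.12},
    "test": {"examples": 1000, "hours": 2.32},
}


def summarize(splits):
    """Return total examples, total hours, and mean clip length (seconds) per split."""
    total_examples = sum(s["examples"] for s in splits.values())
    total_hours = sum(s["hours"] for s in splits.values())
    mean_seconds = {
        name: s["hours"] * 3600 / s["examples"] for name, s in splits.items()
    }
    return total_examples, total_hours, mean_seconds


total_examples, total_hours, mean_seconds = summarize(SPLITS)
print(f"{total_examples} clips, {total_hours:.2f} hours in total")
for name, seconds in mean_seconds.items():
    print(f"{name}: about {seconds:.1f} s per clip")
```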
---
## 📂 **Dataset Structure**
ScreenTalk-XS contains the following fields:
| Column | Description |
|--------|-------------|
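| `audio` | Audio clip (`audio` dtype) |
| `duration` | Duration of the audio chunk (`float64`) |
| `sentence` | Transcribed speech (`string`) |
| `uid` | Field listed in the dataset metadata (`string`); not described further |
| `group_id` | Field listed in the dataset metadata (`string`); not described further |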
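
One way to read these fields is through the Hugging Face `datasets` library. The sketch below assumes that library is installed and that the repository id `Itbanque/ScreenTalk-XS` and the split names `train`, `valid`, and `test` from the card metadata are current; `describe` and `preview` are hypothetical helper names, not part of the dataset. Calling `preview` needs network access, and iterating over records may also need an audio decoding backend.

```python
from itertools import islice

REPO_ID = "Itbanque/ScreenTalk-XS"      # repository id taken from the card
SPLITS = ("train", "valid", "test")     # split names taken from the card metadata


def describe(sample):
    """Format one record as 'duration | sentence' using the documented fields."""
    return f"{sample['duration']:.2f} | {sample['sentence']}"


def preview(split="valid", limit=3):
    """Stream the first `limit` records of a split and return their descriptions."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    # streaming=True avoids downloading the whole split up front.
    # Iterating decodes the `audio` column, which may need an audio backend.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return [describe(sample) for sample in islice(stream, limit)]
```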

📌 **Example Entry**
```json
{
  "audio": "path/to/audio.wav",
  "sentence": "I will find you and I will train my model."
}
```

Provider:
Itbanque



