Itbanque/ScreenTalk-XS

Hugging Face · Updated: 2025-04-01 · Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/Itbanque/ScreenTalk-XS
Description:
---
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
language:
- en
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: duration
    dtype: float64
  - name: sentence
    dtype: string
  - name: uid
    dtype: string
  - name: group_id
    dtype: string
  splits:
  - name: train
    num_bytes: 2266558522.0
    num_examples: 8000
  - name: valid
    num_bytes: 260170178.0
    num_examples: 1000
  - name: test
    num_bytes: 283817142.0
    num_examples: 1000
  download_size: 2784217234
  dataset_size: 2810545842.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: valid
    path: data/valid-*
  - split: test
    path: data/test-*
---

# 🎬 ScreenTalk-XS: Sample Speech Dataset from Screen Content 🖥️

![Hugging Face](https://img.shields.io/badge/HuggingFace-Dataset-blue)
![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-orange)
![Dataset Type: XS (Limited Version)](https://img.shields.io/badge/Access-Limited-green)

### 📢 **What is ScreenTalk-XS?**

**ScreenTalk-XS** is a **high-quality transcribed speech dataset** containing **10k speech samples** from diverse screen content. It is designed for **automatic speech recognition (ASR), natural language processing (NLP), and conversational AI research**.
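As a quick sanity check on the `dataset_info` metadata, the per-split figures can be cross-checked in plain Python: the split sizes should add up to the 10k samples, and the per-split byte counts should add up to the reported `dataset_size`. The numbers below are copied from the card's header; the helper name `split_summary` is hypothetical and not part of any library.

```python
# Split statistics copied from the `dataset_info` block of the dataset card.
SPLITS = {
    "train": {"num_bytes": 2_266_558_522, "num_examples": 8000},
    "valid": {"num_bytes": 260_170_178, "num_examples": 1000},
    "test": {"num_bytes": 283_817_142, "num_examples": 1000},
}
DATASET_SIZE = 2_810_545_842  # reported `dataset_size`


def split_summary(splits):
    """Hypothetical helper: total examples, total bytes, and each split's share of the examples."""
    total_examples = sum(s["num_examples"] for s in splits.values())
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    shares = {name: s["num_examples"] / total_examples for name, s in splits.items()}
    return total_examples, total_bytes, shares


total_examples, total_bytes, shares = split_summary(SPLITS)
print(total_examples)               # 10000
print(total_bytes == DATASET_SIZE)  # True
print(shares)                       # {'train': 0.8, 'valid': 0.1, 'test': 0.1}
```

The check confirms an 80/10/10 train/valid/test split and that the byte counts in the header are internally consistent.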
✅ **This dataset is freely available for research and educational use.**

🔹 If you need a **larger dataset with more diverse speech samples**, check out:
👉 [ScreenTalk (Full Dataset)](https://huggingface.co/datasets/DataLabX/ScreenTalk)

---

## 📜 **Dataset Details**

| Feature | Description |
|---------|-------------|
| **Total Clips** | 10k transcribed speech samples |
| **Languages** | English |
| **Format** | `.wav` (audio) + `.csv` (transcriptions) |
| **Use Case** | ASR, Speech-to-Text, NLP, Conversational AI |

🆓 **This dataset is free to use for research purposes.**

* 📊 Train set duration: 18.55 hours
* 📊 Valid set duration: 2.12 hours
* 📊 Test set duration: 2.32 hours
* 📊 Total duration: 22.99 hours

---

## 📂 **Dataset Structure**

ScreenTalk-XS contains the following fields:

| Column | Description |
|--------|-------------|
| `audio` | Audio clip |
| `duration` | Duration of the audio clip |
| `sentence` | Transcribed speech |

📌 **Example Entry**

```json
{
  "audio": "path/to/audio.wav",
  "sentence": "I will find you and I will train my model."
}
```
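Combining the per-split durations with the split sizes gives a rough average clip length. The sketch below also shows how per-clip `duration` values would roll up into split totals. It assumes `duration` is measured in seconds (the card does not state the unit) and uses made-up records shaped like the example entry; `mean_clip_seconds`, `total_hours`, and the toy records are all hypothetical.

```python
# Reported split durations (hours) and example counts, taken from the dataset card.
REPORTED = {
    "train": (18.55, 8000),
    "valid": (2.12, 1000),
    "test": (2.32, 1000),
}


def mean_clip_seconds(hours, num_examples):
    """Average clip length in seconds, given a split's total hours and clip count."""
    return hours * 3600 / num_examples


def total_hours(records):
    """Sum the per-clip `duration` field (ASSUMED to be in seconds) and convert to hours."""
    return sum(r["duration"] for r in records) / 3600


for split, (hours, n) in REPORTED.items():
    print(f"{split}: {mean_clip_seconds(hours, n):.1f} s per clip")
# train: 8.3 s per clip
# valid: 7.6 s per clip
# test: 8.4 s per clip

# Hypothetical toy records shaped like the example entry, plus a `duration` field.
toy = [
    {"audio": f"path/to/clip_{i}.wav", "duration": 8.0, "sentence": "..."}
    for i in range(450)
]
print(total_hours(toy))  # 1.0
```

Under the seconds assumption, clips average roughly 8 seconds, which is a typical length for ASR fine-tuning segments.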
Provided by:
Itbanque