cagataydev/vlm-voice-commands

Name: cagataydev/vlm-voice-commands
Creator: cagataydev
Published: 2026-03-22 18:09:19
License: cc-by-4.0 (as declared in the dataset card metadata)

Hugging Face | Updated 2026-03-22 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/cagataydev/vlm-voice-commands

Resource description:

---
license: cc-by-4.0
task_categories:
- text-to-speech
- robotics
language:
- en
tags:
- voice-commands
- robotics
- VLM
- embodied-AI
- manipulation
- navigation
- pick-and-place
- human-robot-interaction
size_categories:
- 10K<n<100K
---

# 🎙️ VLM Robotics Voice Commands (Text)

**50,000 curated natural language commands for Vision-Language-Model robot control.**

This dataset contains diverse text commands that humans would speak to a robot, covering 10 categories of embodied interaction.

## 🎯 Purpose

- Training VLMs to understand spoken robot instructions
- TTS source for generating audio training data
- Benchmark for robot language understanding
- Coverage of the full spectrum of human→robot speech

## 📊 Statistics

| Metric | Value |
|--------|-------|
| **Total Commands** | 50,000 |
| **Categories** | 10 |
| **Avg Word Count** | 8.8 words |
| **Language** | English |
| **Unique** | 100% (deduplicated) |

### Category Distribution

| Category | Count | % | Examples |
|----------|------:|---:|----------|
| pick_place | 18,570 | 37.1% | "Pick up the red cube", "Bring me the bottle" |
| multistep | 8,464 | 16.9% | "Open the drawer, take the cup, close it" |
| manipulation | 5,557 | 11.1% | "Pour water into the glass", "Fold the towel" |
| navigation | 5,178 | 10.4% | "Go to the kitchen", "Turn left" |
| observation | 5,008 | 10.0% | "What do you see?", "Count the objects" |
| spatial | 3,506 | 7.0% | "Move arm left 5cm", "Lower the gripper" |
| household | 1,937 | 3.9% | "Clean the table", "Set table for dinner" |
| safety | 839 | 1.7% | "Stop!", "Be careful" |
| conversational | 519 | 1.0% | "Good job", "Try again" |
| context_rich | 422 | 0.8% | "Grab that", "The one I pointed at" |

## 🏗️ Schema

| Column | Type | Description |
|--------|------|-------------|
| `id` | string | Unique ID (vlm_000000 format) |
| | string | The voice command text |
| | string | Command category |
| | string | easy / medium / hard |
| | string | Suggested TTS voice (NATM0-3, NATF0-3) |
| | int | Number of words |

## 🗣️ Natural Variations

Commands include natural speech patterns:

- **Polite**: "Could you pick up the cup?"
- **Casual**: "Hey, grab that"
- **Urgent**: "Stop right now!"
- **Questioning**: "Can you reach that?"
- **Compound**: "Pick it up and bring it to me, okay?"

## 📝 Difficulty Levels

| Level | Description | Example |
|-------|-------------|---------|
| **Easy** | Single simple action, safety | "Stop", "Good job" |
| **Medium** | Standard pick/place/nav | "Pick up the red cube" |
| **Hard** | Multi-step, context-dependent | "Find the cup, show me, put it away" |

## 🔧 Usage

## 📦 Related

| Dataset | Description |
|---------|-------------|
| [cagataydev/vlm-voice-audio](https://huggingface.co/datasets/cagataydev/vlm-voice-audio) | Audio version with TTS |
| [cagataydev/q-omni-data-soup](https://huggingface.co/datasets/cagataydev/q-omni-data-soup) | Multi-modal training soup |

Built with [DevDuck](https://github.com/cagataycali/devduck) 🦆

---

*Generated on 2026-03-22*
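
The figures in the Category Distribution table can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the counts are copied by hand from the table, and the names `CATEGORY_COUNTS` and `category_percentages` are illustrative assumptions, not part of the dataset or of any library.

```python
# Counts copied from the Category Distribution table in the dataset card.
# The dictionary name and helper below are hypothetical, for illustration only.
CATEGORY_COUNTS = {
    "pick_place": 18570,
    "multistep": 8464,
    "manipulation": 5557,
    "navigation": 5178,
    "observation": 5008,
    "spatial": 3506,
    "household": 1937,
    "safety": 839,
    "conversational": 519,
    "context_rich": 422,
}


def category_percentages(counts):
    """Return each category's share of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_commands = sum(CATEGORY_COUNTS.values())
percentages = category_percentages(CATEGORY_COUNTS)

print(f"total commands: {total_commands}")
for name, pct in percentages.items():
    print(f"{name:15s} {CATEGORY_COUNTS[name]:6d}  {pct:4.1f}%")
```

The per-category counts sum to the stated 50,000 commands, and the recomputed shares match the percentage column. The distribution is heavily skewed toward `pick_place`, which may matter when sampling training batches.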

Provider:

cagataydev
