
bear7011/gemma-4-e4b-kinetics_40K

Source: Hugging Face, updated 2026-04-15, indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/bear7011/gemma-4-e4b-kinetics_40K
Description:
---
license: cc-by-4.0
task_categories:
- video-text-to-text
language:
- en
tags:
- video
- action-recognition
- video-description
- kinetics
- gemma
- fine-tuning
size_categories:
- 10K<n<100K
---

# Kinetics-40K Video Description Dataset

A 40,000-sample instruction-tuning dataset built from Kinetics-400/600/700 for training video description models (targeting Gemma 4 E4B).

## Dataset Summary

| Field | Value |
|---|---|
| Total samples | 40,000 |
| Source | Kinetics-400 (13,418) · Kinetics-600 (12,042) · Kinetics-700 (14,540) |
| Task | Video description (single natural sentence) |
| Language | English |
| Label style | Natural description (8–15 words) |

## Data Format

Each sample follows the multi-turn message format compatible with Gemma 4 / LLaVA-style fine-tuning:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a video description assistant. Watch the video and answer with one clear natural sentence in lowercase describing the main visible action."
    },
    {
      "role": "user",
      "content": [
        {"type": "video", "video": "kinetic40K/{youtube_id}_{start:06d}_{end:06d}"},
        {"type": "text", "text": "Describe the main action happening in this video in one sentence."}
      ]
    },
    {
      "role": "assistant",
      "content": [{"type": "text", "text": "a person is jumping over hurdles on a track."}]
    }
  ],
  "label": "hurdling",
  "task_type": "video_description",
  "label_style": "natural_description",
  "source_format": "converted_from_annotation_csv"
}
```

## Video Paths

Video paths use the format `kinetic40K/{youtube_id}_{start:06d}_{end:06d}` (no file extension).
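
As a minimal sketch of how the path format above can be built and parsed, the Python below zero-pads the start and end values to six digits and recovers them with a regular expression. The helper names `build_video_path` and `parse_video_path` and the example ID are hypothetical and not part of the dataset; the sketch also assumes `start` and `end` are non-negative integers.

```python
import re

# Path template as documented in the dataset card.
PATH_TEMPLATE = "kinetic40K/{youtube_id}_{start:06d}_{end:06d}"

# YouTube IDs may themselves contain "_" or "-", so the ID group is greedy
# and the two six-digit fields are anchored to the end of the string.
PATH_PATTERN = re.compile(
    r"^kinetic40K/(?P<youtube_id>.+)_(?P<start>\d{6})_(?P<end>\d{6})$"
)


def build_video_path(youtube_id: str, start: int, end: int) -> str:
    """Format a clip path (hypothetical helper, not shipped with the dataset)."""
    return PATH_TEMPLATE.format(youtube_id=youtube_id, start=start, end=end)


def parse_video_path(path: str) -> tuple[str, int, int]:
    """Split a clip path back into (youtube_id, start, end)."""
    match = PATH_PATTERN.match(path)
    if match is None:
        raise ValueError(f"unexpected video path: {path!r}")
    return match["youtube_id"], int(match["start"]), int(match["end"])


if __name__ == "__main__":
    # "abc_DEF-123" is a made-up ID used only for illustration.
    path = build_video_path("abc_DEF-123", 10, 20)
    print(path)                    # kinetic40K/abc_DEF-123_000010_000020
    print(parse_video_path(path))  # ('abc_DEF-123', 10, 20)
```

Anchoring the numeric fields to the end of the string keeps underscores inside the ID from being mistaken for field separators.
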
To reproduce locally, download clips with:

```bash
yt-dlp "https://www.youtube.com/watch?v={youtube_id}" \
  --download-sections "*{start}-{end}" \
  -o "kinetic40K/{youtube_id}_{start:06d}_{end:06d}.mp4"
```

Timestamps for all three sources are fully annotated:

- **K400**: timestamps recovered from the official Kinetics-400 annotation CSV
- **K600 / K700**: timestamps embedded in original filenames

## Label Generation

Class labels (e.g. `"hurdling"`) were expanded into natural English descriptions using **GPT-4o-mini**:

- Complete sentence, lowercase, 8–15 words
- Focus on the main visible action ("a person is …")
- 724 unique Kinetics classes processed

Post-processing applied:

- Consistent trailing period
- Unified subject (`a person is …`)

## Intended Use

Designed for instruction fine-tuning of **Gemma 4 E4B** (`google/gemma-4-e4b-it`) with video understanding capability. Compatible with GemmaFT and any LLaVA-style training framework that accepts the multi-turn message format.
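
The label conventions described under Label Generation can be spot-checked on a loaded sample. The sketch below is not the authors' post-processing code: the function names `extract_description` and `check_description` are hypothetical, and it assumes the stated rules (lowercase, 8–15 whitespace-separated words, trailing period, `a person is` subject) apply literally to every description.

```python
def extract_description(sample: dict) -> str:
    """Return the assistant text from a sample in the documented message format."""
    for message in sample["messages"]:
        if message["role"] == "assistant":
            # Assistant content is a list of typed parts; keep only the text parts.
            return "".join(
                part["text"]
                for part in message["content"]
                if part.get("type") == "text"
            )
    raise ValueError("sample has no assistant message")


def check_description(text: str) -> list[str]:
    """List the conventions a description violates (empty list means it passes)."""
    problems = []
    word_count = len(text.split())
    if not 8 <= word_count <= 15:
        problems.append(f"expected 8-15 words, got {word_count}")
    if text != text.lower():
        problems.append("not lowercase")
    if not text.endswith("."):
        problems.append("missing trailing period")
    if not text.startswith("a person is "):
        problems.append("does not start with 'a person is'")
    return problems


if __name__ == "__main__":
    # Trimmed copy of the example record from the dataset card.
    sample = {
        "messages": [
            {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": "a person is jumping over hurdles on a track."}
                ],
            }
        ],
        "label": "hurdling",
    }
    print(check_description(extract_description(sample)))  # []
```

Returning a list of problems instead of a boolean makes it easier to see which convention a given record breaks when scanning the full set.
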
Provider:
bear7011