bear7011/gemma-4-e4b-kinetics_40K
收藏Hugging Face2026-04-15 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/bear7011/gemma-4-e4b-kinetics_40K
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-4.0
task_categories:
- video-text-to-text
language:
- en
tags:
- video
- action-recognition
- video-description
- kinetics
- gemma
- fine-tuning
size_categories:
- 10K<n<100K
---
# Kinetics-40K Video Description Dataset
A 40,000-sample instruction-tuning dataset built from Kinetics-400/600/700 for training video description models (targeting Gemma 4 E4B).
## Dataset Summary
| Field | Value |
|---|---|
| Total samples | 40,000 |
| Source | Kinetics-400 (13,418) · Kinetics-600 (12,042) · Kinetics-700 (14,540) |
| Task | Video description (single natural sentence) |
| Language | English |
| Label style | Natural description (8–15 words) |
## Data Format
Each sample follows the multi-turn message format compatible with Gemma 4 / LLaVA-style fine-tuning:
```json
{
"messages": [
{
"role": "system",
"content": "You are a video description assistant. Watch the video and answer with one clear natural sentence in lowercase describing the main visible action."
},
{
"role": "user",
"content": [
{"type": "video", "video": "kinetic40K/{youtube_id}_{start:06d}_{end:06d}"},
{"type": "text", "text": "Describe the main action happening in this video in one sentence."}
]
},
{
"role": "assistant",
"content": [{"type": "text", "text": "a person is jumping over hurdles on a track."}]
}
],
"label": "hurdling",
"task_type": "video_description",
"label_style": "natural_description",
"source_format": "converted_from_annotation_csv"
}
```
## Video Paths
Video paths use the format `kinetic40K/{youtube_id}_{start:06d}_{end:06d}` (no file extension).
To reproduce locally, download clips with:
```bash
yt-dlp "https://www.youtube.com/watch?v={youtube_id}" \
--download-sections "*{start}-{end}" \
-o "kinetic40K/{youtube_id}_{start:06d}_{end:06d}.mp4"
```
Timestamps for all three sources are fully annotated:
- **K400**: timestamps recovered from the official Kinetics-400 annotation CSV
- **K600 / K700**: timestamps embedded in original filenames
## Label Generation
Class labels (e.g. `"hurdling"`) were expanded into natural English descriptions using **GPT-4o-mini**:
- Complete sentence, lowercase, 8–15 words
- Focus on the main visible action ("a person is …")
- 724 unique Kinetics classes processed
Post-processing applied:
- Consistent trailing period
- Unified subject (`a person is …`)
## Intended Use
Designed for instruction fine-tuning of **Gemma 4 E4B** (`google/gemma-4-e4b-it`) with video understanding capability. Compatible with GemmaFT and any LLaVA-style training framework that accepts the multi-turn message format.
提供机构:
bear7011



