ali-vosoughi/oscar-dataset
Source: Hugging Face. Updated 2026-04-06; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ali-vosoughi/oscar-dataset
Description:
---
pretty_name: OSCaR
language:
- en
license: other
task_categories:
- image-to-text
task_ids:
- image-captioning
size_categories:
- 10K<n<100K
---
# OSCaR
OSCaR is the public dataset release for the NAACL 2024 paper
_Object State Captioning and State Change Representation_.
This release packages the preserved OSCaR image assets, fine-tuning manifests,
benchmark split metadata, and state-caption sidecars used by the LLaVA-based
training and evaluation workflow published in the
[OSCaR GitHub repository](https://github.com/nguyennm1024/OSCaR).
## Release Summary
- Paper-reported scale: **14,084** annotated segments across EPIC-KITCHENS and Ego4D.
- Public raw asset tree in this release: **7,742** clip directories under `data/object-state-data`.
- Full preserved image-caption mapping: **30,308** rows across **7,577** clips.
- LLaVA fine-tuning manifest: **28,308** image-level conversations across **7,077** clips.
- Human-verified EPIC benchmark split: **2,000** rows / **500** clips (4 caption slots per clip).
- Sidecar annotations included: **7,586** state-change JSON files, **2,244** QA JSON files, **3,142** conversation JSON files.
- Open-world evaluation metadata included: **356** Ego4D records and **344** EPIC-KITCHENS records.
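The counts above can be cross-checked with simple arithmetic. The sketch below only uses numbers quoted in this summary. It shows that every table has four rows per clip and that the full mapping equals the fine-tuning manifest plus the benchmark split by count. That the manifest is literally the mapping minus the benchmark split is an inference from the matching counts, not something this card states.

```python
# Counts copied from the release summary above.
full_rows, full_clips = 30_308, 7_577      # full image-caption mapping
train_rows, train_clips = 28_308, 7_077    # LLaVA fine-tuning manifest
test_rows, test_clips = 2_000, 500         # human-verified EPIC benchmark split
captions_per_clip = 4                      # "4 caption slots" in the summary

# Each table has exactly four rows per clip.
for rows, clips in [
    (full_rows, full_clips),
    (train_rows, train_clips),
    (test_rows, test_clips),
]:
    assert rows == clips * captions_per_clip

# By count, full mapping = fine-tuning manifest + benchmark split.
assert full_rows == train_rows + test_rows
assert full_clips == train_clips + test_clips

# Clip directories in the asset tree that have no row in the full mapping.
released_clip_dirs = 7_742
unmapped_clip_dirs = released_clip_dirs - full_clips
print(unmapped_clip_dirs)  # 165
```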
## What Is Included
- `data/object-state-data/`: preserved OSCaR frame directories and `state_change.jpg` composites.
- `manifests/llava_data.json`: OSCaR fine-tuning manifest used for adapter training.
- `splits/data_mapping_final_EK_test.csv`: held-out human-verified EPIC benchmark split.
- `metadata/data_mapping_final.csv`: full preserved image-to-caption mapping.
- `metadata/video-object.csv`: narration-to-object/action table.
- `metadata/ego4d_data.csv`: preserved Ego4D action/object metadata.
- `annotations/state-change-json/`: state caption JSON sidecars.
- `annotations/question-answers-clean/`: optional QA sidecars.
- `annotations/conversation-clean/`: optional conversation sidecars.
- `eval/openworld.json` and `eval/openworld-epic.json`: open-world evaluation prompts/metadata.
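Because this card does not document the column names or JSON schema of these files, a reasonable first step is to inspect them without assuming any structure. The following is a minimal sketch using only the Python standard library; the helper names `describe_csv` and `describe_json` are hypothetical and not part of the OSCaR code.

```python
import csv
import json


def describe_csv(path):
    """Return (column names, data row count) without assuming a schema."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        n_rows = sum(1 for _ in reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return list(reader.fieldnames or []), n_rows


def describe_json(path):
    """Return (top-level type name, number of top-level items)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size
```

For example, `describe_csv("oscar-dataset/metadata/data_mapping_final.csv")` should report 30,308 data rows if the local copy matches the summary above, assuming the file has a single header row.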
## Directory Layout
```text
oscar-dataset/
  data/object-state-data/
  manifests/llava_data.json
  splits/data_mapping_final_EK_test.csv
  metadata/data_mapping_final.csv
  metadata/segment_index.csv
  metadata/release_summary.json
  annotations/state-change-json/
  annotations/question-answers-clean/
  annotations/conversation-clean/
  eval/openworld.json
  eval/openworld-epic.json
```
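After downloading, the layout above can be verified with a short check. This is a sketch, not part of the OSCaR tooling: the list is copied from the layout, and `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Entries copied from the directory layout above; directories end with "/".
EXPECTED_ENTRIES = [
    "data/object-state-data/",
    "manifests/llava_data.json",
    "splits/data_mapping_final_EK_test.csv",
    "metadata/data_mapping_final.csv",
    "metadata/segment_index.csv",
    "metadata/release_summary.json",
    "annotations/state-change-json/",
    "annotations/question-answers-clean/",
    "annotations/conversation-clean/",
    "eval/openworld.json",
    "eval/openworld-epic.json",
]


def missing_entries(dataset_root):
    """Return the expected entries that are absent or of the wrong kind."""
    root = Path(dataset_root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry
        # Directories must exist as directories, files as regular files.
        ok = path.is_dir() if entry.endswith("/") else path.is_file()
        if not ok:
            missing.append(entry)
    return missing
```

An empty list from `missing_entries("oscar-dataset")` means every listed path is present in the local copy.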
## Important Notes
- The paper reports 14,084 annotated segments, but the preserved public asset tree
in this release contains 7,742 clip directories. The released metadata states
both the paper-reported figure and the counts for the preserved archive
explicitly.
- `metadata/segment_index.csv` is the normalized release table generated from the
preserved asset tree, the full mapping CSV, the fine-tuning manifest, and the
benchmark split.
- Some open-world evaluation JSON records still reference original local EPIC or
Ego4D frame roots. Those records are included for provenance and regeneration,
not as a promise that every referenced raw frame path is redistributed here.
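To confirm the clip-directory count on a local copy, something like the following may help. It is a sketch under one labeled assumption: that each clip is a directory located directly under `data/object-state-data`. If clips are nested more deeply in the actual release, the count will differ. The function names are hypothetical.

```python
from pathlib import Path

RELEASED_CLIP_DIRS = 7_742  # clip-directory count stated in this card


def count_clip_dirs(dataset_root):
    """Count immediate subdirectories of data/object-state-data.

    Assumption: one clip per directory, directly under that folder.
    """
    asset_root = Path(dataset_root) / "data" / "object-state-data"
    return sum(1 for p in asset_root.iterdir() if p.is_dir())


def compare_to_card(dataset_root):
    """Return (local count, local count minus the card's stated count)."""
    found = count_clip_dirs(dataset_root)
    return found, found - RELEASED_CLIP_DIRS
```

A difference of zero from `compare_to_card("oscar-dataset")` indicates that the local asset tree matches the count stated here.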
## Usage With OSCaR Code
The public code release expects a workspace like:
```text
workspace/
  OSCaR/
  oscar-dataset/
```
Then run, for example, from inside `OSCaR/`:
```bash
DATASET_ROOT=../oscar-dataset \
bash scripts/train/finetune_v1_5_13b_oscar_lora.sh
```
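The relative `DATASET_ROOT` in the command only resolves when the two directories are siblings and the script is launched from inside the code checkout. The sketch below checks the sibling layout and derives the relative path. `dataset_root_for` is a hypothetical helper, not part of the OSCaR repository.

```python
import os
from pathlib import Path


def dataset_root_for(workspace):
    """Return the dataset path relative to the OSCaR code directory.

    Raises FileNotFoundError if either sibling directory is missing.
    """
    workspace = Path(workspace)
    code_dir = workspace / "OSCaR"
    data_dir = workspace / "oscar-dataset"
    missing = [p.name for p in (code_dir, data_dir) if not p.is_dir()]
    if missing:
        raise FileNotFoundError(
            f"missing under {workspace}: {', '.join(missing)}"
        )
    # With the sibling layout this is "../oscar-dataset" on POSIX systems.
    return os.path.relpath(data_dir, code_dir)
```

The returned value can be passed as `DATASET_ROOT` when the training script is started from the `OSCaR/` directory.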
## Provenance
- Source corpora: EPIC-KITCHENS and Ego4D, as described in the paper.
- Public code: `nguyennm1024/OSCaR`
- Public model namespace: `ali-vosoughi`
- Dataset repo: `ali-vosoughi/oscar-dataset`
## Citation
```bibtex
@inproceedings{nguyen2024oscar,
  title={OSCaR: Object State Captioning and State Change Representation},
  author={Nguyen, Nguyen and Bi, Jing and Vosoughi, Ali and Tian, Yapeng and Fazli, Pooyan and Xu, Chenliang},
  booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2024}
}
```
Provider:
ali-vosoughi



