wovenbytoyota-vai/InstVL
Source: Hugging Face. Updated 2025-10-15. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/wovenbytoyota-vai/InstVL
Description:
InstVL is a large-scale, instance-aware spatio-temporal vision-language dataset designed to bridge the gap between holistic scene understanding and fine-grained, instance-level comprehension. It provides two levels of detailed textual annotations: global captions and instance captions. The dataset contains over 3.4 million instances in over 2 million images and 50,000 videos, offering rich supervision for instance-centric pre-training and benchmarking.
Provided by:
wovenbytoyota-vai



