Tracking-Any-Granularity
ModelScope community | Updated 2025-12-05 | Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/MCG-NJU/Tracking-Any-Granularity
Resource description:
# SAM 2++: Tracking Anything at Any Granularity
🔥 [Evaluation Server](TODO) | 🏠 [Homepage](https://tracking-any-granularity.github.io/) | 📄 [Paper](https://arxiv.org/abs/2510.18822) | 🔗 [GitHub](https://github.com/MCG-NJU/SAM2-Plus)
## Download
We recommend using `huggingface-cli` to download:
```
pip install -U "huggingface_hub[cli]"
huggingface-cli download MCG-NJU/Tracking-Any-Granularity --repo-type dataset --local-dir ./Tracking-Any-Granularity --local-dir-use-symlinks False --max-workers 16
```
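The same download can also be driven from Python. The sketch below assumes the `snapshot_download` function from the `huggingface_hub` package and mirrors the repository ID, repo type, target directory, and worker count of the command above. The helper names `download_kwargs` and `download` are illustrative only and are not part of any library.

```python
"""Sketch: download the dataset from Python instead of the CLI."""


def download_kwargs(local_dir="./Tracking-Any-Granularity", max_workers=16):
    """Collect the arguments that mirror the CLI command (hypothetical helper)."""
    return {
        "repo_id": "MCG-NJU/Tracking-Any-Granularity",
        "repo_type": "dataset",  # same as --repo-type dataset
        "local_dir": local_dir,  # same as --local-dir
        "max_workers": max_workers,  # same as --max-workers
    }


def download(**overrides):
    """Fetch the dataset snapshot; needs network access and huggingface_hub."""
    # Imported lazily so that this module loads even without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(**{**download_kwargs(), **overrides})
```

Calling `download()` returns the local path of the downloaded snapshot; pass `local_dir=...` to choose a different target folder.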
## 🔥 Latest News
- **[2025-11-25]** The challenge leaderboard for the Tracking-Any-Granularity dataset is now online on [CodaBench](https://www.codabench.org/competitions/11796/). The challenge benchmarks unified trackers across different granularities on the test set of our Tracking-Any-Granularity dataset. We welcome researchers to participate and help advance the field of unified tracking. The leaderboard also supports evaluating task-independent trackers.
- **[2024-10-27]** To provide a benchmark for language-reference tasks such as 'tracking by natural language specification' and 'referring video object segmentation', we added a language description of each object to the meta JSON.
- **[2024-10-24]** The [SAM 2++ model](https://github.com/MCG-NJU/SAM2-Plus) and part of the [Tracking-Any-Granularity dataset](https://huggingface.co/datasets/MCG-NJU/tracking-any-granularity) are released. Check out the [project page](https://tracking-any-granularity.github.io/) for more details.
## Dataset Summary
**T**racking-**A**ny-**G**ranularity (TAG) is a comprehensive dataset for training our unified model, with annotations across three granularities: segmentation masks, bounding boxes, and key points.
<table align="center">
<tbody>
<tr>
<td><img width="220" src="assets/data/00025.gif"/></td>
<td><img width="220" src="assets/data/00076.gif"/></td>
<td><img width="220" src="assets/data/00045.gif"/></td>
</tr>
</tbody>
</table>
<table align="center">
<tbody>
<tr>
<td><img width="220" src="assets/data/00102.gif"/></td>
<td><img width="220" src="assets/data/00103.gif"/></td>
<td><img width="220" src="assets/data/00152.gif"/></td>
</tr>
</tbody>
</table>
<table align="center">
<tbody>
<tr>
<td><img width="220" src="assets/data/00227.gif"/></td>
<td><img width="220" src="assets/data/00117.gif"/></td>
<td><img width="220" src="assets/data/00312.gif"/></td>
</tr>
</tbody>
</table>
## Dataset Description
Our dataset draws on **a wide range of video sources**, which gives it strong diversity and makes it a solid benchmark for evaluating tracking performance. Each video sequence is annotated against **18 attributes representing different tracking challenges**, several of which can apply to the same video. Common challenges include motion blur, deformation, and partial occlusion. Most videos carry multiple attributes, so the dataset is difficult and covers complex, diverse tracking scenarios.

## Benchmark Results
We evaluated many representative trackers on the validation and test splits of our dataset. Each table repeats its metric columns, one group per split:
*video object segmentation*
| Model | 𝒥 & ℱ | 𝒥 | ℱ | 𝒥 & ℱ | 𝒥 | ℱ |
|-------------------------------|---------|---------|---------|---------|---------|---------|
| STCN | 70.4 | 65.9 | 75.0 | 76.2 | 72.2 | 80.2 |
| AOT-SwinB | 78.1 | 73.1 | 83.2 | 80.9 | 76.4 | 85.4 |
| DeAOT-SwinB | 79.6 | 74.8 | 84.4 | 81.6 | 77.3 | 85.9 |
| XMem | 74.4 | 70.1 | 78.6 | 75.7 | 71.8 | 79.6 |
| DEVA | 77.9 | 73.1 | 82.6 | 82.1 | 78.0 | 86.1 |
| Cutie-base+ | 79.0 | 75.0 | 83.0 | 83.8 | 80.0 | 87.7 |
| Cutie-base+ w/MEGA | 80.3 | 76.5 | 84.2 | 84.9 | 81.3 | 88.5 |
| OneVOS | 80.1 | 75.2 | 85.1 | 81.0 | 76.5 | 85.4 |
| OneVOS w/MOSE | 79.3 | 74.3 | 84.3 | 82.4 | 78.0 | 86.7 |
| JointFormer | 76.6 | 72.8 | 80.5 | 79.1 | 75.5 | 82.7 |
| SAM2++ | 87.4 | 84.2 | 90.7 | 87.9 | 84.9 | 90.9 |
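
For readers new to these metrics: in the DAVIS convention, which the 𝒥 and ℱ symbols above appear to follow, 𝒥 is the region similarity (intersection over union of the predicted and ground-truth masks), ℱ is a boundary F-measure, and 𝒥 & ℱ is their mean. The sketch below illustrates 𝒥 and the mean on masks given as nested lists of 0/1 values. It is a simplified illustration under that assumption, not the benchmark's evaluation code, and the function names are hypothetical.

```python
def region_similarity(pred, gt):
    """J: intersection over union of two binary masks (nested lists of 0/1)."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed DAVIS-style convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """J & F: arithmetic mean of region similarity and boundary accuracy."""
    return (j + f) / 2.0
```

For example, a two-pixel prediction that fully contains a one-pixel ground truth gives 𝒥 = 1/2.
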
*single object tracking*
| Model | AUC | P_Norm | P | AUC | P_Norm | P |
|------------------------------|---------|---------|---------|---------|---------|---------|
| OSTrack | 74.8 | 84.4 | 72.7 | 69.7 | 78.8 | 69.9 |
| SimTrack | 71.1 | 80.5 | 68.1 | 64.1 | 72.4 | 60.5 |
| MixViT w/ConvMAE | 72.1 | 80.9 | 70.5 | 69.7 | 78.2 | 70.2 |
| DropTrack | 76.8 | 86.9 | 74.4 | 71.1 | 80.5 | 72.1 |
| GRM | 73.1 | 82.3 | 71.4 | 69.1 | 77.4 | 69.1 |
| SeqTrack | 77.0 | 85.8 | 76.1 | 69.8 | 79.4 | 71.5 |
| ARTrack | 76.8 | 85.8 | 75.7 | 71.1 | 78.7 | 70.9 |
| ARTrack-V2 | 76.3 | 85.5 | 74.3 | 71.8 | 79.5 | 71.9 |
| ROMTrack | 75.6 | 85.4 | 73.7 | 71.3 | 80.8 | 72.8 |
| HIPTrack | 78.2 | 88.5 | 76.6 | 71.4 | 81.0 | 72.5 |
| LoRAT | 75.1 | 84.8 | 74.4 | 70.5 | 79.7 | 68.7 |
| SAM2++ | 80.7 | 89.7 | 77.8 | 78.0 | 85.7 | 81.5 |
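
In single object tracking, AUC conventionally denotes the area under the success plot: the fraction of frames whose box IoU exceeds a threshold, averaged over thresholds from 0 to 1. The sketch below illustrates that idea, assuming boxes in `(x, y, w, h)` format and 21 evenly spaced thresholds as in common tracking toolkits. Both choices are assumptions, the function names are hypothetical, and the dataset's official evaluation may differ.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x, y, w, h); this format is an assumption."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(ious, num_thresholds=21):
    """Mean success rate over IoU thresholds evenly spaced in [0, 1]."""
    if not ious:
        raise ValueError("need at least one frame")
    rates = []
    for i in range(num_thresholds):
        threshold = i / (num_thresholds - 1)
        # A frame counts as a success when its IoU exceeds the threshold.
        rates.append(sum(iou > threshold for iou in ious) / len(ious))
    return sum(rates) / len(rates)
```

A per-video score would come from calling `box_iou` on each predicted/ground-truth pair and passing the resulting list to `success_auc`.
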
*point tracking*
| Model | Acc | Acc |
|------------|---------|---------|
| PIPs | 19.0 | 19.8 |
| PIPs++ | 20.9 | 23.1 |
| CoTracker | 23.3 | 22.3 |
| CoTracker3 | 29.6 | 29.1 |
| TAPTR | 23.7 | 23.8 |
| TAPIR | 21.3 | 24.6 |
| LocoTrack | 25.2 | 30.2 |
| Track-On | 24.8 | 25.8 |
| SAM2++ | 35.3 | 37.7 |
## Dataset Structure
```
<ImageSets>
│
├── valid.txt
├── test.txt

<valid/test.tar.gz>
│
├── Annotations
│   │
│   ├── <video_name_1>
│   │   ├── 00000.png
│   │   ├── 00001.png
│   │   └── ...
│   │
│   ├── <video_name_2>
│   │   ├── 00000.png
│   │   ├── 00001.png
│   │   └── ...
│   │
│   ├── <video_name_...>
│
├── Points
│   │
│   ├── <video_name_1>.npz
│   ├── <video_name_2>.npz
│   ├── <video_name_...>.npz
│
├── Boxes
│   │
│   ├── <video_name_1>.txt
│   ├── <video_name_2>.txt
│   ├── <video_name_...>.txt
│
├── Visible
│   │
│   ├── <video_name_1>.txt
│   ├── <video_name_2>.txt
│   ├── <video_name_...>.txt
│
└── JPEGImages
    │
    ├── <video_name_1>
    │   ├── 00000.jpg
    │   ├── 00001.jpg
    │   └── ...
    │
    ├── <video_name_2>
    │   ├── 00000.jpg
    │   ├── 00001.jpg
    │   └── ...
    │
    └── <video_name_...>
```
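Given this layout, the files belonging to one video can be collected by name. The sketch below assumes a split archive has already been extracted into a folder (called `split_root` here) and that every video has a subfolder under `JPEGImages`. The function names are hypothetical, and the contents of the `.npz` and `.txt` files are not parsed because their formats are not specified above.

```python
from pathlib import Path


def list_videos(split_root):
    """Return sorted video names, taken from the JPEGImages subfolders."""
    frames_dir = Path(split_root) / "JPEGImages"
    return sorted(p.name for p in frames_dir.iterdir() if p.is_dir())


def video_files(split_root, video_name):
    """Map each annotation type to its path(s) for one video."""
    root = Path(split_root)
    return {
        "frames": sorted((root / "JPEGImages" / video_name).glob("*.jpg")),
        "masks": sorted((root / "Annotations" / video_name).glob("*.png")),
        "points": root / "Points" / f"{video_name}.npz",
        "boxes": root / "Boxes" / f"{video_name}.txt",
        "visible": root / "Visible" / f"{video_name}.txt",
    }
```

Iterating `list_videos(split_root)` and calling `video_files` on each name yields one record per video; frame and mask lists are sorted so that their indices line up by file name.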
## BibTeX
If you find Tracking-Any-Granularity helpful to your research, please consider citing our paper.
```
@article{zhang2025sam2trackinggranularity,
title={SAM 2++: Tracking Anything at Any Granularity},
author={Jiaming Zhang and Cheng Liang and Yichun Yang and Chenkai Zeng and Yutao Cui and Xinwen Zhang and Xin Zhou and Kai Ma and Gangshan Wu and Limin Wang},
journal={arXiv preprint arXiv:2510.18822},
url={https://arxiv.org/abs/2510.18822},
year={2025}
}
```
## License
The Tracking-Any-Granularity dataset is licensed under the [Creative Commons Attribution 4.0 (CC BY 4.0) License](https://creativecommons.org/licenses). The data is released for non-commercial research purposes only.
Provider: maas
Created: 2025-12-04



