FOCAL Dataset: Ford-OLIVES Collaboration on Active Learning

Source: NIAID Data Ecosystem (indexed 2026-05-01)

Download link:

https://zenodo.org/record/10145323

Description:

We introduce the FOCAL (Ford-OLIVES Collaboration on Active Learning) dataset, which enables the study of the impact of annotation-cost within a video active learning setting. Annotation-cost refers to the time it takes an annotator to label and quality-assure a given video sequence. A practical motivation for active learning research is to minimize annotation-cost by selectively labeling informative samples that will maximize performance within a given budget constraint. However, previous work in video active learning lacks real-time annotation labels for accurately assessing cost minimization and instead assumes that annotation-cost scales linearly with the amount of data to annotate. This assumption ignores real-world confounding factors that make cost nonlinear, such as the effect of an assistive labeling tool and the variety of interactions within a scene, including occluded objects, weather, and the motion of objects.

FOCAL addresses this discrepancy by providing real annotation-cost labels for 126 video sequences across 69 unique city scenes with a variety of weather, lighting, and seasonal conditions. These videos contain a wide range of interactions at the intersection of the infrastructure-assisted autonomy and autonomous vehicle communities. A statistical analysis of the FOCAL dataset shows that annotation-cost is correlated with a variety of factors beyond just the length of a video sequence.

We also introduce a set of conformal active learning algorithms that take advantage of the sequential structure of video data to achieve a better trade-off between annotation-cost and performance while also reducing floating point operations (FLOPs) overhead by at least 77.67%. Through a sequence selection framework, we show that these approaches better reflect how videos are annotated in practice. We further demonstrate their advantage by introducing two performance-cost metrics, and show that the best conformal active learning method is cheaper than the best traditional active learning method by 113 hours. This work took place at the OLIVES Lab @ Georgia Tech. The associated codebase is available on GitHub; please refer to the lab-wide GitHub for more information on the code associated with our other papers.
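To make the budget-constrained, sequence-level selection idea above concrete, here is a minimal sketch in Python. It is not the conformal active learning method from this work: it is a plain greedy baseline that ranks whole sequences by an informativeness score per annotation hour. The `Sequence` class, the `select_sequences` function, the `score` field, and every number in `pool` are hypothetical and chosen only for illustration; none of them come from the FOCAL dataset.

```python
from dataclasses import dataclass


@dataclass
class Sequence:
    name: str          # hypothetical sequence identifier
    num_frames: int    # length of the video sequence in frames
    cost_hours: float  # annotation-cost for the whole sequence (made-up value)
    score: float       # hypothetical informativeness score, e.g. model uncertainty


def select_sequences(pool, budget_hours):
    """Greedily pick whole sequences by score per annotation hour within a budget."""
    ranked = sorted(pool, key=lambda s: s.score / s.cost_hours, reverse=True)
    chosen, spent = [], 0.0
    for seq in ranked:
        # Take the sequence only if its full annotation-cost still fits the budget.
        if spent + seq.cost_hours <= budget_hours:
            chosen.append(seq)
            spent += seq.cost_hours
    return chosen, spent


# Illustrative pool: seq_a and seq_b have the same length but very different
# costs, mimicking the nonlinear relation between length and annotation-cost.
pool = [
    Sequence("seq_a", num_frames=600, cost_hours=2.0, score=0.8),
    Sequence("seq_b", num_frames=600, cost_hours=6.0, score=0.9),
    Sequence("seq_c", num_frames=300, cost_hours=3.0, score=0.6),
    Sequence("seq_d", num_frames=900, cost_hours=1.0, score=0.3),
]

chosen, spent = select_sequences(pool, budget_hours=6.0)
print([s.name for s in chosen], spent)
```

Under a linear-cost assumption, `seq_a` and `seq_b` would be equally expensive because they have the same number of frames. With per-sequence cost labels the ranking changes, which is the kind of effect that measured annotation-cost makes it possible to study.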

Created:

2023-12-16
