busess/assignment-3

Name: busess/assignment-3
Creator: busess
Published: 2026-03-23 03:59:43
License: No description provided

Source: Hugging Face; updated 2026-03-23; indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/busess/assignment-3

Description:

# Assignment 3

### Tracking Videos

Generated tracked videos:

- `tracked_videos/drone_video_1_tracked.mp4`
- `tracked_videos/drone_video_2_tracked.mp4`

#### Video 1

[Watch `drone_video_1_tracked.mp4` on YouTube](https://www.youtube.com/watch?v=Xjl6jR51P3I)

#### Video 2

[Watch `drone_video_2_tracked.mp4` on YouTube](https://www.youtube.com/watch?v=rLzJm5zPWD8)

## Dataset Choice And Detector Configuration

I selected `pathikg/drone-detection-dataset` from Hugging Face for two reasons. Its annotations label the drone itself rather than objects seen from a drone's camera, and it is already distributed in Parquet format. The local import script stores a raw snapshot under `data/raw_hf/`.

Task 1 processing is implemented in `process_videos.py`. It processes every `.mp4` file in the given input directory and extracts all frames. It writes only the frames that contain at least one detection ("hit frames") to `detections/`.

## Kalman Filter State Design And Noise Parameters

Task 2 tracking is implemented in `track_videos.py` using `filterpy`. The tracker uses a constant-velocity 2D state vector:

```text
x = [center_x, center_y, velocity_x, velocity_y]
```

Here `center_x` and `center_y` are the coordinates of the target center. `velocity_x` and `velocity_y` are its per-frame velocities along x and y.

The motion model uses this transition matrix:

```text
[[1, 0, 1, 0],
 [0, 1, 0, 1],
 [0, 0, 1, 0],
 [0, 0, 0, 1]]
```

Each step therefore adds the current velocity to the position (time step of one frame) and leaves the velocity unchanged.

The measurement vector is the center of the detector's bounding box:

```text
z = [center_x, center_y]
```

The main filter settings are:

- Measurement noise `R = [[25, 0], [0, 25]]`
- Initial covariance `P *= 250`
- Process noise from `Q_discrete_white_noise(dim=2, dt=1.0, var=5.0)`, applied to the position and velocity blocks
- Missing-detection tolerance `max_missing = 10` frames by default

The tracker keeps the last known bounding-box width and height. This lets it still draw a predicted box when the detector temporarily misses the target.

## Failure Cases And Missed-Detection Handling

The biggest current failure mode is detector quality. The validation run used the generic COCO-pretrained `yolov8n.pt`, which has no `drone` class. As a result, many detections are labeled `kite` or `airplane` instead of `drone`.
The Kalman filter can smooth those measurements and bridge short gaps, but it cannot fix systematic detector misclassification.

When detections disappear briefly, the tracker calls `predict()`. It keeps emitting estimated centers and boxes for up to `max_missing` consecutive frames. This is why the tracked output videos contain more frames than there are detection frames. For example:

- `drone_video_1.mp4`: 2332 detection frames, 2797 tracked-output frames
- `drone_video_2.mp4`: 168 detection frames, 452 tracked-output frames

If the detector misses for longer than `max_missing` frames, the current track is dropped. A new track starts only when a new detection appears. This prevents unbounded drift, but it can fragment the trajectory if the detector loses the drone for too long.

Other observed risks:

- A wrong early detection can initialize the Kalman filter on the wrong object.
- Fast scale changes are only approximated, because the filter tracks the center and velocity but not width and height dynamics.
- Long occlusions or severe blur eventually exhaust the miss budget and terminate the track.
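The missed-detection policy described in this section can be illustrated with a minimal, stdlib-only sketch. The policy coasts for up to `max_missing` frames while reusing the last width and height, then drops the track and waits for a new detection.

Several parts of the sketch are assumptions and not the code in `track_videos.py`:

- It replaces the Kalman filter's `predict()`/`update()` with a naive last-difference velocity estimate, to keep it short and self-contained.
- The function name `run_track_lifecycle` is hypothetical.
- The `(center_x, center_y, width, height)` box layout is hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical box layout: (center_x, center_y, width, height)
Box = Tuple[float, float, float, float]


def run_track_lifecycle(
    detections: List[Optional[Box]], max_missing: int = 10
) -> List[Tuple[int, int, Box, bool]]:
    """Emit (frame, track_id, box, is_predicted) for every frame that gets a box."""
    outputs: List[Tuple[int, int, Box, bool]] = []
    track_id = -1          # incremented each time a new track starts
    active = False
    missing = 0            # consecutive frames without a detection
    cx = cy = vx = vy = 0.0
    w = h = 0.0
    for frame, det in enumerate(detections):
        if det is not None:
            dcx, dcy, w, h = det              # remember the latest width/height
            if active:
                # Stand-in for update(): velocity = change since the last estimate.
                vx, vy = dcx - cx, dcy - cy
            else:
                # No live track: start a new one with zero velocity.
                track_id += 1
                active = True
                vx = vy = 0.0
            cx, cy = dcx, dcy
            missing = 0
            outputs.append((frame, track_id, (cx, cy, w, h), False))
        elif active:
            missing += 1
            if missing > max_missing:
                active = False                # miss budget exhausted: drop the track
                continue
            # Stand-in for predict(): coast at constant velocity, reuse last w/h.
            cx, cy = cx + vx, cy + vy
            outputs.append((frame, track_id, (cx, cy, w, h), True))
    return outputs


# Usage: 5 detection frames, a 2-frame gap, and a 15-frame gap.
box = (100.0, 50.0, 20.0, 10.0)
dets = [box] * 3 + [None] * 2 + [box] + [None] * 15 + [box]
out = run_track_lifecycle(dets, max_missing=10)
```

In this example the 5 detection frames yield 17 output frames, and the final detection opens a second track. This mirrors, in miniature, why the tracked videos contain more frames than the detection counts.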

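The constant-velocity filter from the "Kalman Filter State Design And Noise Parameters" section can be sketched in plain numpy, so that each predict/update step is visible. It uses the same `F`, `H`, `R = 25 I`, and `P = 250 I` as that section; `filterpy`'s default `P` is the identity, so `P *= 250` gives `250 I`.

Some parts are assumptions:

- The class name `ConstantVelocityKF` is hypothetical.
- The update uses the textbook covariance form rather than `filterpy`'s Joseph form.
- `Q` interleaves the per-axis white-noise block to match the `[center_x, center_y, velocity_x, velocity_y]` ordering. This is what `filterpy`'s `Q_discrete_white_noise(dim=2, dt=1.0, var=5.0, block_size=2, order_by_dim=False)` produces. The README does not say exactly how `track_videos.py` builds `Q`.

```python
import numpy as np


class ConstantVelocityKF:
    """Minimal 2D constant-velocity Kalman filter (numpy only, hypothetical)."""

    def __init__(self, cx, cy, r_var=25.0, p_scale=250.0, q_var=5.0, dt=1.0):
        # State x = [center_x, center_y, velocity_x, velocity_y]; velocities start at 0.
        self.x = np.array([cx, cy, 0.0, 0.0], dtype=float)
        # Transition: position += velocity * dt, velocity unchanged.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Measurement model: observe only the box center.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.R = np.eye(2) * r_var
        self.P = np.eye(4) * p_scale
        # Per-axis discrete white-noise block, expanded to [x, y, vx, vy] order
        # (assumed equivalent to filterpy's order_by_dim=False layout).
        q = np.array([[0.25 * dt**4, 0.5 * dt**3],
                      [0.5 * dt**3, dt**2]]) * q_var
        self.Q = np.kron(q, np.eye(2))

    def predict(self):
        # Coasting step, also used alone when a detection is missing.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R            # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()


# Usage: the target moves +5 in x per frame and stays at y = 50.
kf = ConstantVelocityKF(100.0, 50.0)
for t in range(1, 6):
    kf.predict()
    kf.update([100.0 + 5.0 * t, 50.0])
```

After these updates, `velocity_x` is positive and `velocity_y` stays at zero. Calling `predict()` without `update()` then advances the center along the estimated velocity, which is the same mechanism the tracker relies on to bridge short detection gaps.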