busess/assignment-3

Source: Hugging Face. Updated 2026-03-23. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/busess/assignment-3
Description:
# Assignment 3

### Tracking Videos

Generated tracked videos:

- `tracked_videos/drone_video_1_tracked.mp4`
- `tracked_videos/drone_video_2_tracked.mp4`

#### Video 1

[Watch `drone_video_1_tracked.mp4` on YouTube](https://www.youtube.com/watch?v=Xjl6jR51P3I)

#### Video 2

[Watch `drone_video_2_tracked.mp4` on YouTube](https://www.youtube.com/watch?v=rLzJm5zPWD8)

## Dataset Choice And Detector Configuration

I selected `pathikg/drone-detection-dataset` from Hugging Face because it is built for detecting the drone itself rather than objects viewed from a drone, and because it is already distributed in Parquet format. The local import script stores a raw snapshot under `data/raw_hf/`.

Task 1 processing is implemented in `process_videos.py`. It processes every `.mp4` in the requested input directory, extracts all frames, and writes only hit frames (frames that contain a detection) to `detections/`.

## Kalman Filter State Design And Noise Parameters

Task 2 tracking is implemented in `track_videos.py` with `filterpy`. The tracker state is a constant-velocity 2D state vector:

```text
x = [center_x, center_y, velocity_x, velocity_y]
```

Here `center_x` and `center_y` are the coordinates of the target center, and `velocity_x` and `velocity_y` are its velocities along the x and y axes.

The motion model uses the following transition matrix, which corresponds to a time step of `dt = 1`:

```text
[[1, 0, 1, 0],
 [0, 1, 0, 1],
 [0, 0, 1, 0],
 [0, 0, 0, 1]]
```

The measurement vector is the detector-provided bounding-box center:

```text
z = [center_x, center_y]
```

The main filter settings are:

- Measurement noise `R = [[25, 0], [0, 25]]`
- Initial covariance `P *= 250`
- Process noise from `Q_discrete_white_noise(dim=2, dt=1.0, var=5.0)` applied to position and velocity blocks
- Missing-detection tolerance `max_missing = 10` frames by default

The tracker preserves the last known bounding-box width and height so it can still draw a predicted box when the detector temporarily misses the target.

## Failure Cases And Missed-Detection Handling

The biggest current failure mode is detector quality. Because the validation run used the generic `yolov8n.pt` weights, many detections are labeled `kite` or `airplane` instead of `drone`. The Kalman filter can smooth those measurements and bridge short gaps, but it cannot fix systematic detector misclassification.

When detections disappear briefly, the tracker calls `predict()` and keeps emitting estimated centers and boxes for up to `max_missing` consecutive frames. This is why the tracked output videos contain more frames than the raw detection counts. For example:

- `drone_video_1.mp4`: 2332 detection frames, 2797 tracked-output frames
- `drone_video_2.mp4`: 168 detection frames, 452 tracked-output frames

If the detector misses for longer than `max_missing` frames, the current track is dropped. A new track starts only when a new detection appears. This prevents unlimited drift but can fragment the trajectory if the detector loses the drone for too long.

Other observed risks:

- A wrong early detection can initialize the Kalman filter on the wrong object.
- Fast scale changes are only approximated because the filter tracks center and velocity, not width and height dynamics.
- Long occlusions or severe blur eventually exhaust the miss budget and terminate the track.
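
The coasting behavior described above can be illustrated with a minimal sketch. This is not the code in `track_videos.py`: it avoids `filterpy`, and it assumes the x and y axes are uncorrelated so that the 4-state filter can be split into two independent (position, velocity) filters. The names `AxisKF` and `track` are hypothetical, and applying the `Q` block per axis is an assumption about how the process noise is laid out. The noise values and `max_missing` are the ones quoted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Values quoted in the text above.
R = 25.0          # measurement variance per axis
P0 = 250.0        # initial covariance scale
MAX_MISSING = 10  # frames a track may coast without a detection
# Q_discrete_white_noise(dim=2, dt=1.0, var=5.0) evaluates to this 2x2 block.
Q = ((1.25, 2.5), (2.5, 5.0))

Center = Optional[Tuple[float, float]]


@dataclass
class AxisKF:
    """Constant-velocity Kalman filter for one image axis (assumed decoupling)."""
    pos: float
    vel: float = 0.0
    p: List[List[float]] = field(default_factory=lambda: [[P0, 0.0], [0.0, P0]])

    def predict(self) -> None:
        # x <- F x with F = [[1, 1], [0, 1]], i.e. dt = 1 frame.
        self.pos += self.vel
        (a, b), (c, d) = self.p
        # P <- F P F^T + Q
        self.p = [
            [a + b + c + d + Q[0][0], b + d + Q[0][1]],
            [c + d + Q[1][0], d + Q[1][1]],
        ]

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only the position is observed.
        (a, b), (c, d) = self.p
        s = a + R                      # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain
        y = z - self.pos               # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P <- (I - K H) P
        self.p = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


def track(detections: List[Center], max_missing: int = MAX_MISSING) -> List[Center]:
    """Return one estimated center per frame, or None while no track is alive."""
    kx: Optional[AxisKF] = None
    ky: Optional[AxisKF] = None
    missing = 0
    out: List[Center] = []
    for det in detections:
        if kx is None or ky is None:
            if det is None:
                out.append(None)       # nothing to track yet
                continue
            # A new track starts only when a detection appears.
            kx, ky = AxisKF(det[0]), AxisKF(det[1])
            missing = 0
            out.append(det)
            continue
        kx.predict()
        ky.predict()
        if det is not None:
            kx.update(det[0])
            ky.update(det[1])
            missing = 0
        else:
            missing += 1
            if missing > max_missing:  # miss budget exhausted: drop the track
                kx = ky = None
                out.append(None)
                continue
        out.append((kx.pos, ky.pos))
    return out


# Three detections, a 12-frame gap, then one new detection.
demo = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)] + [None] * 12 + [(50.0, 50.0)]
centers = track(demo)
print(sum(c is not None for c in centers))  # 14 output frames from 4 detections
```

In this toy sequence the track coasts through the first 10 missed frames, is dropped on the 11th, and restarts at the final detection, which is the same mechanism that makes the tracked-output frame counts exceed the detection frame counts.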

Provider:
busess