busess/assignment-3
Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/busess/assignment-3
Resource description:
# Assignment 3
### Tracking Videos
Generated tracked videos:
- `tracked_videos/drone_video_1_tracked.mp4`
- `tracked_videos/drone_video_2_tracked.mp4`
#### Video 1
[Watch `drone_video_1_tracked.mp4` on YouTube](https://www.youtube.com/watch?v=Xjl6jR51P3I)
#### Video 2
[Watch `drone_video_2_tracked.mp4` on YouTube](https://www.youtube.com/watch?v=rLzJm5zPWD8)
## Dataset Choice And Detector Configuration
I selected `pathikg/drone-detection-dataset` from Hugging Face because its annotations target the drone itself rather than objects viewed from a drone, and because it is already distributed in Parquet format. The local import script stores a raw snapshot under `data/raw_hf/`.
Task 1 processing is implemented in `process_videos.py`. It processes every `.mp4` in the requested input directory, reads every frame, and writes only the frames with at least one detection (hit frames) to `detections/`.
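The hit-frame filtering described above can be sketched as follows. This is a minimal illustration, not the contents of `process_videos.py`: the function name `hit_frames`, the `detect` callable, and the toy inputs are all hypothetical, and in a real run the frames would come from a video decoder and `detect` would wrap the detector.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def hit_frames(
    frames: Iterable[object],
    detect: Callable[[object], List[Box]],
) -> Iterator[Tuple[int, object, List[Box]]]:
    """Yield (frame_index, frame, boxes) only for frames with at least one box."""
    for index, frame in enumerate(frames):
        boxes = detect(frame)
        if boxes:  # frames with no detections are skipped, not written
            yield index, frame, boxes


# Toy run: strings stand in for decoded frames, and the stand-in detector
# returns a box only when the frame label contains the word "drone".
toy_frames = ["sky", "drone", "sky", "drone far"]


def toy_detect(frame: object) -> List[Box]:
    return [(0.0, 0.0, 10.0, 10.0)] if "drone" in str(frame) else []


hit_indices = [index for index, _, _ in hit_frames(toy_frames, toy_detect)]
print(hit_indices)  # [1, 3]
```

Keeping the original frame index alongside each hit makes it possible to line the saved frames up with the source video later.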
## Kalman Filter State Design And Noise Parameters
Task 2 tracking is implemented in `track_videos.py` with `filterpy`. The tracker state is a constant-velocity 2D state vector:
```text
x = [center_x, center_y, velocity_x, velocity_y]
```
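Here `center_x` and `center_y` are the coordinates of the target center, and `velocity_x` and `velocity_y` are its velocities along x and y. The motion model uses the constant-velocity transition matrix with a time step of one frame (`dt = 1`), so each position component advances by its velocity once per frame: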
```text
[[1, 0, 1, 0],
[0, 1, 0, 1],
[0, 0, 1, 0],
[0, 0, 0, 1]]
```
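As a quick check of what this matrix does, the sketch below applies it to one example state with NumPy. The state values are made up for illustration.

```python
import numpy as np

# Constant-velocity transition matrix for the state
# [center_x, center_y, velocity_x, velocity_y] with a one-frame time step.
F = np.array(
    [
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ],
    dtype=float,
)

# Hypothetical state: center at (100, 50), moving +3 px/frame in x, -2 in y.
x = np.array([100.0, 50.0, 3.0, -2.0])
x_next = F @ x

print(x_next.tolist())  # [103.0, 48.0, 3.0, -2.0]
```

The position moves by one frame's worth of velocity while the velocity itself is unchanged, which is exactly the constant-velocity assumption.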
The measurement vector is the detector-provided bounding-box center:
```text
z = [center_x, center_y]
```
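A measurement of this form implies a measurement matrix that selects the two position components of the state. The sketch below shows that matrix and the center computation, assuming boxes arrive as `(x1, y1, x2, y2)` corner coordinates; the helper name `box_center` and the numbers are illustrative only.

```python
import numpy as np


def box_center(x1: float, y1: float, x2: float, y2: float) -> np.ndarray:
    """Return the center [center_x, center_y] of a corner-format box."""
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])


# Measurement matrix: picks center_x and center_y out of the 4D state.
H = np.array(
    [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
    ],
    dtype=float,
)

x = np.array([103.0, 48.0, 3.0, -2.0])   # hypothetical predicted state
z = box_center(90.0, 40.0, 120.0, 60.0)  # hypothetical detector box
residual = z - H @ x                     # innovation used by the update step

print(z.tolist())         # [105.0, 50.0]
print(residual.tolist())  # [2.0, 2.0]
```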
The main filter settings are:
- Measurement noise `R = [[25, 0], [0, 25]]`
- Initial covariance `P *= 250`
- Process noise from `Q_discrete_white_noise(dim=2, dt=1.0, var=5.0)` applied to position and velocity blocks
- Missing-detection tolerance `max_missing = 10` frames by default
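To make these settings concrete, the sketch below builds the matrices with plain NumPy. Two points are assumptions rather than facts from the tracker code: the 2x2 block uses the standard discrete white-noise form that, to my understanding, `Q_discrete_white_noise(dim=2, ...)` returns, and the way that block is spread over the `[center_x, center_y, velocity_x, velocity_y]` ordering (via `np.kron`) is one plausible reading of "applied to position and velocity blocks".

```python
import numpy as np

dt, var = 1.0, 5.0

# 2x2 position/velocity block for one axis (discrete white-noise model).
q_block = var * np.array(
    [
        [dt**4 / 4, dt**3 / 2],
        [dt**3 / 2, dt**2],
    ]
)

# Spread the block over both axes for the state order [x, y, vx, vy]:
# x couples only with vx, and y only with vy.
Q = np.kron(q_block, np.eye(2))

R = np.diag([25.0, 25.0])  # measurement noise: variance 25 per coordinate
P = np.eye(4) * 250.0      # initial covariance, assuming an identity start

print(Q.tolist())
# [[1.25, 0.0, 2.5, 0.0], [0.0, 1.25, 0.0, 2.5],
#  [2.5, 0.0, 5.0, 0.0], [0.0, 2.5, 0.0, 5.0]]
```

A variance of 25 in `R` corresponds to a standard deviation of 5 in the measured center coordinates, and the large initial `P` lets the first few detections dominate the estimate.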
The tracker preserves the last known bounding-box width and height so it can still draw a predicted box when the detector temporarily misses the target.
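Drawing a predicted box from a predicted center and the remembered size is a small computation, sketched here. The helper name `box_from_center` and the example values are hypothetical.

```python
from typing import Tuple


def box_from_center(
    cx: float, cy: float, width: float, height: float
) -> Tuple[float, float, float, float]:
    """Rebuild a corner-format box (x1, y1, x2, y2) around a center point."""
    half_w, half_h = width / 2.0, height / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Size remembered from the last real detection (illustrative values).
last_size = (30.0, 20.0)

# Center produced by the filter's prediction on a frame with no detection.
predicted_box = box_from_center(106.0, 46.0, *last_size)

print(predicted_box)  # (91.0, 36.0, 121.0, 56.0)
```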
## Failure Cases And Missed-Detection Handling
The biggest current failure mode is detector quality. Because the validation run used the generic `yolov8n.pt` weights, which are pretrained on COCO and have no `drone` class, many detections are labeled as `kite` or `airplane` instead of `drone`. The Kalman filter can smooth those measurements and bridge short gaps, but it cannot fix systematic detector misclassification.
When detections disappear briefly, the tracker calls `predict()` and keeps emitting estimated centers and boxes for up to `max_missing` consecutive frames. This is why the tracked output videos contain more frames than the raw detection counts. For example:
- `drone_video_1.mp4`: 2332 detection frames, 2797 tracked-output frames
- `drone_video_2.mp4`: 168 detection frames, 452 tracked-output frames
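The gap between the two counts can be read as the number of frames drawn from prediction alone, assuming every detection frame also produces a tracked-output frame. A short calculation from the figures above:

```python
# (detection frames, tracked-output frames) as reported above.
counts = {
    "drone_video_1.mp4": (2332, 2797),
    "drone_video_2.mp4": (168, 452),
}

# Frames emitted without a detection, under the assumption stated above.
predicted_only = {
    name: tracked - detected for name, (detected, tracked) in counts.items()
}

print(predicted_only)
# {'drone_video_1.mp4': 465, 'drone_video_2.mp4': 284}
```

For the second video, prediction-only frames outnumber detection frames, which suggests the detector drops the target often there.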
If the detector misses for longer than `max_missing`, the current track is dropped. A new track starts only when a new detection appears. This prevents unlimited drift but can fragment the trajectory if the detector loses the drone for too long.
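The miss-budget behavior can be sketched with a small NumPy-only Kalman filter. This is an illustrative reconstruction, not the code in `track_videos.py`: the `Track` class, the `run_tracker` function, the layout of `Q`, and the assumption of at most one detection center per frame are all mine.

```python
import numpy as np

F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
Q = np.kron(5.0 * np.array([[0.25, 0.5], [0.5, 1.0]]), np.eye(2))  # assumed layout
R = np.eye(2) * 25.0


class Track:
    """Single-target constant-velocity Kalman track (hypothetical helper)."""

    def __init__(self, center):
        self.x = np.array([center[0], center[1], 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 250.0
        self.missing = 0  # consecutive frames without a detection

    def predict(self):
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q

    def update(self, center):
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(center, dtype=float) - H @ self.x)
        self.P = (np.eye(4) - K @ H) @ self.P
        self.missing = 0

    def center(self):
        return (float(self.x[0]), float(self.x[1]))


def run_tracker(detections, max_missing=10):
    """detections: one (x, y) center or None per frame.

    Returns one estimated center per frame, or None when no track is active.
    """
    track, outputs = None, []
    for z in detections:
        if track is None:
            # A new track starts only when a detection appears.
            track = Track(z) if z is not None else None
            outputs.append(track.center() if track else None)
            continue
        track.predict()
        if z is not None:
            track.update(z)
        else:
            track.missing += 1
            if track.missing > max_missing:
                track = None  # miss budget exhausted: drop the track
                outputs.append(None)
                continue
        outputs.append(track.center())
    return outputs


# Four missed frames with a budget of two: the track coasts, then is dropped.
dets = [(0, 0), (2, 0), (4, 0), None, None, None, None, (20, 0)]
outputs = run_tracker(dets, max_missing=2)
emitted = [o is not None for o in outputs]
print(emitted)  # [True, True, True, True, True, False, False, True]
```

In this toy run there are 4 detection frames but 6 emitted frames, the same effect that makes the tracked videos longer than the raw detection counts. The final detection starts a fresh track, which is where trajectory fragmentation comes from.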
Other observed risks:
- A wrong early detection can initialize the Kalman filter on the wrong object.
- Fast scale changes are only approximated because the filter tracks center and velocity, not width and height dynamics.
- Long occlusions or severe blur eventually exhaust the miss budget and terminate the track.
Provider: busess



