anttwon/detections_frames

Name: anttwon/detections_frames
Creator: anttwon
Published: 2026-03-22 22:41:10
License: 暂无描述

Hugging Face2026-03-22 更新2026-03-29 收录

下载链接：

https://hf-mirror.com/datasets/anttwon/detections_frames

下载链接

链接失效反馈

官方服务：

资源简介：

Drone tracking videos: 1:https://www.youtube.com/watch?v=POT-g-hS6v0 2:https://www.youtube.com/watch?v=dTMicLlHDOI Dataset used: https://www.kaggle.com/datasets/dasmehdixtr/drone-dataset-uav/data This dataset includes the detections for each frame of the provided videos. For this task, I decided to utilize the RTDETR model. I chose this over the YOLO model because, honestly I wanted to try something new, and experiment with its transformer-based object detector. Due to the limited dataset, YOLO might have been better. Rtdetr might be better at distinguishing the drone if there was occulsion of the drone, but otherwise YOLO might be better for this smaller dataset, and for these videos specifically in which the drones are matched against the sky. I trained the model on the dataset above for 81 epochs with image sizes of 640 x 640. For this training, I decided to utilize a patience value of 20, so while it was initialy supposed to go through 100 training passes, it ended up only getting to 81. I did this to combat the risk of overfitting that was present due to my model choice and dataset size.I also used a dropout value of 0.1 for the same reason. For the detector itself, I chose to predict using frames of size 1280 x 1280 to help with the model's precision and accuracy. The Kalman filter's state vector was taken from the bounding boxes of each detected frame from the detector. The center of each bounding box was computed using the formulas cx = (x_min + x_max) / 2 and cy = (y_min + y_max) / 2, forming the measurement vector [cx, cy]. These center coordinates were passed to kf.update() for each frame as the observed position. The state vector [cx, cy, vx, vy] was then estimated by the filter, with position taken directly from the measurements and velocity inferred implicitly from how the center moved between frames. This continued for each frame, and then for each video. The filter was initialized using the first detected frame, with 0 initial velocity. If no detection is found in the provided opening frames, the tracker is not initialized until the first bounding box is safely produced. When no drone is found in a frame, the tracker calls kf.predict() and uses the drone's last known trajectory to estimate the position where it should be. After 10 frames of no detections, the tracker resets, assuming that it's "estimations" were wrong, leaving those frames untracked and absent from the output. If the drone later reemmerges, the tracker then reinitialises from the new detection point with zero velocity. As for the noise parameters, R is the measurement noise, and I set it to [25,25], because of my higher confidence interval (0.7 for the first video, 0.5 for the second). Q is the process noise, and its value is [15,15,120,120]. I gave it a higher velocity to account for the drone's sudden swings in motion. It's position values are low because despite it's velocity, it shouldn't jump too much across frames. P is the initial state uncertainty, defined as [500, 500, 1000, 1000]. It's high initially due to the lack of information, but lowers over the course of the tracker's usage.

提供机构：

anttwon

5,000+

优质数据集

54 个

任务类型

进入经典数据集