Dataset for Insect Detection Remote Sensing

Source: NIAID Data Ecosystem (indexed 2026-05-01)

Download link:

https://zenodo.org/record/10055762

Description:

# MSU Horticulture farm beehive dataset

Dataset associated with the journal submission entitled "Comparison of Supervised Learning and Changepoint Detection for Insect Detection in Lidar Data" by T. C. Vannoy, N. B. Sweeney, J. A. Shaw, and B. M. Whitaker. The associated software is archived at https://zenodo.org/doi/10.5281/zenodo.10055809.

The data were collected in June and July of 2022 at the horticulture farm at Montana State University - Bozeman. The data consist of 9977 images taken in front of the beehives. For data collection, the lidar was mounted in the back of a U-Haul van, pointed in front of the beehives, and run at a variety of pan and tilt angles. This produced a diverse set of images whose level of insect activity depends on how far the beam was from the beehives; some sets of images also contain stationary targets where the beam hit a beehive or a plant in the distance.

## Organization

At the top level, data are split by collection date. At the next level down, each folder corresponds to an individual data collection run; the timestamp at the end of the folder name indicates when that collection started. Each top-level date folder contains a README file that describes its data collection runs.

Each data collection folder contains the following files:

- `adjusted_data_junecal_volts.mat`: The main data file, which contains all the data and metadata.
- `_P_T.mat`: The individual images; these files contain the raw data. They are all combined into `adjusted_data_junecal_volts.mat`, so they are rarely used.
- `labels.csv`: The class labels for all data in the folder.
- `labels.mat`: The class labels converted into label vectors, which are used for machine learning.

## Class labels

After collecting the data, we manually labeled the bounding box of each insect in each image, then converted the bounding boxes into binary labels that indicate whether each row contains an insect.
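The bounding-box-to-row conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that only the row extent of each box matters, that a box is stored as an inclusive `(first_row, last_row)` span, and that confidence ratings are numbers compared against a cutoff. The names `Box`, `boxes_to_row_labels`, and `min_confidence` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box: an inclusive row span plus a confidence rating."""
    first_row: int
    last_row: int
    confidence: float = 1.0


def boxes_to_row_labels(boxes, n_rows, min_confidence=0.0):
    """Return a list of 0/1 labels, one per image row.

    A row is labeled 1 if it falls inside any box whose confidence is at
    least `min_confidence` (an assumed filtering rule, not from the dataset).
    """
    labels = [0] * n_rows
    for box in boxes:
        if box.confidence < min_confidence:
            continue  # skip low-confidence candidates
        # Clip the span to the image so out-of-range boxes cannot raise IndexError.
        start = max(box.first_row, 0)
        stop = min(box.last_row, n_rows - 1)
        for row in range(start, stop + 1):
            labels[row] = 1
    return labels


labels = boxes_to_row_labels(
    [Box(2, 4), Box(7, 7, confidence=0.3)], n_rows=10, min_confidence=0.5
)
print(labels)  # [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
```

Overlapping boxes simply set the same rows to 1 again, so the result is a union of row spans; the column extent of each box is ignored because the labels only record whether a row contains an insect.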
Each potential insect was labeled with a confidence rating, because some bees were more obvious than others. During the labeling process, we found 4671 probable bees. Since we were not able to collect ground-truth data in the field, our labels may be imperfect: some insects might have been missed, and some non-insects might have been labeled as insects.

The `labels.csv` file in the root directory contains all the labels. Each data collection subdirectory has its own `labels.csv` and `labels.mat` files, which contain the labels for that run.

### Class imbalance

Of the 9977 images, 3498 (35.14%) contain one or more bees. In total, the dataset has 1775906 rows, 11492 (0.647%) of which contain an insect measurement. Because of sampling jitter in the ADC, most insects span multiple range bins, which increases the number of rows labeled as containing insects. The dataset therefore has a large class imbalance, particularly at the row level.
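The row-level imbalance can be quantified with a short calculation. The sketch below reproduces the 0.647% figure from the row counts and also derives the negative-to-positive ratio, which is one common way to weight the positive class when training a binary classifier; using it here is an assumption for illustration, not something the dataset authors prescribe. The function name `imbalance_stats` is hypothetical.

```python
def imbalance_stats(n_total, n_positive):
    """Return (percentage of positive samples, negatives per positive)."""
    if not 0 < n_positive < n_total:
        raise ValueError("need 0 < n_positive < n_total")
    n_negative = n_total - n_positive
    percent_positive = 100.0 * n_positive / n_total
    # n_negative / n_positive is a common (assumed) positive-class weight
    # for a weighted binary cross-entropy loss.
    neg_to_pos = n_negative / n_positive
    return percent_positive, neg_to_pos


# Row counts taken from the dataset description.
pct, ratio = imbalance_stats(n_total=1775906, n_positive=11492)
print(f"{pct:.3f}% positive rows, {ratio:.1f} negative rows per positive row")
# 0.647% positive rows, 153.5 negative rows per positive row
```

With roughly 153 negative rows per positive row, a classifier that predicts "no insect" for every row would be over 99% accurate, so metrics such as precision, recall, or F1 on the insect class are more informative. If training with PyTorch, a ratio like this could, for example, be passed as the `pos_weight` tensor of `torch.nn.BCEWithLogitsLoss`.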

Created:

2024-02-02