Coffee bean defect image dataset
Source: REDU; updated 2025-01-01; indexed 2026-05-11
Download link:
https://redu.unicamp.br/citation?persistentId=doi:10.25824/redu/8BNEOD
Description:
This dataset contains 3,672 individual images of coffee beans, each 384 × 384 pixels. The dataset is balanced and was manually annotated according to official standards. It comprises nine classes of coffee beans for classification and quality analysis: two classes of healthy beans and seven classes of beans with different types of defects. The bean images were obtained during the doctoral research of Juliana Cardoso do Prado, carried out between 2017 and 2021, for her thesis “Uso de técnicas tradicionais e computacionais para caracterização da qualidade de grãos de café” (Use of traditional and computational techniques to characterize coffee bean quality; DOI: https://doi.org/10.47749/T/UNICAMP.2021.1490251). The grain classes, encoded as labels 1 to 9 in the ‘grain_metadata.csv’ file, are: (1) Sound Bean, (2) Peaberry, (3) Black, (4) Immature, (5) Sour, (6) Shell, (7) Shell Center, (8) Broken, and (9) Insect-Damaged.
To build the database, the raw coffee beans were arranged on a sheet of paper cut to 9 × 8 cm (72 cm²) and divided into 6 smaller rectangles of 4 × 3 cm (12 cm² each). One grain was placed in each small rectangle, so each photo captured 6 grains at a time. The sheet with the grains was placed against a dark background inside a box, and the framing captured the entire sheet against that background so that the images could be corrected later. In total, 1,836 grains were used: 204 grains from each of the 9 classes (7 classes with defects and 2 without). This is consistent with the 3,672 images if each grain was photographed on both of its faces (1,836 × 2).
The images were acquired with a Samsung EK-GC200 camera mounted on a tripod, with the self-timer used to eliminate camera shake during capture. Lighting was indirect natural light from a window, supplemented by a 6500 K LED lamp; to reduce variations in lighting, all photos were taken between 2 pm and 4 pm. The camera's flash was not used, to avoid harsh shadows.
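For working with the labels, the class scheme and the dataset's count arithmetic can be captured in a few lines of Python. This is a minimal sketch under assumptions: the ID-to-name mapping comes from the class list above, but treating labels 1 and 2 as the two healthy classes, counting two photographed faces per grain, and the CSV column name `label` are assumptions (the actual headers of ‘grain_metadata.csv’ are not given here). The helper names `is_defective`, `expected_counts`, `count_classes` and `is_balanced` are hypothetical.

```python
import csv
from collections import Counter
from typing import TextIO

# Label IDs 1-9 and class names, as listed for 'grain_metadata.csv'.
CLASS_NAMES = {
    1: "Sound Bean",
    2: "Peaberry",
    3: "Black",
    4: "Immature",
    5: "Sour",
    6: "Shell",
    7: "Shell Center",
    8: "Broken",
    9: "Insect-Damaged",
}
HEALTHY_LABELS = {1, 2}  # assumption: the two defect-free classes are labels 1 and 2
GRAINS_PER_CLASS = 204
FACES_PER_GRAIN = 2      # assumption: each grain photographed on its ventral and dorsal face


def is_defective(label: int) -> bool:
    """Return True for the seven defect classes."""
    if label not in CLASS_NAMES:
        raise ValueError(f"unknown label: {label}")
    return label not in HEALTHY_LABELS


def expected_counts() -> tuple[int, int]:
    """(number of grains, number of images) implied by the description."""
    grains = GRAINS_PER_CLASS * len(CLASS_NAMES)  # 204 * 9 = 1836
    return grains, grains * FACES_PER_GRAIN       # 1836 * 2 = 3672


def count_classes(f: TextIO, label_column: str = "label") -> Counter:
    """Count CSV rows per class name; `label_column` is a hypothetical header name."""
    return Counter(CLASS_NAMES[int(row[label_column])] for row in csv.DictReader(f))


def is_balanced(counts: Counter) -> bool:
    """True if all nine classes are present with the same number of rows."""
    return len(counts) == len(CLASS_NAMES) and len(set(counts.values())) == 1
```

On the real file, open it with `open("grain_metadata.csv", newline="")` and pass the actual label header as `label_column`; `is_balanced` then checks the "balanced dataset" claim directly.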
The digital camera was set to manual focus, ISO 100, 3960 × 2640 pixel resolution, f/4 aperture and 1/4 s exposure time, and was positioned 23 cm above the surface of the grains. To reduce systematic biases that a classification model could improperly exploit, the grain classes were captured in alternation (for example: one photo of 6 grains from class 1, then one photo of 6 grains from class 2, and so on).
Preprocessing consisted of three operations. First, the trapezoid formed by the sheet of paper in each photo was identified and rectified into a rectangle with the expected 9:8 proportions. Second, the grains in each of the six cells were identified and cut out into the 3,672 individual 384 × 384-pixel images. Third, the background pixels of the sheet of paper (excluding the grains) were identified and used to normalize the image, reducing variations in lighting and capture between images. The preprocessing was implemented in Python with the OpenCV and scikit-image libraries.
The ‘grain_metadata.csv’ file also records which face of each grain is shown: ventral (face 1) or dorsal (face 2).
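The three preprocessing operations can be illustrated with a short sketch. It is not the original implementation: it uses NumPy only (the original pipeline used OpenCV and scikit-image, where `cv2.getPerspectiveTransform` and `cv2.warpPerspective` would handle the rectification), it assumes the four sheet corners have already been located, it assumes the 9 cm side is horizontal so the sheet splits into 3 columns × 2 rows of 3 × 4 cm cells, and it treats pixels darker than a fixed threshold as grain. The names `rectify`, `grain_crops`, `normalize` and `preprocess`, and the parameters `PX_PER_CM`, `dark_thresh` and `target`, are hypothetical.

```python
import numpy as np

PX_PER_CM = 160            # hypothetical output resolution: each cell becomes 480 x 640 px
SHEET_W_CM, SHEET_H_CM = 9, 8
COLS, ROWS = 3, 2          # assumption: 9 x 8 cm sheet split into six 3 x 4 cm cells
CROP = 384                 # size of each output image, from the description


def homography(src, dst):
    """3x3 matrix H mapping the 4 points `src` onto `dst` (DLT with h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.solve(np.asarray(a, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)


def rectify(photo, corners):
    """Step 1: warp the sheet (corners TL, TR, BR, BL as (x, y)) to a 9:8 rectangle."""
    w, h = SHEET_W_CM * PX_PER_CM, SHEET_H_CM * PX_PER_CM
    dst = [(0, 0), (w - 1, 0), (w - 1, h - 1), (0, h - 1)]
    back = homography(dst, corners)  # maps output pixels back into the photo
    ys, xs = np.mgrid[0:h, 0:w]
    sx, sy, sw = back @ np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx = np.clip(np.rint(sx / sw), 0, photo.shape[1] - 1).astype(int)
    sy = np.clip(np.rint(sy / sw), 0, photo.shape[0] - 1).astype(int)
    return photo[sy, sx].reshape(h, w, *photo.shape[2:])  # nearest-neighbour sampling


def grain_crops(sheet, dark_thresh=100):
    """Step 2: one CROP x CROP window per cell, centred on the cell's dark (grain) pixels."""
    gray = sheet.mean(axis=2) if sheet.ndim == 3 else sheet
    ch, cw = sheet.shape[0] // ROWS, sheet.shape[1] // COLS
    crops = []
    for r in range(ROWS):
        for c in range(COLS):
            y0, x0 = r * ch, c * cw
            ys, xs = np.nonzero(gray[y0:y0 + ch, x0:x0 + cw] < dark_thresh)
            if ys.size == 0:
                continue  # no grain found in this cell
            cy, cx = y0 + int(ys.mean()), x0 + int(xs.mean())
            # Clamp the window so it stays inside the rectified sheet.
            top = min(max(cy - CROP // 2, 0), sheet.shape[0] - CROP)
            left = min(max(cx - CROP // 2, 0), sheet.shape[1] - CROP)
            crops.append(sheet[top:top + CROP, left:left + CROP])
    return crops


def normalize(img, dark_thresh=100, target=230.0):
    """Step 3: scale each channel so the paper (non-grain) pixels average `target`."""
    f = img.astype(float)
    gray = f.mean(axis=2) if f.ndim == 3 else f
    ref = f[gray >= dark_thresh].mean(axis=0)  # per-channel mean of paper pixels
    return np.clip(np.rint(f * target / ref), 0, 255).astype(np.uint8)


def preprocess(photo, corners):
    """Rectify, crop, then normalize each crop, in the order described above."""
    return [normalize(c) for c in grain_crops(rectify(photo, corners))]
```

With `PX_PER_CM = 160` each cell is 480 × 640 pixels, so a 384 × 384 window fits around a centred grain. Nearest-neighbour sampling keeps the sketch short; the bilinear interpolation that `cv2.warpPerspective` uses by default would give smoother edges.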
Provided by:
Instituto de Computação; Faculdade de Engenharia Agrícola
Created:
2025-01-01



