A Comprehensive Raw Dataset of Ziehl-Neelsen Stained Sputum Smear Microscopy Images for Mycobacterium Tuberculosis Detection
Source: DataCite Commons | Updated: 2026-04-29 | Indexed: 2026-05-04
Download link:
https://data.mendeley.com/datasets/34gymtj5yc
Description:
This dataset presents a comprehensive collection of 1,438 raw sputum smear microscopy images containing 11,447 manually annotated ground-truth bounding box labels for the detection of Mycobacterium tuberculosis. The clinical specimens were sourced from Dr. Mohamad Soewandhi Regional General Hospital and Surabaya Pulmonary Hospital, Indonesia. To ensure high natural variability and prevent sensor bias, the images were acquired using standard optical microscopes equipped with two distinct digital camera systems: Hayear and Optilab.
Microscopic images commonly exhibit uneven background illumination, dust artifacts, and spatial noise. Rather than removing these through computational enhancement, this dataset intentionally preserves the raw, unmodified characteristics of the stained slides to represent true clinical environments. A detailed metadata.csv file is provided instead; it categorizes each image by its dominant background color condition (e.g., Greenish, Bluish, Purplish/Pinkish, Yellowish), which results from natural staining thickness and differing camera sensor responses.
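As an illustration of how the color metadata might be used, the following Python sketch counts images per background color category. The description does not document the column layout of metadata.csv, so the column names `filename` and `background_color` below are hypothetical placeholders; check the actual header row and adjust them before use.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count rows per distinct value of `column` in an open CSV file.

    `column` is an assumed header name; the real metadata.csv may use
    a different one.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    # Treat missing cells as empty strings so short rows do not crash.
    return Counter((row[column] or "").strip() for row in reader)


if __name__ == "__main__":
    # Tiny in-memory stand-in for metadata.csv (made-up rows, hypothetical columns).
    sample = io.StringIO(
        "filename,background_color\n"
        "a.jpg,Greenish\n"
        "b.jpg,Bluish\n"
        "c.png,Greenish\n"
    )
    for color, n in count_by_column(sample, "background_color").most_common():
        print(color, n)
```

With the real file, the same function could be called as `count_by_column(open("metadata.csv", newline="", encoding="utf-8"), ...)`, for example to check how balanced the color conditions are across the train, val, and test splits.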
Data Structure & Format:
The raw images are provided in .jpg and .png formats with a uniform high spatial resolution of 1,437 × 1,079 pixels. The dataset is systematically partitioned into 'train', 'val' (validation), and 'test' subdirectories to facilitate immediate machine learning model training. All image annotations are natively provided as plain text files (.txt) strictly adhering to the standard YOLO bounding box format (normalized coordinates: class_id x_center y_center width height). The object class ID for Mycobacterium tuberculosis is designated as 0.
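The YOLO label format described above can be read with a few lines of Python. The sketch below parses label text and converts the normalized boxes to pixel corner coordinates, assuming the stated image size of 1,437 × 1,079 pixels. The function names are illustrative and not part of the dataset.

```python
from typing import List, Tuple

# Image size stated in the dataset description (width, height).
IMG_W, IMG_H = 1437, 1079


def parse_yolo_line(line: str) -> Tuple[int, float, float, float, float]:
    """Parse one label line: class_id x_center y_center width height."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    xc, yc, w, h = map(float, parts[1:])
    return int(parts[0]), xc, yc, w, h


def yolo_to_pixel_box(xc, yc, w, h, img_w=IMG_W, img_h=IMG_H):
    """Convert a normalized center/size box to (x_min, y_min, x_max, y_max) in pixels."""
    return (
        (xc - w / 2) * img_w,
        (yc - h / 2) * img_h,
        (xc + w / 2) * img_w,
        (yc + h / 2) * img_h,
    )


def load_boxes(label_text: str) -> List[Tuple[int, float, float, float, float]]:
    """Return (class_id, x_min, y_min, x_max, y_max) for each non-empty label line."""
    boxes = []
    for line in label_text.splitlines():
        if not line.strip():
            continue
        class_id, xc, yc, w, h = parse_yolo_line(line)
        boxes.append((class_id, *yolo_to_pixel_box(xc, yc, w, h)))
    return boxes
```

For a real label file, `load_boxes(open(path).read())` would return one tuple per annotated bacillus, with `class_id` equal to 0 for Mycobacterium tuberculosis.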
Potential Use Cases:
Researchers and developers in computer vision and healthcare diagnostics can utilize this dataset to build, benchmark, and improve object detection algorithms (such as the YOLO family) for automated tuberculosis screening. Furthermore, the explicit inclusion of diverse camera sources (Hayear and Optilab) and detailed color metadata serves as a ready-to-use resource for evaluating model robustness and generalizability across varying real-world microscopic imaging conditions.
Provider:
Mendeley Data
Created:
2026-03-30



