
A Comprehensive Raw Dataset of Ziehl-Neelsen Stained Sputum Smear Microscopy Images for Mycobacterium Tuberculosis Detection

Source: DataCite Commons. Updated: 2026-04-28. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/34gymtj5yc/6
Description:
This dataset contains 1,438 raw sputum smear microscopy images with 11,447 manually annotated ground-truth bounding boxes for the detection of Mycobacterium tuberculosis. The clinical specimens were sourced from Dr. Mohamad Soewandhi Regional General Hospital and Surabaya Pulmonary Hospital, Indonesia. To capture natural variability and reduce sensor bias, the images were acquired using standard optical microscopes equipped with two distinct digital camera systems: Hayear and Optilab.

The dataset intentionally preserves the raw, unmodified characteristics of the stained slides, including common microscopic imaging challenges such as uneven background illumination, dust artifacts, and spatial noise, so that it represents true clinical environments. Instead of applying computational enhancements, the dataset provides a detailed metadata.csv file that categorizes each image by its dominant background color condition (e.g., Greenish, Bluish, Purplish/Pinkish, Yellowish), which results from natural staining thickness and differing camera sensor responses.

Data Structure & Format: The dataset is partitioned into 'train', 'val' (validation), and 'test' subdirectories so that it can be used for machine learning model training without further splitting. All image annotations are provided as plain text files (.txt) in the standard YOLO bounding box format (normalized coordinates: class_id x_center y_center width height). The object class ID for Mycobacterium tuberculosis is 0.

Potential Use Cases: Researchers and developers in computer vision and healthcare diagnostics can use this dataset to build, benchmark, and improve object detection algorithms (such as the YOLO family) for automated tuberculosis screening. In addition, the two camera sources (Hayear and Optilab) and the color metadata make it possible to evaluate model robustness and generalizability across varying real-world microscopic imaging conditions.
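
As an illustration of the annotation format described above, the following is a minimal Python sketch that parses the contents of one YOLO label file and converts each normalized box to pixel corner coordinates. The names YoloBox, parse_yolo_labels, and to_pixel_corners are hypothetical and not part of the dataset; the image width and height are assumed to be supplied by the caller (for example, read from the image file), since the label files store only normalized values.

```python
from typing import List, NamedTuple, Tuple


class YoloBox(NamedTuple):
    """One annotation line: class_id x_center y_center width height (normalized)."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_labels(text: str) -> List[YoloBox]:
    """Parse the text of a YOLO .txt label file into a list of boxes."""
    boxes = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"line {line_number}: expected 5 fields, got {len(parts)}")
        boxes.append(YoloBox(int(parts[0]), *(float(p) for p in parts[1:])))
    return boxes


def to_pixel_corners(
    box: YoloBox, image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized center/size box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * image_width
    y_min = (box.y_center - box.height / 2) * image_height
    x_max = (box.x_center + box.width / 2) * image_width
    y_max = (box.y_center + box.height / 2) * image_height
    return x_min, y_min, x_max, y_max


if __name__ == "__main__":
    # Made-up label text for illustration; class 0 is Mycobacterium tuberculosis.
    sample = "0 0.5 0.5 0.2 0.4\n0 0.25 0.75 0.1 0.1\n"
    for box in parse_yolo_labels(sample):
        # Assumed image size of 1000 x 500 pixels, purely for the example.
        print(box.class_id, to_pixel_corners(box, 1000, 500))
```

In practice the label text would be read from a .txt file in the 'train', 'val', or 'test' subdirectory, and the image size would be taken from the matching image.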
Provider:
Mendeley Data
Created:
2026-04-28