Nasal Mucosa Cell Dataset (NMCD)
Source: IEEE; indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/nasal-mucosa-cell-dataset-nmcd
Description:
Nasal cytology, or rhinocytology, is the subfield of otolaryngology focused on the microscopic observation of nasal mucosa samples, with the aim of recognizing cells of different types in order to spot and diagnose ongoing pathologies. The method offers good accuracy in diagnosing rhinitis and infections, and it is cheap and accessible, requiring no instrument more complex than a microscope, even an optical one. Mucosa samples are taken non-invasively with a simple swab, smeared onto a glass slide (fixation) and stained (for the NMCD dataset, with May-Grünwald-Giemsa, MGG) before being observed under the microscope.

The NMCD dataset is the result of intense collaboration between otolaryngologists and computer scientists who, convinced of the contribution that artificial intelligence can make to this branch of medicine, decided to make the material available to the scientific community so that researchers can test and compare their approaches in this new application field.

The dataset identifies 10 different entities, each distinguishable by specific characteristics:

Epithelial cells: the main components of the nasal mucosa, constituting 80% of the cytotypes observed in healthy patients. Their presence is not associated with ongoing pathologies.
Ciliated cells: belonging to the epithelial cell family, these cells are characterized by their "tail-like" shape.
Metaplastic cells: also belonging to the epithelial cell family, metaplastic cells are characterized by their round shape. Their presence is usually associated with an ongoing inflammatory reaction.
Muciparous cells: caliciform mucus-secreting cells characterized by a bilobed shape with a chromatin-reinforced membrane. An increase in muciparous cells results in increased mucus production, a symptom of nasal pathologies with chronic trends, such as dust mite allergies.
Neutrophils: granulocytes with several nucleoli and a round shape. Their main function is the phagocytosis of germs. An increase in their number should always be monitored as an indicator of immune response.
Eosinophils: polynuclear granulocytes, slightly larger than neutrophils. MGG staining tends to highlight the eosinophilic granules within them in orange. Allergic diseases are associated with an increase in their population.
Lymphocytes: white blood cells responsible for the immune response. Their large nucleus is surrounded by a thin "light blue" cytoplasmic rim.
Mast cells: large oval cells whose nuclei are covered with intensely colored basophilic granules. Their presence in the nasal mucosa is caused by ongoing allergies.
Ematia (erythrocytes): red blood cells whose occurrence in rhinological specimens may be due to pathologies, to previous wounds inside the nose, or even to small blood losses during the smear process.
Artifacts: all objects whose morphology resembles that of a cell but that are not cells. Examples include pollen fragments or dirt spots on the slide.

Data were sampled from 14 rhinological slides collected at the Rhinology Clinic of the Otolaryngology Department of the University of Bari. The collection technique was the direct smear, and the staining was MGG. An optical microscope (ProWay XSZPW208T) at 1000x magnification, equipped with a 3 MP DCE-PW300 camera, was used to acquire 50 images (microscope fields) from each slide; this quantity was chosen because it is the one defined in the rhinocytology protocol. This yielded 700 images with a size of 1024×768.

The image annotations were created by experts on the Roboflow platform, who analyzed each image individually, annotating and labeling each cell. During this phase a dropping policy was followed, discarding images that showed: sampling noise (i.e. dirt on the slide or blurred photos); duplication of large cytoplasmic areas already present in other images; or clusters of cells too dense and confused, which nasal cytologists typically discard. A total of 200 cytological fields were pruned, leaving 500 images.

A bounding box (BB) was manually drawn on each cell in the images, and a label was attached to it to specify the class the cell belongs to. Because cells are generally round, the smallest rectangular area that encloses each cell was marked as its bounding box. Overlaps between BBs can therefore occur in the images, owing to the proximity between cells and the rectangular shape of the boxes. Labeling produced more than 10,000 BBs corresponding to cells.

Thanks to Roboflow, annotations were made available in the standard annotation formats required by computer vision algorithms, such as Pascal VOC, COCO, TensorFlow and YOLO. The 500 microscopic field images were divided into training, validation and test sets (80%-10%-10%) using the stratified holdout strategy, to maintain the same class distribution across the three sets.
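The bounding-box convention described above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the names Box, box_for_round_cell, intersection_area and to_yolo are hypothetical, the cell coordinates are invented for illustration, and class id 0 is a placeholder, since the dataset's class-to-index mapping is not given here. Only the 1024×768 image size comes from the description. The sketch shows why the boxes of two round cells can overlap even when the cells themselves do not touch, and how a pixel box maps to a YOLO-style normalized row.

```python
import math
from dataclasses import dataclass

IMG_W, IMG_H = 1024, 768  # image size stated in the dataset description


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def box_for_round_cell(cx, cy, r):
    """Smallest axis-aligned rectangle enclosing a circle of radius r."""
    return Box(cx - r, cy - r, cx + r, cy + r)


def intersection_area(a, b):
    """Area shared by two boxes, 0 if they do not overlap."""
    w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    return max(w, 0.0) * max(h, 0.0)


def to_yolo(box, class_id, img_w=IMG_W, img_h=IMG_H):
    """YOLO text row: class x_center y_center width height, normalized to [0, 1]."""
    xc = (box.x_min + box.x_max) / 2 / img_w
    yc = (box.y_min + box.y_max) / 2 / img_h
    w = (box.x_max - box.x_min) / img_w
    h = (box.y_max - box.y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # Two invented round cells of radius 40 px, placed diagonally.
    r = 40
    a = box_for_round_cell(100, 100, r)
    b = box_for_round_cell(160, 160, r)

    # Centre distance is about 84.9 px, more than 2 * r, so the cells do not touch.
    print(math.hypot(160 - 100, 160 - 100) > 2 * r)  # True

    # Their rectangular boxes still share a 20 x 20 px corner.
    print(intersection_area(a, b))  # 400

    # Class id 0 is a placeholder, not the dataset's real mapping.
    print(to_yolo(a, 0))  # 0 0.097656 0.130208 0.078125 0.104167
```

Overlap of this kind is expected in the annotations and does not by itself indicate a labeling error. A detector's non-maximum suppression threshold may need to account for it.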
Provided by:
Camporeale, Mauro Giuseppe; Latrofa, Sergio; Iacobellis, Giorgia; Gelardi, Matteo; Lomonte, Nunzia; Dimauro, Giovanni; Ladisa, Mattia Sebastiano



