PathOlOgics_RBCs Cells.zip
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/PathOlOgics_RBCs_Cells_zip/24119511
Description:
The root directory, "PathOlOgics_RBCs Cells", is organized into three primary folders: "Cropped images", "Masks", and "Segmented images". Each primary folder contains nine subfolders, one per RBC class, with the following cell counts: "Angled cells: 24,187", "Borderline ovalocytes: 35,540", "Burr cells: 8,948", "Fragmented RBCs: 7,186", "Ovalocytes: 55,348", "Rounded RBCs: 46,346", "Teardrops: 16,298", "Three-overlapping RBCs: 15,577", and "Two-overlapping RBCs: 31,360". Each of the 240,790 cells is represented by its own cropped image, mask, and segmented image, all processed and individually reviewed by the haematologists themselves. Samples for every class were collected from each slide/smear. The file names of the cropped image, mask, and segmented image of every cell follow one consistent format: the slide/smear number, then the unique patch/field number, then the (XYWH) coordinates of the cell on the patch. All images are stored in ".jpg" format. To maintain image quality, ZIP compression was applied using the fastest compression method available.
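The layout described above can be checked against the stated counts after unpacking the archive. The following is a minimal sketch, not part of the dataset: it assumes the nine class subfolders are named exactly like the class labels quoted above, and the helper names `count_jpgs` and `mismatches` are hypothetical.

```python
from pathlib import Path

# Per-class cell counts as stated in the description above.
CLASS_COUNTS = {
    "Angled cells": 24_187,
    "Borderline ovalocytes": 35_540,
    "Burr cells": 8_948,
    "Fragmented RBCs": 7_186,
    "Ovalocytes": 55_348,
    "Rounded RBCs": 46_346,
    "Teardrops": 16_298,
    "Three-overlapping RBCs": 15_577,
    "Two-overlapping RBCs": 31_360,
}

PRIMARY_FOLDERS = ("Cropped images", "Masks", "Segmented images")


def count_jpgs(root):
    """Return {primary_folder: {class_folder: number of .jpg files}}."""
    root = Path(root)
    counts = {}
    for primary in PRIMARY_FOLDERS:
        primary_dir = root / primary
        if not primary_dir.is_dir():
            continue  # folder missing: simply not reported
        counts[primary] = {
            class_dir.name: sum(1 for _ in class_dir.glob("*.jpg"))
            for class_dir in sorted(primary_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


def mismatches(counts):
    """List (primary, class, found, expected) where a count differs.

    ASSUMPTION: subfolder names equal the keys of CLASS_COUNTS.
    """
    out = []
    for primary, per_class in counts.items():
        for name, expected in CLASS_COUNTS.items():
            found = per_class.get(name, 0)
            if found != expected:
                out.append((primary, name, found, expected))
    return out


# The nine class counts add up to the stated total of 240,790 cells.
assert sum(CLASS_COUNTS.values()) == 240_790
```

Calling `mismatches(count_jpgs("PathOlOgics_RBCs Cells"))` on the unpacked root should return an empty list if the folder names match this assumption and the archive is complete.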
The PathOlOgics_RBCs datasets are freely accessible and offer features not found together in other published datasets and published deep learning (DL) based works. In quantity, they provide over 240K RBC images with their respective segmentation masks, spanning nine clinically significant classes. In diversity, they were created from 25 manually prepared peripheral blood and bone marrow smears of different patients; staining and scanning were performed using the same technique but with four different sources. In quality, two expert haematologists with practical data science experience processed and extensively reviewed the RBC classes and masks cell by cell. The labelling criteria were designed to emphasize clinically significant RBC classes for which visual examination is the only means of identification, unassisted by other technological solutions. After the haematologists labelled the cells individually, a final discussion was held to reconcile any labelling disparities. Each cell was assigned to one of nine classes: normal/rounded RBCs, ovalocytes (oval or egg-shaped), borderline ovalocytes (between rounded and frank oval), burr cells (crenated), fragmented RBCs, teardrop-shaped RBCs, two-overlapped RBCs, three-overlapped RBCs, and other RBCs that contain artificial/false teardrops.
The presence of fragmented RBCs or teardrop-shaped RBCs is medically significant because it is commonly associated with serious medical conditions. Fragmented RBCs are defined as RBCs that are smaller than half the average normal/rounded RBC size and/or irregularly shaped fragments with sharp, angular, or jagged edges. Identifying these cells is the most reliable indicator to confirm the diagnosis of diseases such as hemolytic anaemias, thrombotic thrombocytopenic purpura (TTP), and disseminated intravascular coagulation (DIC). However, reporting fragmented RBCs in TTP and DIC can be challenging because these cells are seen infrequently in haematology labs; furthermore, the cutoff for significant presence in these two serious diseases is just above 1–1.5% of the total RBCs, which increases the risk of overlooking them. Crucially, in cases of critical thrombocytopenia where the platelet count is less than 20 K/µL, platelet transfusion may be necessary, but this intervention can be life-threatening in TTP and DIC. Therefore, identifying and counting fragmented RBCs could be critical for the accurate diagnosis and management of patients with associated medical conditions.
In adults, teardrop-shaped RBCs above 2–4% can indicate bone marrow fibrosis caused by bone marrow cancers; in non-cancerous conditions, the differential diagnosis is rushed erythropoiesis (accelerated blood production) to compensate for severe anaemia. Currently, manual or DL-based visual examination is the only way to identify teardrop-shaped RBCs. It is essential to differentiate between true teardrop-shaped RBCs, which have a single blunt protrusion, and false ones, which have sharp surface projections without necks or have more than one blunt protrusion. Mechanical stress during blood smear preparation often produces false teardrop shapes, primarily at the outer edges of the blood film. In our datasets, we have created a separate class named "false teardrop-shaped RBCs/angled RBCs." This class contains numerous RBCs that resemble fragmented RBCs, ovalocytes, and teardrop-shaped RBCs but are actually false representations of these classes. This precise classification has not previously been used in datasets or research using DL methods.
Ovalocytes are RBCs that have an abnormal oval shape. The presence of ovalocytes exceeding 5–10% of the total RBCs is associated with almost all types of anaemia or erythrocytosis. They may display elongation and/or a pear shape, but without any blunt or sharp surface protrusions. Occasionally, they can also appear in normal blood smears due to mechanical deformation during preparation, though at a low frequency. Currently, identifying ovalocytes through manual/visual methods is subjective, and there is no alternative to manual or DL-based visual examination. This inherent subjectivity and the absence of automated measures might account for the broad cutoff range (above 5–10%) observed in cases of anaemia or erythrocytosis. To address this issue, we used aspect ratio calculations to help differentiate objectively between normal/rounded RBCs, borderline ovalocytes (those on the borderline between rounded RBCs and ovalocytes), and frank ovalocytes. This approach may enable precise determination of specific cutoffs for ovalocytes in different conditions.
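To illustrate how an aspect ratio can separate rounded RBCs, borderline ovalocytes, and frank ovalocytes, here is a minimal sketch. The description does not say how the ratio was computed or which cutoffs were applied, so everything here is an assumption: the ratio is taken as the major-to-minor axis ratio from the second moments of the mask's foreground pixels (which does not depend on cell orientation), and the two cutoffs are hypothetical placeholder values.

```python
import math


def aspect_ratio(mask):
    """Major/minor axis ratio of the foreground in a binary mask.

    `mask` is a list of rows containing 0/1 values, e.g. a thresholded
    version of one of the dataset's mask images.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    if n < 2:
        raise ValueError("mask needs at least two foreground pixels")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major, minor = half_trace + root, half_trace - root
    if minor <= 0:
        return math.inf  # all pixels on one line
    # Variance scales with the squared axis length, hence the square root.
    return math.sqrt(major / minor)


# HYPOTHETICAL cutoffs for illustration only; the description does not
# state the values the authors used.
ROUNDED_MAX = 1.15
BORDERLINE_MAX = 1.30


def shape_class(ratio, rounded_max=ROUNDED_MAX, borderline_max=BORDERLINE_MAX):
    """Map an aspect ratio to one of the three shape classes."""
    if ratio <= rounded_max:
        return "rounded"
    if ratio <= borderline_max:
        return "borderline ovalocyte"
    return "ovalocyte"
```

A moment-based ratio is used rather than a bounding-box ratio because an axis-aligned box underestimates the elongation of an oval cell that lies diagonally in the crop.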
To develop a dependable DL-based classifier for RBCs and to prevent potential errors in analysis, the training datasets should include other types of RBCs that have multiple projections, resemble teardrop-shaped RBCs, and may be mistaken for them. These types include burr cells, which have uneven surfaces with several small notches and protrusions. Likewise, no technological substitute currently exists for the visual recognition of burr cells, whose numbers tend to rise under conditions of dehydration, such as in renal failure or in dehydrated neonates. Alternatively, where there is no medical explanation, burr cells may result from extended drying of smears during the manual staining procedure.
For accurate reporting of RBC morphology, it is necessary to count at least 1,000 RBCs in appropriate fields to determine the percentage of each abnormal cell type among the total RBCs. However, identifying overlapping RBCs can be challenging because the upper cells might mask crucial parts of the cells beneath them, leading to potential misclassification of the overlapped cells. There is therefore no need to assume or predict the actual types of overlapping RBCs; they should be excluded from the count of individual cells. They were included as separate classes (junk classes) only to enable classifiers to differentiate individual cells from them and to help determine which fields are suitable for counting.
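The counting rule above can be sketched as follows. This is an illustrative sketch rather than the authors' procedure: the function name `morphology_report` is hypothetical, class labels are assumed to match the dataset's folder names, and keeping angled cells in the denominator is an assumption, since the description only states that overlapping RBCs are excluded.

```python
from collections import Counter

# The two overlapping ("junk") classes are excluded from the cell count.
OVERLAP_CLASSES = frozenset({"Two-overlapping RBCs", "Three-overlapping RBCs"})

# Minimum number of individual RBCs required by the description.
MIN_CELLS = 1000


def morphology_report(labels, min_cells=MIN_CELLS):
    """Return {class: percentage of individual RBCs} for predicted labels.

    Raises ValueError if fewer than `min_cells` individual RBCs remain
    once the overlapping classes are removed.
    """
    counts = Counter(label for label in labels if label not in OVERLAP_CLASSES)
    total = sum(counts.values())
    if total < min_cells:
        raise ValueError(
            f"only {total} individual RBCs counted; need at least {min_cells}"
        )
    return {name: 100.0 * n / total for name, n in counts.items()}
```

For example, 20 fragmented RBCs among 1,000 individual cells yields 2.0%, which is above the 1–1.5% cutoff mentioned earlier, regardless of how many overlapping clusters were seen in the same fields.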
The aim of creating these datasets was to facilitate the development of a generalizable automated system for RBC classification using current DL technology or any other advanced technologies that may evolve in the coming decades. Such a system should be capable of operating effectively on the commonly used manually prepared and stained blood smears without requiring prior standardization of the staining or smearing procedures. This adaptability would allow the system to function as a highly sensitive screening tool for anaemia, based on the proportion of ovalocytes relative to the total RBC count. At the same time, accurate identification of teardrop-shaped and fragmented RBCs would provide specificity. Additionally, the system could identify some instances of improper manual staining by analyzing the presence of burr cells. Moreover, the system could identify optimal regions in which to begin cell identification and counting by assessing the ratio of overlapping RBCs to individual RBCs.
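The field-selection idea in the last sentence can be sketched as below. The description gives no threshold, so `MAX_OVERLAP_RATIO` is a hypothetical placeholder, as are the function names; each overlapping cluster is assumed to count once, whatever the number of cells in it.

```python
OVERLAP_CLASSES = frozenset({"Two-overlapping RBCs", "Three-overlapping RBCs"})

# HYPOTHETICAL threshold for illustration; not stated in the description.
MAX_OVERLAP_RATIO = 0.25


def overlap_ratio(labels):
    """Ratio of overlapping-RBC detections to individual-RBC detections."""
    overlapping = sum(1 for label in labels if label in OVERLAP_CLASSES)
    individual = len(labels) - overlapping
    if individual == 0:
        return float("inf")  # no individual cells: field is unusable
    return overlapping / individual


def countable_fields(fields, max_ratio=MAX_OVERLAP_RATIO):
    """Return ids of fields whose overlap ratio is at or below `max_ratio`.

    `fields` maps a field/patch id to the list of predicted class labels
    for the cells detected in that field.
    """
    return [
        field_id
        for field_id, labels in fields.items()
        if overlap_ratio(labels) <= max_ratio
    ]
```

Fields near the thick end of a smear, where most detections are overlapping clusters, would be filtered out before any per-class percentages are computed.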
Created:
2023-09-13



