Clinical urine microscopy for urinary tract infections
Source: rodare.hzdr.de (indexed 2025-03-24)
Download link: https://rodare.hzdr.de/record/2562
Resource description:
<p>Urinary tract infections (UTIs) are common. They can be diagnosed by microscopic examination of voided urine for cellular markers of infection. We present a dataset of 300 images containing 3,562 manually annotated urinary cells, labelled into seven classes of clinically significant urinary content. It is an enriched dataset, with samples acquired from the unstained and untreated urine of patients with symptomatic UTI. Its aim is to facilitate UTI diagnosis in nearly all clinical settings with a simple imaging system that leverages advanced machine learning techniques.</p>
<p><strong>Data acquisition </strong></p>
<p>Three hundred urine samples were obtained between April and August 2022 from patients with symptomatic UTI at a specialist LUTS (lower urinary tract symptoms) outpatient clinic in central London. Samples were collected as natural voids and processed on site within one hour to limit cellular degradation. Brightfield microscopy (Olympus BX41F microscope frame, U-5RE quintuple nosepiece, U-LS30 LED illuminator, U-AC Abbe condenser) was performed with a x20 objective (Olympus PLCN20x Plan C N Achromat 20x/0.4). Two experienced microscopists used a disposable haemocytometer (C Chip™) to count red blood cells (RBC), white blood cells (WBC) and epithelial cells (EPC) per 1 µl of urine, and to record the presence of other cellular content.</p>
<p>Images were acquired on the same brightfield microscope through a 0.5X C-mount adapter connected to a digital colour camera (Infinity 3S-1UR, Teledyne Lumenera). They were captured in 16-bit colour as 1392 x 1040 .tif files using Capture and Analyse software. An enriched-dataset approach was taken to maximise the urinary cellular content of the acquired images. This curation was also necessary to address class imbalance. Köhler illumination and global white balance were set daily to keep image acquisition consistent.</p>
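<p>The text above states that images were captured in 16-bit colour, whereas Table 1 lists raw pixel values of 0-255. A loader can avoid assuming either bit depth by scaling each image by the maximum of the integer type it was actually decoded as. The minimal sketch below assumes the .tif files have already been read into NumPy arrays with a TIFF reader of your choice, because the record does not name one. to_unit_float is a hypothetical helper name.</p>

```python
import numpy as np


def to_unit_float(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale an unsigned-integer image to float32 in [0, 1].

    The divisor is the maximum of the array's own dtype (255 for uint8,
    65535 for uint16), so the same code works whichever bit depth the
    decoded .tif files turn out to have.
    """
    if not np.issubdtype(image.dtype, np.unsignedinteger):
        raise TypeError(f"expected an unsigned-integer image, got {image.dtype}")
    max_value = np.iinfo(image.dtype).max
    return image.astype(np.float32) / np.float32(max_value)


# Tiny synthetic stand-ins for decoded colour frames (rows x columns x RGB).
frame8 = np.array([[[0, 128, 255]]], dtype=np.uint8)
frame16 = np.array([[[0, 32768, 65535]]], dtype=np.uint16)
print(to_unit_float(frame8).max(), to_unit_float(frame16).max())  # 1.0 1.0
```

<p>Deriving the scale from the dtype means that an 8-bit export and a true 16-bit export both map to the same [0, 1] range. Hard-coding 255 would silently saturate 16-bit data.</p>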
<p><strong>Dataset annotation</strong></p>
<p>All 300 images were annotated manually. The first step was a binary semantic segmentation of the cells of interest, in which each pixel was labelled either as an informative cell (foreground) or as non-informative background. Unidentifiable objects, such as debris or grossly out-of-focus particles, were assigned to the background. Binary annotation was first performed in ilastik, an open-source tool whose pixel-classification workflow uses a Random Forest classifier, and was then refined manually at the pixel level to ensure accurate semantic segmentation. This produced a 1392 x 1040 .tif binary mask for each corresponding raw colour image.</p>
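<p>The binary masks mark foreground cells with pixel value 1 (Table 1), so individual objects can be approximated as connected foreground regions. The sketch below counts 4-connected regions with a breadth-first flood fill. It illustrates the idea only and is not the authors' procedure. Touching cells merge into one region, so counts from it need not match the annotated totals. count_foreground_objects is a hypothetical name.</p>

```python
from collections import deque

import numpy as np


def count_foreground_objects(binary_mask: np.ndarray) -> int:
    """Count 4-connected foreground regions (non-zero pixels) in a 2-D binary mask.

    Touching cells merge into one region, so this only approximates the
    number of annotated cells.
    """
    foreground = binary_mask.astype(bool)
    visited = np.zeros_like(foreground)
    height, width = foreground.shape
    regions = 0
    for row in range(height):
        for col in range(width):
            if not foreground[row, col] or visited[row, col]:
                continue
            # Unvisited foreground pixel: start a new region and flood-fill it.
            regions += 1
            visited[row, col] = True
            queue = deque([(row, col)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and foreground[nr, nc] and not visited[nr, nc]):
                        visited[nr, nc] = True
                        queue.append((nr, nc))
    return regions


toy = np.array([
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
])
print(count_foreground_objects(toy))  # 3
```

<p>A pure-Python loop is slow on full 1392 x 1040 masks. scipy.ndimage.label performs the same labelling much faster, and its default 2-D structuring element is also 4-connected.</p>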
<p>Two expert microscopists then manually labelled each object of interest as one of seven clinically significant classes: rods, RBC/WBC, yeast, miscellaneous, single EPC, small EPC sheet, and large EPC sheet. This produced a 1392 x 1040 .tif multi-class mask in which each pixel value, from 0 to 7, is a class label, with 0 denoting background (Table 1).</p>
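<p>The pixel values of the multi-class mask map directly to the class names in Table 1. The sketch below encodes that mapping as a dictionary and tallies how many pixels carry each label. It also rejects values outside 0 to 7 as a basic integrity check. CLASS_NAMES and class_pixel_counts are hypothetical names, and the mask is assumed to be already loaded as a NumPy array.</p>

```python
import numpy as np

# Pixel value -> class name, following Table 1 (0 is background).
CLASS_NAMES = {
    0: "Background",
    1: "Rod",
    2: "RBC/WBC",
    3: "Yeast",
    4: "Miscellaneous",
    5: "Single EPC",
    6: "Small EPC sheet",
    7: "Large EPC sheet",
}


def class_pixel_counts(mult_mask: np.ndarray) -> dict[str, int]:
    """Return the number of pixels carrying each class label present in the mask."""
    values, counts = np.unique(mult_mask, return_counts=True)
    unknown = set(values.tolist()) - set(CLASS_NAMES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    return {CLASS_NAMES[int(v)]: int(n) for v, n in zip(values, counts)}


toy = np.array([[0, 0, 1], [2, 2, 7]], dtype=np.uint8)
print(class_pixel_counts(toy))
# {'Background': 2, 'Rod': 1, 'RBC/WBC': 2, 'Large EPC sheet': 1}
```

<p>These are pixel counts, not the object counts given in Table 1. A large EPC sheet and a single rod each count as one object but cover very different numbers of pixels.</p>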
<p><strong>Data structure </strong></p>
<p>The dataset is organised into three root folders: img (images), bin_mask (binary masks), and mult_mask (multi-class masks). Each folder contains 300 .tif files named with incremental numbers.</p>
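<p>The record does not say exactly how files in the three folders correspond. The sketch below assumes, hypothetically, that matching files share the same incremental filename in img, bin_mask and mult_mask. Under that assumption it groups each image with its two masks in numeric order, so that 2.tif sorts before 10.tif. pair_dataset_files is an illustrative name, not part of the dataset.</p>

```python
from pathlib import Path


def pair_dataset_files(root: Path) -> list[tuple[Path, Path, Path]]:
    """Return (image, binary mask, multi-class mask) path triples.

    Assumption (not stated in the record): corresponding files share the
    same incremental filename in all three folders, e.g. img/12.tif,
    bin_mask/12.tif and mult_mask/12.tif.
    """
    folders = [root / "img", root / "bin_mask", root / "mult_mask"]
    missing = [str(f) for f in folders if not f.is_dir()]
    if missing:
        raise FileNotFoundError(f"missing folders: {missing}")

    names_per_folder = [{p.name for p in f.glob("*.tif")} for f in folders]
    shared = set.intersection(*names_per_folder)
    unmatched = set.union(*names_per_folder) - shared
    if unmatched:
        raise ValueError(f"files without a match in every folder: {sorted(unmatched)}")

    def numeric_key(name: str):
        # Sort numeric stems by value; any non-numeric stems go last, alphabetically.
        stem = Path(name).stem
        return (0, int(stem), stem) if stem.isdigit() else (1, 0, stem)

    return [tuple(f / name for f in folders) for name in sorted(shared, key=numeric_key)]
```

<p>If the assumption holds, calling pair_dataset_files on the extracted download should return 300 triples. A ValueError listing unmatched names is a quick signal that the naming scheme differs.</p>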
<p><strong>Table 1</strong></p>
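<pre><code class="language-markdown">| Folder    | Files | Objects               | Count | Pixel values |
|-----------|-------|-----------------------|-------|--------------|
| img       | 300   | Raw data              |       | 0-255        |
| bin_mask  | 300   | Background/Foreground |       | 0/1          |
| mult_mask | 300   | Background            |       | 0            |
|           |       | Rod                   | 1697  | 1            |
|           |       | RBC/WBC               | 1056  | 2            |
|           |       | Yeast                 | 41    | 3            |
|           |       | Miscellaneous         | 550   | 4            |
|           |       | Single EPC            | 182   | 5            |
|           |       | Small EPC sheet       | 26    | 6            |
|           |       | Large EPC sheet       | 10    | 7            |
|           |       | Total                 | 3562  |              |</code></pre>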
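<p>Table 1 makes the class imbalance concrete: there are 1697 rods but only 10 large EPC sheets, roughly 170 to 1. One common way to compensate during training is to weight each class inversely to its frequency. The sketch below computes such weights from the Table 1 object counts. The formula total / (n_classes * count) is a standard heuristic chosen here for illustration. The dataset authors do not describe a weighting scheme.</p>

```python
# Annotated object counts per class, copied from Table 1.
OBJECT_COUNTS = {
    "Rod": 1697,
    "RBC/WBC": 1056,
    "Yeast": 41,
    "Miscellaneous": 550,
    "Single EPC": 182,
    "Small EPC sheet": 26,
    "Large EPC sheet": 10,
}


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by total / (n_classes * count); rarer classes weigh more.

    A perfectly balanced class would receive weight 1.0. This is an
    illustrative heuristic, not the dataset authors' method.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(OBJECT_COUNTS)
print(sum(OBJECT_COUNTS.values()))  # 3562, matching the Table 1 total
for name, w in weights.items():
    print(f"{name:16s} {w:6.2f}")
```

<p>For a pixel-wise segmentation loss, the same function could be applied to per-class pixel counts from the masks instead of object counts, because object size varies widely between classes.</p>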
Provided by: rodare.hzdr.de



