Clinical urine microscopy for urinary tract infections
Source: rodare.hzdr.de, indexed 2025-03-24
Download link:
https://rodare.hzdr.de/record/2473
Resource description:
<p>Urinary tract infection (UTI) is a common disorder that can be diagnosed by microscopic examination of voided urine for cellular markers of infection. We present a dataset of 300 images containing 3,562 manually annotated urinary cells, labelled into seven classes of clinically significant urinary content. The dataset is enriched, with samples acquired from the unstained, untreated urine of patients with symptomatic UTI. Its aim is to facilitate UTI diagnosis in nearly all clinical settings by pairing a simple imaging system with advanced machine learning techniques.</p>
<p><strong>Data acquisition </strong></p>
<p>A total of 300 urine samples were obtained between April and August 2022 from patients with symptomatic UTI attending a specialist lower urinary tract symptoms (LUTS) outpatient clinic in central London. Samples were collected as natural voids and processed on-site within one hour to limit cellular degradation. Brightfield microscopy was performed on an Olympus BX41F microscope frame (U-5RE quintuple nosepiece, U-LS30 LED illuminator, U-AC Abbe condenser) with a ×20 objective (Olympus PLCN20x Plan C N Achromat 20x/0.4). Two experienced microscopists used a disposable haemocytometer (C Chip™) to count red blood cells (RBC), white blood cells (WBC), and epithelial cells (EPC) per 1 µl of urine, and to record the presence of other cellular content.</p>
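<p>The conversion from a raw haemocytometer count to cells per 1 µl can be sketched as below. It is a minimal illustration under an assumed chamber geometry: the default of 0.1 µl per counted square corresponds to a 1 mm × 1 mm × 0.1 mm Neubauer-style square and should be checked against the C Chip™ grid actually used. <code>cells_per_ul</code> and the example counts are hypothetical and are not taken from the dataset.</p>
<pre><code class="language-python">def cells_per_ul(total_count, squares_counted, square_volume_ul=0.1):
    """Convert a raw haemocytometer count into cells per µl of urine.

    square_volume_ul is an ASSUMED volume per counted square
    (0.1 µl for a 1 mm x 1 mm x 0.1 mm Neubauer-style square).
    """
    if 0 >= squares_counted or 0 >= square_volume_ul:
        raise ValueError("squares_counted and square_volume_ul must be positive")
    volume_examined_ul = squares_counted * square_volume_ul
    return total_count / volume_examined_ul


# Hypothetical example: 45 white cells counted over 4 squares (0.4 µl examined).
print(round(cells_per_ul(45, 4), 1))  # 112.5
</code></pre>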
<p>Images were acquired on the same brightfield microscope through a 0.5X C-mount adapter connected to a digital colour camera (Infinity 3S-1UR, Teledyne Lumenera), and were saved as 1392 × 1040 .tif files in 16-bit colour using Capture and Analyse software. An enriched-dataset approach was taken to maximise the urinary cellular content of the acquired images; this curation was also needed to address class imbalance. Köhler illumination and global white balance were set daily to keep image acquisition consistent.</p>
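<p>The acquisition text specifies 16-bit colour, whereas Table 1 lists raw pixel values of 0-255, so a loader is safer if it takes the bit depth from the decoded array rather than hard-coding it. The sketch below assumes each .tif has already been read into a NumPy array (for example with the tifffile package) of shape height × width × 3, that is 1040 × 1392 × 3; <code>to_unit_float</code> is a hypothetical helper.</p>
<pre><code class="language-python">import numpy as np


def to_unit_float(image):
    """Scale an unsigned-integer image to float32 values in [0, 1].

    The divisor comes from the array's own dtype (255 for uint8,
    65535 for uint16), so the same code works whichever bit depth
    the .tif files were actually written with.
    """
    if not np.issubdtype(image.dtype, np.unsignedinteger):
        raise TypeError(f"expected an unsigned integer image, got {image.dtype}")
    return image.astype(np.float32) / np.iinfo(image.dtype).max


# Synthetic stand-in for one raw image; shape assumes height x width x RGB.
raw = np.full((1040, 1392, 3), 255, dtype=np.uint8)
print(to_unit_float(raw).max())  # 1.0
</code></pre>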
<p><strong>Dataset annotation</strong></p>
<p>All 300 images were annotated manually, starting with a binary semantic segmentation of the cells of interest. Each pixel was labelled either as foreground (an informative cell) or as non-informative background; unidentifiable objects, such as debris or grossly out-of-focus particles, were assigned to the background. Binary annotation was initially performed in ilastik, an open-source tool that uses a Random Forest classifier for pixel classification, and was then refined manually at the pixel level to ensure accurate semantic segmentation. This produced one 1392 × 1040 binary mask in .tif format for each raw colour image.</p>
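<p>Because each binary mask should match its raw image pixel for pixel and contain only the values 0 and 1 (Table 1), a consistency check before training might look like the following minimal sketch. <code>validate_binary_mask</code> is a hypothetical helper, and the arrays are synthetic stand-ins for decoded .tif files.</p>
<pre><code class="language-python">import numpy as np


def validate_binary_mask(mask, image):
    """Check a bin_mask array against its raw image and return it as booleans.

    The mask must have the image's height and width and contain only
    0 (background) and 1 (foreground), as listed in Table 1.
    """
    if mask.shape != image.shape[:2]:
        raise ValueError(f"mask shape {mask.shape} does not match image {image.shape[:2]}")
    unexpected = set(np.unique(mask).tolist()) - {0, 1}
    if unexpected:
        raise ValueError(f"unexpected mask values: {sorted(unexpected)}")
    return mask.astype(bool)


# Synthetic image/mask pair (height x width assumed to be 1040 x 1392).
image = np.zeros((1040, 1392, 3), dtype=np.uint8)
mask = np.zeros((1040, 1392), dtype=np.uint8)
mask[100:150, 200:260] = 1  # one 50 x 60 pixel "cell"
foreground = validate_binary_mask(mask, image)
print(f"foreground fraction: {foreground.mean():.4f}")  # 0.0021
</code></pre>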
<p>Two expert microscopists then manually assigned each object of interest to one of seven clinically significant classes: rods, RBC/WBC, yeast, miscellaneous, single EPC, small EPC sheet, and large EPC sheet. This produced a 1392 × 1040 multi-class mask in .tif format for each image, in which each pixel value from 0 to 7 is a class label and 0 denotes background (Table 1).</p>
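<p>The label values in Table 1 can be mapped to class names and tallied per image as sketched below. This counts pixels, not objects: approximating the object counts in Table 1 would additionally need connected-component labelling per class (for example <code>scipy.ndimage.label</code>), and touching objects of the same class could still merge. The <code>foreground_agreement</code> check assumes, without confirmation from the dataset description, that every foreground pixel in bin_mask received a class label in mult_mask. Both helper names are hypothetical.</p>
<pre><code class="language-python">import numpy as np

# mult_mask label values, as listed in Table 1.
CLASS_NAMES = {
    0: "Background",
    1: "Rod",
    2: "RBC/WBC",
    3: "Yeast",
    4: "Miscellaneous",
    5: "Single EPC",
    6: "Small EPC sheet",
    7: "Large EPC sheet",
}


def class_pixel_counts(mult_mask):
    """Return the number of pixels carrying each class label in one mask."""
    unexpected = set(np.unique(mult_mask).tolist()) - set(CLASS_NAMES)
    if unexpected:
        raise ValueError(f"unexpected label values: {sorted(unexpected)}")
    counts = np.bincount(mult_mask.ravel().astype(np.int64), minlength=len(CLASS_NAMES))
    return {CLASS_NAMES[label]: int(n) for label, n in enumerate(counts)}


def foreground_agreement(mult_mask, bin_mask):
    """Fraction of pixels where 'has a class label' matches 'is foreground'.

    ASSUMPTION: every foreground pixel of bin_mask was given a class in
    mult_mask, so a value below 1.0 flags pixels where the masks disagree.
    """
    return float(np.mean((mult_mask > 0) == (bin_mask == 1)))


# Synthetic multi-class mask with two labelled regions.
mult = np.zeros((1040, 1392), dtype=np.uint8)
mult[10:20, 10:20] = 1  # 10 x 10 pixel "rod" region
mult[50:55, 50:60] = 2  # 5 x 10 pixel "RBC/WBC" region
counts = class_pixel_counts(mult)
print(counts["Rod"], counts["RBC/WBC"])  # 100 50
print(foreground_agreement(mult, (mult > 0).astype(np.uint8)))  # 1.0
</code></pre>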
<p><strong>Data structure </strong></p>
<p>The dataset is organised into three root folders: img (images), bin_mask (binary masks), and mult_mask (multi-class masks). Each folder contains 300 .tif files named with an incremental number.</p>
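<p>Since each file carries an incremental number, a loader can pair every image with its two masks by file name. The sketch below assumes that the image and both masks of one sample have identical file names (for example 1.tif in each folder) with a lower-case .tif extension; the exact naming in the published archive has not been checked, and <code>pair_files</code> is a hypothetical helper.</p>
<pre><code class="language-python">import re
from pathlib import Path
from typing import List, Tuple

FOLDERS = ("img", "bin_mask", "mult_mask")


def _numeric_key(stem):
    """Order stems by their first run of digits, so '2' sorts before '10'."""
    match = re.search(r"\d+", stem)
    return (int(match.group()) if match else -1, stem)


def pair_files(root) -> List[Tuple[Path, ...]]:
    """Return (image, binary mask, multi-class mask) paths that share a file stem.

    ASSUMPTION: the three files of one sample have the same name
    (e.g. 1.tif in img/, bin_mask/ and mult_mask/).
    """
    root = Path(root)
    by_folder = {
        folder: {p.stem: p for p in (root / folder).glob("*.tif")} for folder in FOLDERS
    }
    stem_sets = [set(files) for files in by_folder.values()]
    common = set.intersection(*stem_sets)
    unmatched = set.union(*stem_sets) - common
    if unmatched:
        raise ValueError(f"files missing from at least one folder: {sorted(unmatched)}")
    return [
        tuple(by_folder[folder][stem] for folder in FOLDERS)
        for stem in sorted(common, key=_numeric_key)
    ]


# Usage (hypothetical path to the unpacked record):
# pairs = pair_files("urine_dataset")
# image_path, bin_mask_path, mult_mask_path = pairs[0]
</code></pre>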
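<p><strong>Table 1</strong></p>
<pre><code class="language-markdown">| Folder    | Files | Objects               | Count | Pixel values |
|-----------|-------|-----------------------|-------|--------------|
| img       | 300   | Raw data              |       | 0-255        |
| bin_mask  | 300   | Background/Foreground |       | 0/1          |
| mult_mask | 300   | Background            |       | 0            |
|           |       | Rod                   | 1697  | 1            |
|           |       | RBC/WBC               | 1056  | 2            |
|           |       | Yeast                 | 41    | 3            |
|           |       | Miscellaneous         | 550   | 4            |
|           |       | Single EPC            | 182   | 5            |
|           |       | Small EPC sheet       | 26    | 6            |
|           |       | Large EPC sheet       | 10    | 7            |
|           |       | Total                 | 3562  |              |</code></pre>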
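<p>Table 1 shows strong class imbalance, from 1,697 rods down to 10 large EPC sheets. One common way to compensate during training, not necessarily the one used with this dataset, is inverse-frequency ("balanced") class weighting, n_objects / (n_classes × count), which is the formula scikit-learn applies for <code>class_weight="balanced"</code>. The sketch below applies it to the object counts in Table 1; a pixel-wise segmentation loss would normally use pixel counts instead.</p>
<pre><code class="language-python"># Object counts per class, copied from Table 1.
OBJECT_COUNTS = {
    "Rod": 1697,
    "RBC/WBC": 1056,
    "Yeast": 41,
    "Miscellaneous": 550,
    "Single EPC": 182,
    "Small EPC sheet": 26,
    "Large EPC sheet": 10,
}


def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * count), so rare classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * count) for name, count in counts.items()}


total_objects = sum(OBJECT_COUNTS.values())
print(total_objects)  # 3562
for name, weight in balanced_class_weights(OBJECT_COUNTS).items():
    share = OBJECT_COUNTS[name] / total_objects
    print(f"{name:16s} share={share:6.2%} weight={weight:6.2f}")
</code></pre>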
Provided by:
rodare.hzdr.de



