
Clinical urine microscopy for urinary tract infections

Source: rodare.hzdr.de (indexed 2025-03-24)
Download link:
https://rodare.hzdr.de/record/2563
Description:
<p>Urinary tract infections (UTIs) are common, and they can be diagnosed by microscopic examination of voided urine for cellular markers of infection. We present a dataset of 300 images containing 3,562 manually annotated urinary cells, labelled into seven classes of clinically significant urinary content. It is an enriched dataset, with samples acquired from the unstained and untreated urine of patients with symptomatic UTI. The aim of the dataset is to facilitate UTI diagnosis in nearly all clinical settings using a simple imaging system that leverages advanced machine learning techniques.</p>
<p><strong>Data acquisition</strong></p>
<p>300 urine samples were obtained between April and August 2022 from patients with symptomatic UTI attending a specialist lower urinary tract symptoms (LUTS) outpatient clinic in central London. Urine samples were collected as natural voids and processed on-site within one hour to mitigate cellular degradation. Brightfield microscopy (Olympus BX41F microscope frame, U-5RE quintuple nosepiece, U-LS30 LED illuminator, U-AC Abbe condenser) was performed with a 20x objective (Olympus PLCN20x Plan C N Achromat 20x/0.4). Two experienced microscopists used a disposable haemocytometer (C Chip&trade;) to count red blood cells (RBC), white blood cells (WBC), and epithelial cells (EPC), and to record the presence of other cellular content, per 1 &micro;l of urine.</p>
<p>Images were acquired on the same brightfield microscope through a 0.5X C-mount adapter connected to a digital colour camera (Infinity 3S-1UR, Teledyne Lumenera). Images were captured in 16-bit colour at 1392 x 1040 pixels in .tif format using Capture and Analyse software. An enriched dataset approach was taken to maximise urinary cellular content in the acquired images; this curation was also necessary to overcome class imbalance. K&ouml;hler illumination and global white balance were performed daily to ensure consistent image acquisition.</p>
<p><strong>Dataset annotation</strong></p>
<p>The 300 images were manually annotated, first by identifying cells of interest as a binary semantic segmentation task. Each pixel was labelled as either foreground (informative cells) or non-informative background. The background class also includes unidentifiable objects, such as debris or grossly out-of-focus particles. Binary annotation was initially performed using ilastik, an open-source tool that uses a Random Forest classifier for pixel classification, and was then manually refined at the pixel level to ensure accurate semantic segmentation. This produced a binary mask in 1392 x 1040 .tif format for each corresponding raw colour image.</p>
<p>Objects of interest were then manually labelled by two expert microscopists into one of seven clinically significant classes: rods, RBC/WBC, yeast, miscellaneous, single EPC, small EPC sheet, and large EPC sheet. This produced a multi-class mask in 1392 x 1040 .tif format in which each pixel value is a label from 0 to 7, where 0 is background (Table 1).</p>
<p><strong>Data structure</strong></p>
<p>The dataset is organised into three root folders: img (image), bin_mask (binary mask), and mult_mask (multi-class mask). Each folder contains 300 files in .tif format, named with an incremental number.</p>
<p><strong>Table 1</strong></p>
<pre><code class="language-markdown">| Folder    | Files | Object / class        | Count | Pixel value |
|-----------|-------|-----------------------|-------|-------------|
| img       | 300   | Raw data              |       | 0-255       |
| bin_mask  | 300   | Background/Foreground |       | 0/1         |
| mult_mask | 300   | Background            |       | 0           |
|           |       | Rod                   | 1697  | 1           |
|           |       | RBC/WBC               | 1056  | 2           |
|           |       | Yeast                 | 41    | 3           |
|           |       | Miscellaneous         | 550   | 4           |
|           |       | Single EPC            | 182   | 5           |
|           |       | Small EPC sheet       | 26    | 6           |
|           |       | Large EPC sheet       | 10    | 7           |
| Total     |       |                       | 3562  |             |
</code></pre>
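<p>The label scheme in Table 1 can be written down as a small lookup, which is handy when interpreting mult_mask pixel values. The following is a minimal Python sketch, not code shipped with the dataset. The class names and object counts are copied from Table 1; the function names <code>inverse_frequency_weights</code> and <code>pixel_histogram</code> are hypothetical, and inverse-frequency weighting is only one common heuristic for class imbalance, not something the dataset authors prescribe.</p>

```python
from collections import Counter

# Pixel value -> class name, as listed in Table 1 (0 is background).
CLASS_NAMES = {
    0: "background",
    1: "rod",
    2: "RBC/WBC",
    3: "yeast",
    4: "miscellaneous",
    5: "single EPC",
    6: "small EPC sheet",
    7: "large EPC sheet",
}

# Annotated object counts per class label, copied from Table 1.
OBJECT_COUNTS = {1: 1697, 2: 1056, 3: 41, 4: 550, 5: 182, 6: 26, 7: 10}


def inverse_frequency_weights(counts):
    """Return per-class weights proportional to 1/count, scaled to a mean of 1.

    Illustrative heuristic only (an assumption, not part of the dataset).
    """
    inverse = {label: 1.0 / n for label, n in counts.items()}
    scale = len(inverse) / sum(inverse.values())
    return {label: w * scale for label, w in inverse.items()}


def pixel_histogram(mask_rows):
    """Count pixels per class name in a multi-class mask given as rows of ints."""
    hist = Counter(value for row in mask_rows for value in row)
    unknown = set(hist) - set(CLASS_NAMES)
    if unknown:
        raise ValueError(f"unexpected pixel values: {sorted(unknown)}")
    return {CLASS_NAMES[value]: n for value, n in hist.items()}


if __name__ == "__main__":
    # The per-class counts should add up to the total stated in Table 1.
    print("total objects:", sum(OBJECT_COUNTS.values()))
    for label, weight in inverse_frequency_weights(OBJECT_COUNTS).items():
        print(f"{CLASS_NAMES[label]:>16}: {weight:.3f}")
    # Toy 2 x 3 mask standing in for a real 1392 x 1040 mult_mask file.
    print(pixel_histogram([[0, 0, 1], [2, 0, 7]]))
```

<p>For real files, a TIFF reader such as tifffile or Pillow would supply the mask array, which can be passed to <code>pixel_histogram</code> after conversion to nested lists. The exact file names are not given in the description, so no paths are assumed here.</p>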

Provider:
rodare.hzdr.de