Circulating Cell-Free RNA in Blood as a Host Response Biomarker for the Detection of Tuberculosis [training_data]
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE255071
Description:
Tuberculosis (TB) remains a leading cause of death from an infectious disease worldwide. This is partly due to a lack of tools to effectively screen and triage individuals with potential TB. Whole blood RNA signatures have been extensively studied as potential biomarkers for TB, but they have failed to meet the World Health Organization's (WHO's) optimal target product profiles (TPPs) for a non-sputum triage or diagnostic test. In this study, we investigated the utility of plasma cell-free RNA (cfRNA) as a host response biomarker for TB. We used RNA sequencing to profile plasma samples from individuals with a cough lasting at least two weeks who were seen at outpatient TB clinics in Uganda, Vietnam, and the Philippines. We split the samples into a discovery set for model training and testing, and reserved a validation set from a separate cohort to validate model performance. We trained 15 machine learning classification models and developed a 6-gene signature that performs well in differentiating TB-positive from TB-negative individuals (area under the curve, AUC = 0.95, 0.92, and 0.95 for the test, training, and validation sets, respectively). This 6-gene signature exceeds the optimal WHO TPPs for a TB triage test (sensitivity: 97.1% [95% CI: 80.9-100%], specificity: 85.2% [95% CI: 72.4-100%]) and was robust to differences in geographic location, sample collection, and HIV status. Analysis of matched whole blood samples from the validation cohort highlighted the differences in origin between plasma and whole blood RNA. Overall, our results demonstrate the utility of plasma cfRNA for the detection of TB and suggest the potential for a point-of-care, gene expression-based assay to aid in early detection of TB.
In the study presented here, 251 patients who presented with a cough lasting at least two weeks were identified in three separate cohorts across three countries (Uganda, Vietnam, and the Philippines). Patients were identified as TB positive or negative at clinics (TB status is indicated in the “tb” column of the metadata). We extracted cell-free RNA from the patients' plasma and conducted RNA sequencing. We split these samples into a discovery set (cohorts 1 and 2) and a validation set (cohort 3), and then further split the discovery set into 70% for machine learning model training and 30% for model testing. The validation cohort was evaluated independently. The data shown here are the gene transcript counts from the training data samples. Please note that the records were updated with raw sequencing data on Mar 27, 2024.
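The cohort-based split described above can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the record keys `sample_id` and `cohort` are hypothetical names (only the "tb" column is named in the metadata), and splitting each TB status group separately (stratification) with a fixed random seed is an assumption, since the record does not say how the 70/30 split was drawn.

```python
import random
from collections import defaultdict


def split_samples(samples, train_fraction=0.7, seed=0):
    """Return (train, test, validation) lists of sample IDs.

    Each record is a dict with the hypothetical keys "sample_id" and
    "cohort" (1, 2, or 3), plus "tb" for TB status.
    """
    # Cohort 3 is held out in full as the independent validation set.
    validation = [s["sample_id"] for s in samples if s["cohort"] == 3]
    # Cohorts 1 and 2 together form the discovery set.
    discovery = [s for s in samples if s["cohort"] in (1, 2)]

    # Assumption: split within each TB status group so that both classes
    # appear in the training and test subsets in similar proportions.
    by_status = defaultdict(list)
    for s in discovery:
        by_status[s["tb"]].append(s["sample_id"])

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for status in sorted(by_status):
        ids = sorted(by_status[status])
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test, validation
```

Calling `split_samples(records)` on a list of such records returns three lists of IDs that do not overlap. Keeping cohort 3 entirely outside the shuffle is what makes the validation estimate independent of the discovery data.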
Created:
2024-06-28



