Data_Sheet_2_Multi-Label Random Forest Model for Tuberculosis Drug Resistance Classification and Mutation Ranking.pdf

Indexed from NIAID Data Ecosystem on 2026-03-11
Download link:
https://figshare.com/articles/dataset/Data_Sheet_2_Multi-Label_Random_Forest_Model_for_Tuberculosis_Drug_Resistance_Classification_and_Mutation_Ranking_pdf/12172506
Description:
Resistance prediction and mutation ranking are important tasks in the analysis of tuberculosis sequence data. Because first-line antibiotics are administered according to standard regimens, resistance co-occurrence (samples resistant to multiple drugs) is common. Analysing all drugs simultaneously should therefore let a model exploit patterns that reflect resistance co-occurrence. Here, multi-label random forest (MLRF) models are compared with single-label random forest (SLRF) models on two tasks: predicting phenotypic resistance from whole genome sequences, and identifying the mutations most important for predicting resistance to four first-line drugs. The comparison uses a dataset of 13402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared with conventional clinical methods (by 18.10%) and with SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that retraining the models on a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php.
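The following is a minimal sketch of the workflow the abstract describes. It fits a multi-label forest and per-drug single-label forests, ranks mutations by importance, and retrains on the top-ranked subset. It is not the authors' implementation, which is available at the URL above. The sketch relies on the fact that scikit-learn's `RandomForestClassifier` accepts a 2-D label matrix and fits one forest whose trees predict all outputs jointly. That is one way to build a multi-label forest; the paper's MLRF may differ. Several choices here are hypothetical and made only for illustration: the synthetic data generator, the drug ordering, accuracy as the metric, the number of trees, and the top-10 cutoff.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Four first-line drugs; the column order of Y is a hypothetical convention.
DRUGS = ["isoniazid", "rifampicin", "ethambutol", "pyrazinamide"]


def make_synthetic_data(n_isolates=2000, n_mutations=100, seed=0):
    """Hypothetical data: binary mutation-presence matrix X and one resistance label per drug in Y."""
    rng = np.random.default_rng(seed)
    X = (rng.random((n_isolates, n_mutations)) < 0.1).astype(np.int8)
    inh = X[:, 0] | X[:, 1]            # hypothetical isoniazid-resistance mutations
    rif = X[:, 2]                      # hypothetical rifampicin-resistance mutation
    emb = rif & (X[:, 3] | X[:, 4])    # ethambutol resistance only co-occurs with rifampicin resistance
    pza = rif & X[:, 5]                # pyrazinamide resistance only co-occurs with rifampicin resistance
    Y = np.column_stack([inh, rif, emb, pza]).astype(np.int8)
    return X, Y


def fit_mlrf(X, Y, seed=0):
    """Multi-label RF: a single forest whose trees predict all drug labels jointly."""
    return RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, Y)


def fit_slrf(X, Y, seed=0):
    """Single-label RF baseline: an independent forest for each drug."""
    return [RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, Y[:, j])
            for j in range(Y.shape[1])]


def per_drug_accuracy(Y_true, Y_pred):
    """Fraction of isolates classified correctly, computed separately for each drug column."""
    return (Y_true == Y_pred).mean(axis=0)


def rank_mutations(mlrf):
    """Mutation (column) indices sorted by impurity-based importance, highest first."""
    return np.argsort(mlrf.feature_importances_)[::-1]


if __name__ == "__main__":
    X, Y = make_synthetic_data()
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

    mlrf = fit_mlrf(X_tr, Y_tr)
    ml_acc = per_drug_accuracy(Y_te, mlrf.predict(X_te))

    slrf = fit_slrf(X_tr, Y_tr)
    sl_pred = np.column_stack([model.predict(X_te) for model in slrf])
    sl_acc = per_drug_accuracy(Y_te, sl_pred)

    # Retrain the multi-label forest on only the top-ranked mutations.
    top = rank_mutations(mlrf)[:10]
    mlrf_top = fit_mlrf(X_tr[:, top], Y_tr)
    top_acc = per_drug_accuracy(Y_te, mlrf_top.predict(X_te[:, top]))

    for name, a, b, c in zip(DRUGS, ml_acc, sl_acc, top_acc):
        print(f"{name:>13}: MLRF={a:.3f}  SLRF={b:.3f}  MLRF(top-10)={c:.3f}")
    print("top-ranked mutation columns:", top.tolist())
```

In this toy setup, the co-occurrence structure is built into the labels: ethambutol and pyrazinamide resistance appear only alongside rifampicin resistance. A jointly trained forest can share splits across drugs to exploit that structure, while the single-label forests cannot. The printed numbers depend on the synthetic data and say nothing about the paper's results.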

Created:
2020-04-22