Gene classifications for E. coli MG1655.
Source: Figshare. Updated 2026-03-06; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_p_Gene_classifications_for_E_coli_MG1655_p_/31561905
Description:
The identification of essential genes in Transposon Directed Insertion Site Sequencing (TraDIS) data relies on the assumption that transposon insertions occur randomly in non-essential regions, leaving essential genes largely insertion-free. While intragenic insertion-free sequences have been considered a reliable indicator of gene essentiality, no exact probability distribution for these sequences has been proposed so far. Further, many methods require thresholds or parameter values to be set a priori without any statistical basis, which limits the comparability of results. Here, we introduce Consecutive Non-Insertion Sites (ConNIS), a novel method for determining gene essentiality. ConNIS provides an analytic solution for the probability of observing insertion-free sequences within genes of a given length and accounts for variation in insertion density across the genome. In an extensive simulation study and in several real-world scenarios, ConNIS outperformed prevalent state-of-the-art methods, particularly when libraries had only a low or medium insertion density. In addition, our results showed that the precision of existing methods can be improved by incorporating a simple weighting factor for the genome-wide insertion density. To set the built-in parameter and threshold values of TraDIS methods systematically, a subsample-based instability criterion was developed. Applying this criterion to real and synthetic data demonstrated its effectiveness in selecting well-suited parameter/threshold values across methods. An R package and an interactive web application are provided to facilitate application and reproducibility.
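To give a rough feel for the quantity the abstract refers to (the probability of an insertion-free stretch inside a gene of given length), the sketch below computes it under a deliberately simple model. It is an illustrative assumption, not ConNIS's analytic solution: every potential insertion site in the gene is assumed to be hit independently with the same probability `density` (a stand-in for the genome-wide insertion density), and the function name `prob_insertion_free_run` is hypothetical. It uses a small dynamic program over the length of the current run of non-insertion sites.

```python
def prob_insertion_free_run(gene_length: int, run_length: int, density: float) -> float:
    """Probability that a gene with `gene_length` potential insertion sites
    contains at least one run of >= `run_length` consecutive sites without
    an insertion.

    Illustrative model only (an assumption, not ConNIS): each site is hit
    independently with probability `density`.
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be in [0, 1]")
    if run_length <= 0:
        return 1.0  # an empty run is always present
    if run_length > gene_length:
        return 0.0  # the gene is too short to contain such a run

    # state[j] = P(no qualifying run seen yet AND the current trailing
    #              run of non-insertion sites has length j), j < run_length
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(gene_length):
        new_state = [0.0] * run_length
        # An insertion at this site resets the trailing run to length 0.
        new_state[0] = sum(state) * density
        # A non-insertion extends the trailing run by one site.
        for j in range(run_length - 1):
            new_state[j + 1] = state[j] * (1.0 - density)
        # Mass in state[run_length - 1] that sees another non-insertion has
        # completed a qualifying run; it is absorbed by being dropped here.
        state = new_state

    # Whatever probability mass was absorbed is the answer.
    return 1.0 - sum(state)


if __name__ == "__main__":
    # Hypothetical example: a 300-site gene at low vs. higher insertion density.
    for d in (0.02, 0.10):
        print(d, prob_insertion_free_run(300, 50, d))
```

Under this model a long insertion-free run becomes very unlikely as the density grows, which is why such runs are informative at high density and ambiguous at low density. Real TraDIS libraries violate the independence and uniformity assumptions, which is one motivation for a method that models the distribution explicitly.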
Created:
2026-03-06



