Data and scripts for Bayesian prediction of microbial oxygen requirement of selected bacteria from the NCBI genome database
Source: f1000.figshare.com. Updated 2023-05-31; indexed 2025-03-25.
Download link:
https://f1000.figshare.com/articles/dataset/Data_and_scripts_for_Bayesian_prediction_of_microbial_oxygen_requirement_of_selected_bacteria_from_the_NCBI_genome_database/783889/1
Description:
Additional file 1: Text-format (.txt). One-step prediction results. Classification predictions for all included genomes, obtained with the one-step method and N-fold cross-validation.
Additional file 2: Text-format (.txt). Two-step prediction results. Classification predictions for all included genomes, obtained with the two-step method and N-fold cross-validation.
Additional file 3: Text-format (.txt). Domains distinguishing anaerobes from respiring bacteria. The Pfam-A domains found significantly more frequently in bacteria which are capable of respiration (aerobes/facultative anaerobes) than in anaerobes, and vice versa.
Additional file 4: Text-format (.txt). Domains distinguishing aerobes and facultative anaerobes. The Pfam-A domains found significantly more frequently in aerobic than in facultatively anaerobic bacteria, and vice versa.
Additional file 5: Text-format (.txt). Predictions of aerobes and anaerobes only. The predictions for all included genomes that are either aerobes or anaerobes, excluding the facultative anaerobes. These were made only to allow direct comparison with results in the literature.
Additional file 6: Text-format (.txt). Protein domain presence/absence matrix. The presence/absence profiles of all included genomes with respect to Pfam-A domains.
Additional file 7: Python-format (.py). Get likelihoods from Pfam domains. A Python script that identifies, from the training set, the Pfam-A domains over-represented in one class compared with the others and, on that basis, constructs the likelihood files used for prediction.
Additional file 8: Python-format (.py). Predictor. A Python script that predicts the oxygen-requirement classification of genomes in the test set from their protein domain profiles and the likelihood files created by Additional file 7.
Additional file 9: Python-format (.py). Predictive evaluations. A Python script that evaluates the predictor's performance by calculating the Matthews correlation coefficient (MCC) for each classification in the predictions made by Additional file 8.
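Additional files 3, 4 and 7 refer to Pfam-A domains found "significantly more frequently" in one class than in another, but the listing does not say which statistical test was used. The sketch below is a minimal illustration, assuming a one-sided Fisher's exact test with a Bonferroni-corrected threshold and assuming each genome's profile is stored as a set of Pfam accessions. `fisher_greater` and `enriched_domains` are hypothetical names, not functions from the deposited scripts.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment in class A.

    2x2 contingency table:
                    domain present   domain absent
        class A           a                b
        class B           c                d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n_a, n_b = a + b, c + d      # genomes in class A and class B
    k = a + c                    # genomes carrying the domain
    denom = comb(n_a + n_b, k)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    upper = min(n_a, k)
    return sum(comb(n_a, x) * comb(n_b, k - x) for x in range(a, upper + 1)) / denom


def enriched_domains(profiles: dict[str, set[str]], labels: dict[str, str],
                     target: str, alpha: float = 0.05) -> dict[str, float]:
    """Return domains over-represented in genomes of class `target` vs. all others.

    Assumption: a Bonferroni-corrected threshold (alpha / number of domains tested).
    """
    in_target = [g for g in profiles if labels[g] == target]
    others = [g for g in profiles if labels[g] != target]
    domains = sorted(set().union(*profiles.values()))
    threshold = alpha / max(len(domains), 1)
    hits = {}
    for dom in domains:
        a = sum(dom in profiles[g] for g in in_target)   # target genomes with domain
        c = sum(dom in profiles[g] for g in others)      # other genomes with domain
        p = fisher_greater(a, len(in_target) - a, c, len(others) - c)
        if p < threshold:
            hits[dom] = p
    return hits
```

The "vice versa" direction in Additional files 3 and 4 corresponds to calling `enriched_domains` again with the other class as `target`.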
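Additional files 7 and 8 build likelihoods from domain profiles and use them for Bayesian prediction, and Additional files 1 and 2 compare a one-step and a two-step scheme. The model details are not given here, so the following is only a sketch, assuming a Bernoulli naive Bayes classifier with Laplace smoothing over a chosen set of informative domains. In the two-step variant, step one separates anaerobes from respiring bacteria (aerobes and facultative anaerobes), and step two separates aerobes from facultative anaerobes, mirroring the domain sets of Additional files 3 and 4. All function names and the class labels `"anaerobe"`, `"aerobe"`, `"facultative"` and `"respiring"` are hypothetical.

```python
import math
from collections import Counter

Model = tuple[dict[str, float], dict[str, dict[str, float]]]  # (priors, likelihoods)


def train_likelihoods(profiles, labels, domains, pseudo=1.0) -> Model:
    """Estimate class priors and P(domain present | class) with Laplace smoothing."""
    sizes = Counter(labels[g] for g in profiles)
    total = sum(sizes.values())
    priors = {cls: n / total for cls, n in sizes.items()}
    likelihoods = {}
    for cls, n in sizes.items():
        members = [g for g in profiles if labels[g] == cls]
        likelihoods[cls] = {
            dom: (sum(dom in profiles[g] for g in members) + pseudo) / (n + 2 * pseudo)
            for dom in domains
        }
    return priors, likelihoods


def predict(profile: set[str], model: Model) -> str:
    """Return the class with the highest log-posterior for one genome's domain set."""
    priors, likelihoods = model
    scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for dom, p in likelihoods[cls].items():
            # Bernoulli naive Bayes: present -> p, absent -> 1 - p
            score += math.log(p if dom in profile else 1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)


def train_two_step(profiles, labels, domains) -> tuple[Model, Model]:
    """Step 1: anaerobe vs. respiring; step 2: aerobe vs. facultative (respiring only)."""
    step1_labels = {g: "anaerobe" if c == "anaerobe" else "respiring"
                    for g, c in labels.items()}
    respiring = {g: p for g, p in profiles.items() if labels[g] != "anaerobe"}
    return (train_likelihoods(profiles, step1_labels, domains),
            train_likelihoods(respiring, labels, domains))


def predict_two_step(profile: set[str], step1: Model, step2: Model) -> str:
    """Pass genomes predicted as respiring on to the second classifier."""
    first = predict(profile, step1)
    return first if first == "anaerobe" else predict(profile, step2)
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow when a profile covers many domains; the pseudocount keeps every likelihood strictly between 0 and 1 so the logarithm is always defined.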
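Additional file 9 scores each classification with the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch, assuming each class is scored one-vs-rest (that class as positive, all others as negative) and that MCC is reported as 0 when any marginal count is zero; `mcc_per_class` is a hypothetical name, not the deposited script's API.

```python
import math


def mcc_per_class(true: list[str], pred: list[str]) -> dict[str, float]:
    """One-vs-rest Matthews correlation coefficient for each class label."""
    results = {}
    for cls in sorted(set(true) | set(pred)):
        pairs = list(zip(true, pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        tn = sum(t != cls and p != cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        # Convention (assumption): MCC is undefined for an empty marginal; report 0.0.
        results[cls] = (tp * tn - fp * fn) / denom if denom else 0.0
    return results
```

For example, with true labels `["aerobe", "aerobe", "anaerobe", "anaerobe", "facultative", "facultative"]` and predictions `["aerobe", "facultative", "anaerobe", "anaerobe", "facultative", "aerobe"]`, the anaerobe class scores 1.0 and the aerobe and facultative classes each score 0.25.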
Provided by:
f1000.figshare.com



