CORGIAS experimental datasets
Source: Zenodo. Updated: 2025-11-04. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.15496141
Description:
This dataset accompanies the manuscript "CORGIAS: identifying correlated gene pairs by considering evolutionary history in a large-scale prokaryotic genome dataset".
The results in the manuscript can be reproduced with this dataset and the code provided here.
The archaea, mycobacteriales, and pseudomonadales zip archives each include the following:
Input files for analysis:
COG_table.csv: A presence/absence table of all COGs found in any genome in the dataset.
COG_table99.csv: A presence/absence table of COGs present in 1-99% of the genomes in the dataset.
hq_tree.tre: A phylogenetic tree used in the analyses.
COG_table4evoweaver.csv: Same as COG_table99.csv but for EvoWeaver.
hq_tree4evoweaver.tree: Same as hq_tree.tre but for EvoWeaver.
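To illustrate how COG_table99.csv relates to COG_table.csv, the 1-99% prevalence filter could be sketched as below. The in-memory layout (one entry per COG, one 0/1 flag per genome), the inclusive bounds, and the function name filter_by_prevalence are assumptions made for illustration; they are not taken from the deposited code.

```python
def filter_by_prevalence(table, low=0.01, high=0.99):
    """Keep COGs whose fraction of genomes with presence lies in [low, high].

    `table` maps a COG identifier to a list of 0/1 presence flags,
    one flag per genome (hypothetical layout, not the actual CSV schema).
    """
    kept = {}
    for cog, flags in table.items():
        if not flags:
            continue  # skip COGs that have no genome columns at all
        prevalence = sum(flags) / len(flags)
        if low <= prevalence <= high:
            kept[cog] = flags
    return kept


# Toy example with four genomes; the COG identifiers are placeholders.
toy = {
    "COG0001": [1, 1, 1, 1],  # present in every genome: dropped
    "COG0002": [1, 0, 1, 0],  # present in half of the genomes: kept
    "COG0003": [0, 0, 0, 0],  # absent from every genome: dropped
}
filtered = filter_by_prevalence(toy)
```

COGs that are present in all genomes or absent from all genomes carry no co-occurrence signal, which is presumably why such a filter is applied before profiling.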
The results of phylogenetic profiling are as follows:
naive.csv
rle.csv
cwa.csv
cotr.csv
asa_*.csv
asawo.csv
sev.csv
evoweaver.csv
The prefix of each file name indicates the phylogenetic profiling method used. ASA and SEV require ancestral state reconstruction (ASR), and the * in their file names indicates how ASR was performed. asawo.csv is the result of ASA without considering branch length. evoweaver.csv is the result of the four phylogenetic profiling methods implemented in EvoWeaver.
The results of the statistical tests for the files above are recorded in *_stat.csv, except for evoweaver.csv, as it contains scores that combine p-values.
The data for comparing phylogenetic profiling methods are as follows:
scaled_pvalues.csv: A table showing the minmax-scaled p-values for each method of CORGIAS and scores for EvoWeaver.
tp_pairs.txt: A list of true positive (functionally related) COG pairs detected by either method at TPR = 0.5
evolCCM.csv: Results from evolCCM for the COG pairs listed in tpr07_pairs.txt
tp_stat*.csv: A concatenated table combining evolCCM.csv, COG information, and presence/absence changes during evolution. The * in the tp_stat file names indicates the true positive rate (TPR) threshold.
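The two ideas behind the comparison files, min-max scaling and picking a cutoff at a fixed TPR, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and it assumes that a higher score means stronger evidence, so p-values would first need to be converted (for example, by negating them) before the cutoff is chosen.

```python
import math


def minmax_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def threshold_at_tpr(scores, is_positive, target_tpr=0.5):
    """Return the highest cutoff at which at least `target_tpr` of the
    known positive pairs have a score >= cutoff.

    Assumes higher score = stronger evidence (an assumption of this sketch).
    """
    positives = sorted(
        (s for s, p in zip(scores, is_positive) if p), reverse=True
    )
    if not positives:
        raise ValueError("no positive pairs supplied")
    needed = math.ceil(target_tpr * len(positives))
    return positives[max(needed, 1) - 1]


# Toy data: five COG pairs, three of which are known positives.
scores = [0.9, 0.2, 0.6, 0.4, 0.8]
labels = [True, False, True, True, False]

cutoff = threshold_at_tpr(scores, labels, target_tpr=0.5)
# Indices of all pairs called at this cutoff, positives and negatives alike.
detected = [i for i, s in enumerate(scores) if s >= cutoff]
```

Fixing the TPR rather than the score cutoff puts methods with differently distributed scores on an equal footing: each method is allowed to recover the same share of known positives, and the comparison is then about which pairs it recovers.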
ipynb.zip includes Jupyter notebook versions of the .py scripts, which are deposited at https://github.com/ynishimuraLv/CORGIAS_data
COG.links.wo_cooccurence.txt includes the STRING scores of the COG pairs, recalculated without the cooccurrence score. The original file is available from the STRING database, and the recalculation can be reproduced with 1_Prepare_dataset.py.
Tables S8-S10 include the pairs detected only by weighted methods at varying TPRs.
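For readers unfamiliar with how a STRING combined score can be recomputed with one evidence channel left out, a minimal sketch follows. It assumes the combination scheme STRING documents publicly (remove a prior of 0.041 from each channel, combine the channels as independent evidence, then add the prior back) and scores already on a 0-1 scale. It omits the homology correction that STRING applies to some channels, and the channel names and function names are illustrative, so 1_Prepare_dataset.py may differ in detail.

```python
PRIOR = 0.041  # prior assumed from STRING's published combination scheme


def remove_prior(score, prior=PRIOR):
    """Strip the prior from a single channel score (0-1 scale)."""
    score = max(score, prior)  # scores below the prior carry no evidence
    return (score - prior) / (1.0 - prior)


def combine_without(channels, excluded=("cooccurence",), prior=PRIOR):
    """Combine channel scores as independent evidence, skipping `excluded`.

    `channels` maps a channel name to its score on a 0-1 scale; the names
    used here are placeholders chosen for this sketch.
    """
    product = 1.0
    for name, score in channels.items():
        if name in excluded:
            continue
        product *= 1.0 - remove_prior(score, prior)
    combined = 1.0 - product
    return combined * (1.0 - prior) + prior  # add the prior back once


# Toy pair: strong cooccurrence and experimental evidence, nothing else.
pair = {
    "neighborhood": 0.0,
    "fusion": 0.0,
    "cooccurence": 0.9,
    "coexpression": 0.0,
    "experimental": 0.9,
    "database": 0.0,
    "textmining": 0.0,
}
score_wo_cooccurrence = combine_without(pair)
```

Dropping the cooccurrence channel matters here because that channel is itself derived from phylogenetic profiles; keeping it would make the benchmark circular when evaluating phylogenetic profiling methods.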
Provider:
Zenodo
Created:
2025-11-04



