
CoMR (Comprehensive Mitochondrial proteome Reconstruction) reference databases, benchmarking data, and container for mitochondrial proteome reconstruction

Source: DataCite Commons. Updated 2026-04-21. Indexed 2026-05-05.
Download link:
https://figshare.scilifelab.se/articles/dataset/CoMR_Comprehensive_Mitochondrial_proteome_Reconstruction_reference_databases_benchmarking_data_and_container_for_mitochondrial_proteome_reconstruction/31361839
Resource description:
<pre>This item contains reference databases, benchmarking resources, and a reproducibility container associated with CoMR (Comprehensive Mitochondrial proteome Reconstruction), an integrative workflow for reconstructing mitochondrial proteomes from eukaryotic protein sequence data.<br><br>Mitochondrial proteome reconstruction often relies heavily on prediction of mitochondrial targeting signals (MTSs), but MTS predictors are trained mainly on model organisms and may perform poorly in phylogenetically divergent lineages or in organisms with atypical or reduced targeting sequences. CoMR was developed to address this by integrating complementary evidence sources within a unified scoring framework: targeting prediction, curated homology searches, large-scale similarity searches, profile HMM detection, and automated phylogenetic analysis. The workflow is implemented as a modular <b>Snakemake-based pipeline</b> and is distributed in containerized form to support reproducible execution across computing environments.<br><br>The files deposited here support inspection, reuse, and reproducibility of that workflow. They include: (1) CoMR databases with FASTA databases, preformatted BLAST resources, orthogroup alignment archives, and HMM profile archives; (2) a benchmarking collection with filtered FASTA and DIAMOND databases, benchmark proteomes, benchmarking scripts, summary tables, figures, and benchmarking outputs; and (3) a Singularity/Apptainer container image (CoMR.sif) for running CoMR in a controlled computational environment.<br><br>The benchmarking material corresponds to the analyses described in the paper for the model yeast <i>Saccharomyces cerevisiae</i> and the divergent anaerobic protist <i>Paratrimastix pyriformis</i>. In the manuscript, CoMR achieved strong discriminatory performance in yeast (ROC-AUC 0.92), exceeding standalone TargetP2 prediction (ROC-AUC 0.72), and maintained robust performance in <i>P. pyriformis</i> (ROC-AUC 0.86), where precision-recall analysis also supported improved recovery of mitochondrial-related organelle proteins relative to TargetP2. The benchmarking resources in this deposit include the processed data, scripts, figures, and output archives underlying those comparisons.<br><br>The deposited reference resources include the <b>CoMR Subtractive Mitochondrial Database (SMD)</b>, supporting HMM resources, and benchmarking-modified database versions generated for performance evaluation, with taxonomic exclusion applied to reduce circularity. The benchmarking directory also documents how the filtered databases and orthogroup alignments were built, and how the benchmarking tables, ROC curves, and precision-recall summaries were derived from CoMR output tables.<br><br>The accompanying README and MANIFEST files provide a self-contained guide to the files and an inventory of the distributed content.</pre>
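The ROC-AUC figures quoted in the description measure how well a score separates mitochondrial from non-mitochondrial proteins. The following is a minimal sketch of that metric only, not the deposit's benchmarking scripts: the function name `roc_auc` and the example scores and labels are hypothetical, invented for illustration, and are not taken from CoMR output. It assumes one numeric score per protein and a binary label (1 = known mitochondrial, 0 = not).

```python
def roc_auc(scores, labels):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity.

    Equivalent to the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count 0.5).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks, giving tied scores their average rank.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: five proteins with made-up scores and labels.
example_scores = [0.9, 0.7, 0.6, 0.4, 0.2]
example_labels = [1, 0, 1, 1, 0]
example_auc = roc_auc(example_scores, example_labels)
```

A value of 1.0 means every positive outranks every negative, and 0.5 is what a random score would give. The rank-based form is used here because it needs no third-party libraries; the README and benchmarking scripts in the deposit remain the reference for how the published values were obtained.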
Provider:
Lund University
Created:
2026-02-18