
Multifaceted quality assessment of gene repertoire annotation with OMArk

Source: Mendeley Data. Updated: 2024-05-17. Indexed: 2024-06-27.
Download link:
https://zenodo.org/records/10034236
Resource description:
Dataset associated with the OMArk paper. It contains eight archives:

Supplementary_Tables: The Supplementary Table files referred to in the paper.

OMAmerDB: The OMAmer database constructed from the whole dataset of the OMA database (November 2022 release) and used in the paper. An OMAmer database is necessary to run OMArk.

Simulation: Proteomes with artificially introduced errors, contaminants, or depleted completeness, used to assess OMArk's performance. The archive contains the generated proteomes (Simulated_Data) and their OMArk quality assessments (omark). It also contains the OMAmer results (OMAmerResults) that were used to run OMArk, and BUSCO completeness assessments (BUSCO). Note that for storage efficiency, only the non-redundant part of the data (added errors, added contamination, random fraction of proteomes) is stored there. The full modified proteomes can be regenerated from these data and the source proteomes.

Reference Proteomes: The UniProt Reference Proteomes (Proteomes) (2021_04) and their proteome quality assessment results according to OMArk. The archive contains the source proteome FASTA files (Source folder), OMAmer results for these proteomes (omamer folder), OMArk results (omark folder), and BUSCO completeness assessments (BUSCO folder). It also contains a subfolder holding part of the contamination detection experiment (Contamination folder).

Ensembl_Metazoa_AssemblyChange: Ensembl Metazoa proteomes with a version change between versions 52 and 54, as well as their quality assessment results for both versions. The archive contains the source proteome FASTA files (Source folder), a Splice file that groups together all proteins encoded by the same gene (Splice folder), OMAmer results for the proteomes (omamer folder), and the OMArk results (omark folder).

MissingGenesBLAST: Sequences of HOGs considered missing in the human proteome, which were used to search for sequences in the human genome.

Ensembl_NCBI_Results: OMArk and BUSCO results for Ensembl and NCBI proteomes. These results were used to evaluate OMArk bias due to the source of proteomes in the OMA database.

Notebooks: Jupyter Notebooks that were used to perform the analyses described in the paper.
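The description of the Simulation archive says that full modified proteomes can be regenerated from the stored non-redundant data and the source proteomes, but it does not spell out the procedure. The following is only a minimal sketch of one plausible way to do this for the "added errors" and "added contamination" cases. It assumes that the stored part is a FASTA file of extra sequences to combine with the source proteome FASTA, and that a stored record sharing a header with a source record should replace it. The function names (`parse_fasta`, `merge_proteomes`, `to_fasta`) are hypothetical and not part of OMArk or the dataset. The actual file layout should be checked against the archive and the accompanying notebooks.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_proteomes(source_lines: Iterable[str],
                    added_lines: Iterable[str]) -> Dict[str, str]:
    """Combine a source proteome with the stored extra sequences.

    Assumption (not confirmed by the dataset description): records are
    keyed by their full header, and a stored record with an existing
    header overrides the source record.
    """
    merged = dict(parse_fasta(source_lines))
    merged.update(parse_fasta(added_lines))
    return merged


def to_fasta(records: Dict[str, str]) -> str:
    """Serialise records back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records.items())
```

In practice the two inputs would be open file handles, for example a proteome from the Source folder and the matching file from Simulated_Data. The depleted-completeness case (random fraction of proteomes) may need different handling that is not covered here.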
Created:
2023-10-26