Metaclusters by DPCfam clustering of UniRef50 v 2017_07
Source: NIAID Data Ecosystem. Indexed 2026-03-14.
Download link:
https://zenodo.org/record/5877585
Description:
Metaclusters obtained from the DPCfam clustering of UniRef50, v. 2017_07.
Metaclusters represent putative protein families derived automatically with the DPCfam method, as described in "Unsupervised protein family classification by Density Peak clustering", Russo ET, 2020, PhD thesis (http://hdl.handle.net/20.500.11767/116345). Supervisors: Alessandro Laio, Marco Punta.
See also https://dpcfam.areasciencepark.it/ to browse the data.
VERSION 1.1 changes:
Added the DPCfamB database, which includes all small metaclusters with 25<=N<50 seed sequences. DPCfamB files are named with the prefix B_
Added an AlphaFold representative, based on AlphaFoldDB, for each MC
FILES DESCRIPTION:
1) Standard DPCfam database
metaclusters_xml.tar.gz Metaclusters' seeds, unaligned, in an XML table. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 amino acids are reported. Metacluster entries also include statistical information about each MC (such as size, average length, low-complexity fraction, etc.) and a Pfam comparison (Dominant Architecture). A README file describing the data is included. A parser is included to transform the XML data into space-separated tables. The XML schema is included.
metaclusters_msas.tar.gz Metaclusters' multiple sequence alignments, in FASTA format. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 amino acids are reported.
metaclusters_hmms.tar.gz Metaclusters' profile HMMs. One ".hmm" file for each metacluster. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 amino acids are reported.
all_metaclusters_hmm.tar.gz Collective metaclusters' profile HMMs. A single .hmm file collecting the profile HMMs of all MCs. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 amino acids are reported.
uniref50_annotated.xml.gz UniRef50 v. 2017_07 database annotated with Pfam families and DPCfam metaclusters. A README file describing the data is included. A parser is included to transform the XML data into space-separated tables. The XML schema is included; it is derived from UniProt's UniRef50 XML schema.
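Since the alignments are distributed as FASTA files inside a gzipped tar archive, they can be read without unpacking the archive to disk. The following is a minimal sketch, not part of the dataset's own tooling. It assumes that every regular file inside metaclusters_msas.tar.gz is one FASTA alignment; the internal layout of the archive is not described here, so check the archive contents first. The function names parse_fasta and iter_msas are hypothetical.

```python
import tarfile
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Gap characters are kept, so aligned sequences keep their columns.
    If a header occurs twice, the later record replaces the earlier one.
    """
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def iter_msas(archive_path: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (member_name, alignment) for each regular file in the archive.

    Assumption: each member is a single FASTA alignment.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            text = handle.read().decode("utf-8", errors="replace")
            yield member.name, parse_fasta(text)
```

A typical use would be to loop over iter_msas("metaclusters_msas.tar.gz") and inspect the number of sequences in each alignment. The same archive-reading pattern applies to the B_ prefixed alignment archive.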
2) DPCfamB database
B_metaclusters_xml.tar.gz Metaclusters' seeds, unaligned, in an XML table. All metaclusters are listed. Metacluster entries also include statistical information about each MC (such as size, average length, low-complexity fraction, etc.) and a Pfam comparison (Dominant Architecture). A README file describing the data is included. A parser is included to transform the XML data into space-separated tables. The XML schema is included.
B_metaclusters_msas.tar.gz Metaclusters' multiple sequence alignments, in FASTA format. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 amino acids are reported.
B_metaclusters_hmms.tar.gz Metaclusters' profile HMMs. One ".hmm" file for each metacluster. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 amino acids are reported.
B_all_metaclusters_hmm.tar.gz Collective metaclusters' profile HMMs. A single .hmm file collecting the profile HMMs of all MCs. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 amino acids are reported.
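The inclusion criteria stated for the alignment and HMM files of the two databases can be written as a small predicate. This is only an illustrative sketch of the thresholds as worded in this description, not code from the DPCfam pipeline; the function name classify_seed and its arguments are hypothetical. Read literally, "more than 50" and "25<=N<50" leave N = 50 in neither set, which may simply be loose wording in the description.

```python
from typing import Optional


def classify_seed(n_seeds: int, avg_length: float) -> Optional[str]:
    """Return the database whose MSA/HMM files would report a metacluster.

    Thresholds follow the file descriptions literally:
      DPCfam  : more than 50 seed elements, average length > 50 amino acids
      DPCfamB : 25 <= N < 50 seed elements, average length > 50 amino acids
    Returns None when neither criterion is met.
    """
    if avg_length <= 50:
        return None
    if n_seeds > 50:
        return "DPCfam"
    if 25 <= n_seeds < 50:
        return "DPCfamB"
    return None
```

Such a predicate could be applied to the size and average-length statistics stored with each metacluster entry in the XML tables, once those are parsed with the bundled parser.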
Created:
2022-10-30



