
Unified Human Gastrointestinal Proteome clustering results by DPCfam

Source: Mendeley Data. Updated: 2024-06-27. Indexed: 2024-06-27.
Download link:
https://zenodo.org10523297
Description:
This dataset contains the results of clustering the Unified Human Gastrointestinal Proteome (UHGP) using the DPCfam algorithm. More details on the DPCfam clustering algorithm can be found in the original publication:

Russo, Elena Tea, et al. "DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets." PLOS Computational Biology 18.10 (2022): e1010610. https://doi.org/10.1371/journal.pcbi.1010610

All of the putative protein families obtained through DPCfam (including previous results) can be browsed online at our dedicated webserver: https://dpcfam.areasciencepark.it/uhgp

The original protein dataset is version 1.0 of the UHGP-50 dataset, available for download from MGnify at https://www.ebi.ac.uk/metagenomics/.

FILES DESCRIPTION:

Only metaclusters (MCs) whose seeds 1) contain more than 50 elements and 2) have an average length greater than 50 amino acids are reported.

metaclusters_xml.tar.gz:
  dpcfam_uhgp_metaclusters.xml: Metaclusters' seeds. Metacluster entries also include some statistical information about each MC (such as size, average length, low-complexity fraction, etc.) and a Pfam comparison (Dominant Architecture).
  dpcfam_metaclusters.xsd: XML schema file for the data.
  MCxml_to_tables.awk: Awk script to convert from XML to tabular text files. Use through the parse.sh script.
  parse.sh: XML parser.
  README.md

uhgp_xml.tar.gz:
  uhgp_seed_match.xml: XML file containing all UHGP-50 proteins and their corresponding sequences, annotated with Pfam and DPCfam metacluster data. Annotations record whether a protein is a seed, as well as matches found through the profile HMMs of the DPCfam-UHGP and DPCfam-Uniref clusterings.
  uhgp_matches.xsd: XML schema file for the data.
  xml_to_list.awk: Awk script to convert from XML to tabular text files. Use through the parse.sh script.
  xml_to_list_mcfiles.awk: Awk script to convert from XML to tabular text files (including individual files for metaclusters' seeds). Use through the parse.sh script.
  parse.sh: XML parser.
  README.md

Metacluster Files:
  seeds.zip: Metaclusters' seed sequences. A fasta file for each metacluster before filtering.
  filtered_seeds.zip: Metaclusters' seed sequences after clustering at 60 percent identity.
  metaclusters_hmms.tar.gz: Metaclusters' profile HMMs. A ".hmm" file for each metacluster.
  metaclusters_msas.tar.gz: Metaclusters' multiple sequence alignments, in fasta format.
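
The reporting criterion stated above (more than 50 seed elements and an average length greater than 50 amino acids) can be checked on a seed fasta file with a few lines of Python. This is only a minimal sketch, not the authors' pipeline: the function names `read_fasta` and `passes_size_filter` are hypothetical, and it assumes each file is ordinary fasta text with one `>` header per sequence.

```python
from statistics import mean


def read_fasta(text):
    """Parse fasta text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def passes_size_filter(records, min_seeds=50, min_avg_len=50):
    """Return True if there are more than min_seeds sequences and
    their average length is greater than min_avg_len residues.

    The default thresholds mirror the criterion quoted in the
    description; both comparisons are strict.
    """
    if not records:
        return False
    avg_len = mean(len(seq) for _, seq in records)
    return len(records) > min_seeds and avg_len > min_avg_len
```

To use it, read the text of one metacluster's fasta file (for example, after extracting seeds.zip), pass it to `read_fasta`, and pass the result to `passes_size_filter`.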
Created:
2024-01-19