SARS-CoV-2 structures -- sequence-to-structure alignments derived from PDB and from PSSH2, plus dark regions
Indexed by NIAID Data Ecosystem, 2026-03-12
Download link:
https://zenodo.org/record/4934860
Resource description:
Aquaria Coverage map
In "SARS-CoV-2 structural coverage map reveals viral protein interactions, hijacking, and mimicry" we introduce a novel concept for visually organizing a complex dataset comprising a large number of models: a one-stop visualization summarizing what is known, and not known, about the 3D structure of the viral proteome. This tailored visualization, called the SARS-CoV-2 structural coverage map, helps researchers find structural models related to specific research questions and can be viewed in the Aquaria-COVID resource.
Aquaria_COVID_Coverage_Map.csv summarises the basic information in this map: it specifies the dark and non-dark regions, together with the number of residues in each region that Meta-Disorder (Schlessinger et al, 2006) predicts to be disordered.
The PSSH2 data set
The sequence-to-structure alignments were generated using a modified version of the Aquaria sequence-to-structure processing pipeline (O’Donoghue et al, 2015) and form a subset of the PSSH2 database.
PSSH2 is a database of protein sequence-to-structure homologies based on HHblits, an alignment method employing iterative comparisons of hidden Markov models (HMMs). To ensure the highest possible final alignment quality for matches in Aquaria using HHblits, we first calculate HMM profiles for each unique PDB sequence (PDB_full) and also for each unique Swiss-Prot sequence. We generated PSSH2 using HHblits to find similarities between HMMs from PDB and HMMs from UniProt sequences.
seq_to_struc_alignemnts_PSSH2.csv.gz contains a subset of the usual PSSH2 database, including only the proteins relevant for visualising SARS-CoV-2 structures. Protein sequences and PDB structures are identified by the MD5 hashes of their respective sequences.
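To illustrate what identifying a sequence by its MD5 hash involves, here is a minimal sketch using Python's standard hashlib module. The function name sequence_md5 is hypothetical, and the normalisation step (stripping whitespace and upper-casing) is an assumption: the data set does not state how sequences were prepared before hashing, so hashes computed this way may need adjusting before they match the files.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Return the hexadecimal MD5 digest of a protein sequence.

    Assumption: whitespace is removed and letters are upper-cased before
    hashing. The normalisation used for the PSSH2 files may differ.
    """
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


# A short illustrative peptide, used here only to show the call.
print(sequence_md5("MESLVPGFNE"))
```

Because the hash depends only on the sequence, any two entries with identical sequences share one hash, which is why a single hash can stand for several database entries.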
PDB_chain_identifier_mappings.csv and swissprot_identifier_mappings.csv detail which Swiss-Prot entries and PDB chains are referred to by the MD5 hashes in the PSSH2 data set.
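The lookup that these mapping files enable could be sketched as follows, using only the Python standard library. The column names md5_hash and identifier are hypothetical placeholders, as are the helper names open_text and load_mapping; check the header row of each CSV file and substitute the real column names. The demo uses made-up values rather than entries from the data set.

```python
import csv
import gzip
import io
from collections import defaultdict


def open_text(path):
    """Open a plain or gzip-compressed CSV file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def load_mapping(handle, hash_column, id_column):
    """Build {md5_hash: [identifiers]} from an open CSV file handle.

    Values are lists because several entries (for example, identical
    PDB chains) can share one sequence and therefore one hash.
    """
    mapping = defaultdict(list)
    for row in csv.DictReader(handle):
        mapping[row[hash_column]].append(row[id_column])
    return dict(mapping)


# In-memory stand-in for a mapping file; hashes and identifiers are made up.
demo = io.StringIO("md5_hash,identifier\nhash_a,ID_1\nhash_a,ID_2\nhash_b,ID_3\n")
lookup = load_mapping(demo, "md5_hash", "identifier")
print(lookup["hash_a"])
```

With a real file, the same function would be called as load_mapping(open_text(path), ...), passing the actual column names in place of the placeholders.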
Calculating PSSH2
The bulk of the Swiss-Prot and PDB data was downloaded in October 2020, but incremental updates, especially those related to COVID-19, were added until April 2021.
Generating PSSH2: We used Uniclust30 from HH-suite, a database of non-redundant UniProt sequence clusters in which the highest pairwise sequence identity between clusters was 30% (http://gwdu111.gwdg.de/~compbiol/uniclust/2020_03/UniRef30_2020_03_hhsuite.tar.gz). The HHblits code and the code for running the calculations were retrieved from git (https://github.com/soedinglab/hh-suite.git and https://github.com/aschafu/PSSH2.git, respectively) at the time of each calculation, up to April 2021.
PDB based sequence-to-structure alignments
In addition to the PSSH2 data, new PDB structures were retrieved based on the primary accession of each protein, by querying for all chains in all PDB entries with exact matches, using the sequence cross-reference records given in PDB. Sequence-to-structure alignments were then created, again based on information provided in each PDB entry. These alignments are summarised in PDB_chain_alignments_pssh2Format.csv.
Created:
2021-06-15



