
Supplementary Information for "AI-assisted structural consensus-proteome prediction of human monkeypox viruses isolated within a year after the 2022 multi-country outbreak" containing Data S1-S5 as well as 3D structural models of ORF ID_5754 and the MPX consensus proteome (April2022 to April2023)

Source: DataCite Commons; updated 2023-08-25; indexed 2024-08-18
Download link:
https://figshare.com/articles/dataset/3D_structural_models_of_MPX_consensus_proteins_April2022_to_April2023/22730459/4
Resource description:
The monkeypox virus (MPX) belongs to the orthopoxvirus genus of the Poxviridae family, is endemic in parts of Africa, and causes a disease in humans similar to smallpox. The most recent MPX outbreak has already affected 110 countries, with 86,956 confirmed cases since May 2022, and has consequently become a focus of interest. In particular, a molecular understanding of the virus is essential to study infection processes and pathogen-host interactions, to predict tropism changes, and to guide drug discovery and development as well as vaccine development or adaptation at a very early stage. Here we provide the structural proteome of the currently circulating MPX virus: we computed a consensus genome sequence from 3,713 viral isolates sampled during the year after the outbreak and predicted the structures of the 210 proteins contained in this consensus genome with AlphaFold2 and ESMFold. For the 148 proteins that matched proteins deposited in the PDB, we also built homology models. In total, the folder provided here contains 568 distinct structural models.

Data S1 | The consensus genome sequence generated from 3,713 monkeypox genome sequences, which were sampled within one year after the 2022 MPX outbreak, in FASTA format (DataS1_consensus_genome.fasta).
Data S2 | A CSV file containing the composition and conservation data of the consensus genome sequence (DataS2_conservation_consensus_genome.csv).
Data S3 | A set of proteins that serves as the reference proteome, consisting of 179 proteins contained in the NCBI Reference Sequence NC_063383.1, extended with 32 additional proteins contained in MN648051.1 (DataS3_NCBI_reference_proteome.fasta).
Data S4 | The protein sequences of 10,580 characteristic candidate ORFs, their positions in the genome sequence, BLASTP results of searches in standard databases, topology predictions, modeling information, and mutation analyses (DataS4_consensus_proteome_analyses.csv).
Data S5 | An acknowledgment table including accession numbers and the origin of the processed sequences (DataS5_GISAID_acknowledgement_table.pdf).
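As a rough illustration of how a consensus sequence and per-position conservation values (the kind of data described for Data S1 and Data S2) can be derived, the sketch below applies a simple majority vote to each column of a small alignment. It is an assumption-laden example, not the authors' pipeline: the function name `consensus_with_conservation`, the toy sequences, and the majority-vote rule are all hypothetical, and the input is assumed to be already aligned to equal length.

```python
from collections import Counter


def consensus_with_conservation(aligned):
    """Majority-vote consensus over pre-aligned, equal-length sequences.

    Hypothetical helper for illustration only; the dataset's actual
    consensus method is not described in this record.
    Returns (consensus string, list of per-column conservation fractions).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    conservation = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Most frequent symbol in this column; ties go to the symbol seen first.
        symbol, count = counts.most_common(1)[0]
        consensus.append(symbol)
        # Fraction of sequences that agree with the consensus symbol.
        conservation.append(count / len(column))
    return "".join(consensus), conservation


# Toy alignment (made-up sequences, "-" marks a gap).
toy_alignment = ["ATGCA", "ATGTA", "ATGCA", "A-GCA"]
consensus, conservation = consensus_with_conservation(toy_alignment)
print(consensus)
print(conservation)
```

A real analysis on thousands of genomes would first need a multiple sequence alignment and a policy for gaps and ambiguous bases, both of which this sketch leaves out.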
Provider:
figshare
Created:
2023-06-16