Curated MERS-CoV Spike Glycoprotein Sequences from Human and Camel Isolates (2012–2015) with Receptor-Binding Motif Annotations and Quality-Controlled Alignments
Source: Figshare. Updated 2026-01-22. Indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_b_Curated_MERS-CoV_Spike_Glycoprotein_Sequences_from_Human_and_Camel_Isolates_2012_2015_with_Receptor-Binding_Motif_Annotations_and_Quality-Controlled_Alignments_b_/31126789
Description:
This dataset comprises curated Middle East respiratory syndrome coronavirus (MERS-CoV) Spike (S) glycoprotein sequences derived from human and camel isolates, along with associated multiple sequence alignments and quality control metrics. The work was conducted to support comparative genomic analysis of the Spike protein across primary host species involved in MERS-CoV transmission.

Data Curation and Selection Criteria

A starting set of 1,566 MERS-CoV genome sequences with explicit host annotations was obtained from public repositories. From this, only sequences with explicitly annotated Spike (S) coding regions, defined by the presence of the gene symbol "S" in the protein annotation field, were retained. This stringent criterion yielded a high-confidence subset of 9 human-derived and 9 camel-derived Spike protein sequences. All selected sequences correspond to full-length, high-quality genomes with unambiguous host metadata (e.g., Homo sapiens, Camelus dromedarius) and collection dates spanning 2012-2015. Sequences derived from partial genomes or lacking clear host attribution were excluded.

Sequence Processing and Alignment

The selected Spike protein sequences were subjected to multiple sequence alignment using the MAFFT algorithm (v7.520) via the official web server (https://mafft.cbrc.jp/alignment/server/), employing default auto settings. Alignments were performed separately for human and camel groups to preserve within-host homology structure. The resulting alignments were validated for sequence integrity, gap distribution, and residue ambiguity.

Receptor-Binding Motif (RBM) Extraction

The receptor-binding motif (RBM), defined as residues 484-567 of the reference MERS-CoV Spike protein (NC_038294.1), was extracted directly from the unaligned full-length Spike sequences. This approach ensured that RBM sequences reflected native residue composition without interpolation or positional artifacts introduced by global alignment.

Quality Control and Validation

All sequences and alignments underwent comprehensive quality control, including:
- Verification of accession traceability to NCBI records
- Assessment of gap content and alignment length consistency
- Screening for ambiguous amino acid residues (e.g., X, B, Z, *)
- Calculation of site-wise conservation metrics

Observations

The aligned human and camel Spike protein datasets each contain 9 sequences of uniform length (333 amino acids post-alignment). The RBM subregion (84 amino acids) shows high sequence identity across all isolates, with minimal variation observed between host groups. The dataset reflects the natural genetic diversity of MERS-CoV Spike as sampled from documented human cases and camel reservoirs during the 2012-2015 period.

Applicability

This dataset is suitable for:
- Comparative structural modeling of MERS-CoV Spike
- Epitope mapping and antigen design studies
- Host-specific evolutionary analyses of betacoronaviruses
- Benchmarking of alignment or conservation analysis tools

All sequences are provided in standard FASTA format, with headers preserving original NCBI accession identifiers and host metadata. Supporting quality control metrics are included as structured tabular files.
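
The RBM extraction and quality control steps described above can be approximated with a short Python sketch. This is an illustration, not the curators' actual pipeline: all function names are hypothetical, the conservation score (fraction of sequences sharing the most common residue in a column) is one common choice among several, and the slice assumes each unaligned sequence shares the residue numbering of the reference Spike protein, with no upstream insertions or deletions. Residues 484-567 inclusive give 567 - 484 + 1 = 84 positions, matching the RBM length reported in the description.

```python
from collections import Counter

RBM_START, RBM_END = 484, 567  # 1-based, inclusive, as stated in the description
AMBIGUOUS = set("XBZ*")        # ambiguity symbols named in the QC list


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text (hypothetical helper)."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def extract_rbm(seq, start=RBM_START, end=RBM_END):
    """Slice residues start..end from an UNALIGNED sequence.

    Assumption: the sequence uses the same numbering as the reference.
    """
    if len(seq) < end:
        raise ValueError(f"sequence has {len(seq)} residues, need at least {end}")
    return seq[start - 1:end]


def ambiguous_positions(seq):
    """Return 1-based positions of ambiguous residues (X, B, Z, *)."""
    return [i for i, aa in enumerate(seq.upper(), start=1) if aa in AMBIGUOUS]


def gap_fraction(aligned_seq):
    """Fraction of alignment columns that are gaps ('-') in one sequence."""
    return aligned_seq.count("-") / len(aligned_seq) if aligned_seq else 0.0


def column_conservation(aligned_seqs):
    """Per column: share of sequences carrying the most common non-gap residue."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned_seqs):
        residues = [aa for aa in column if aa != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no residue signal
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores
```

A typical use would be to call `parse_fasta` on the contents of one of the provided FASTA files, apply `extract_rbm` and `ambiguous_positions` to each unaligned sequence, and run `gap_fraction` and `column_conservation` on the human and camel alignments separately, mirroring the per-host grouping described above.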
Created:
2026-01-22



