Data files and code on the comparison of SARS-CoV-2 with non-segmented RNA viruses
Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-28.
Download link:
https://springernature.figshare.com/articles/dataset/Data_files_and_code_on_the_comparison_of_SARS-CoV-2_with_non-segmented_RNA_viruses/12482813
Description:
This fileset contains 15 data files and 1 ReadMe file. The data files are as follows:
1. Five results files in .fasta format: Result_MacroDomain_.fasta, Result_Spike1_.fasta, Result_Spike2_.fasta, Result_Spike2protein_.fasta and Result_Viroporin_.fasta.
2. Two PowerPoint presentations (.pptx format): Analysis by MegaX_.pptx and Open Reading Frames_Conserved Domain Found in ORF and CDD_Children_.pptx.
3. Three data files in .nwk format: NeurotropicRNA.noSegmentR1_original tree.nwk, MacroDomainGen_BootstrapTree.nwk and ViroporinGen_BootstrapTree.nwk.
4. One code file in .R format: Protein Alignment RStudio_msa package_.R.
5. One file in .tex format: Covid19_.tex.
6. Two files in .txt format: Covid19_.txt and texshade.sty package_.txt.
7. One file in .sty format: texshade_.sty.
The five .fasta files contain the results of the multiple protein sequence alignment.
The PowerPoint presentation Open Reading Frames_Conserved Domain Found in ORF and CDD_Children_.pptx contains the search results (snapshot figures) obtained with the Open Reading Frame (ORF) finder and the Conserved Domains Database (CDD), both from NCBI. This file documents how Figures 2 and 3 were made.
The PowerPoint presentation Analysis by MegaX_.pptx records the parameters used, showing how the sequences were aligned and how the tree files for Figure 1 were made with the MEGA X software.
The three .nwk files (in Newick tree format) were produced with MEGA X. They contain the data used to construct the phylogenetic trees shown in Figures 1, 2B and 2C of the article.
The R file contains all the code required to produce Figures 2 and 3 in the article. The Covid19_.tex file works together with RStudio and the msa package (an R package for multiple sequence alignment) to make Figures 2 and 3. The .sty file is a LaTeX style file and contains package code.
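For readers who want to inspect the .fasta results without R, the following is a minimal Python sketch of reading an aligned FASTA file and listing the fully conserved columns. It is not the authors' code (their figures were made with the R msa package). The function names `read_fasta` and `conserved_columns` are hypothetical, the demo sequences are invented, and the sketch assumes the files store gapped sequences of equal length, which the description does not confirm.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def conserved_columns(alignment):
    """Return 0-based indices of gap-free columns where all sequences agree."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences in an alignment must have equal length")
    return [
        i for i in range(length)
        if seqs[0][i] != "-" and all(s[i] == seqs[0][i] for s in seqs)
    ]


# Toy alignment with made-up sequences (NOT taken from the dataset).
demo = StringIO(">virusA\nMKT-LV\n>virusB\nMKTALV\n>virusC\nMRT-LV\n")
alignment = read_fasta(demo)
print(conserved_columns(alignment))  # [0, 2, 4, 5]
```

To try it on one of the dataset files, the `StringIO` object could be replaced by an open file handle, for example `open("Result_Viroporin_.fasta")`.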
Study aims and methodology: The primary objective of the study was to determine the possible evolutionary and molecular relationships between SARS-CoV-2 and non-segmented RNA viruses, especially viruses that can infect the nervous system in infants and children.
The whole-genome sequences of 35 non-segmented RNA viruses, including 13 CoVs, were retrieved from the National Center for Biotechnology Information (NCBI) for phylogenetic analysis, which was conducted with MEGA X (Penn State University, PA, USA). All genomic sequences were aligned with the ClustalW algorithm, and the phylogeny was inferred by the maximum likelihood method with the Tamura-Nei model. RStudio (RStudio, Inc., Boston, MA, USA) with the msa package was used for multiple protein sequence alignment. For more details on the methodology, please read the related article.
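A quick sanity check on the tree files is to list the taxa they contain, for example to see whether the whole-genome tree holds the 35 viruses mentioned above. The sketch below extracts leaf labels from a Newick string using only the Python standard library. The function name `newick_leaves` is hypothetical, the example tree and its virus labels are invented for illustration, and the parser assumes that any label following a closing parenthesis is an internal node name or support value. It does not handle escaped quotes or bracketed comments.

```python
def newick_leaves(newick):
    """Return leaf labels from a Newick string, in order of appearance."""
    leaves = []
    i, n = 0, len(newick)
    prev = ""  # last structural character seen: "(", ")", "," or ";"
    while i < n:
        ch = newick[i]
        if ch in "(),;":
            prev = ch
            i += 1
        elif ch == ":":
            # Skip the branch length that follows the colon.
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1
        elif ch.isspace():
            i += 1
        else:
            if ch == "'":
                # Quoted label: read up to the closing quote.
                j = newick.index("'", i + 1)
                label = newick[i + 1:j]
                i = j + 1
            else:
                j = i
                while j < n and newick[j] not in "(),:;":
                    j += 1
                label = newick[i:j].strip()
                i = j
            # A label directly after ")" names an internal node (often a
            # bootstrap support value), so only the other labels are leaves.
            if prev in ("", "(", ","):
                leaves.append(label)
    return leaves


# Invented example tree (labels and values are NOT from the dataset).
tree = ("((SARS_CoV_2:0.1,MERS_CoV:0.2)98:0.05,"
        "(HCoV_OC43:0.3,'Zika virus':0.4)75:0.1);")
leaves = newick_leaves(tree)
print(len(leaves), leaves)
```

Applied to the text of one of the .nwk files, the length of the returned list gives the number of taxa in that tree.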
Created:
2023-06-28



