Sars-Cov-2 and Mers sequences from human host with no unknown characters

Source: DataONE. Updated 2024-01-02; indexed 2024-06-08.

Download link:

https://search.dataone.org/view/sha256:8c32e211e6e82b604ddf0fe2ebbb2a8c46ea40a22b6639467750fa805518e953

Description:

The datasets are organized as follows: the first column gives the number of bases in a given sequence; the second, third, fourth and fifth columns give the number of bases of type A, C, G and T, respectively, in the same sequence.

1) Sars-Cov-2 dataset: base counts for the complete genome sequences from a human host with no unknown characters. The NCBI database contains about 950,000 sequences with these characteristics.
2) Restricted Sars-Cov-2 dataset: base counts for the complete genome sequences from a human host with no unknown characters and exactly 29903 bases, i.e. the same length as the reference sequence NC_045512.2. We obtained about 5600 such sequences from the NCBI database.
3) Mers dataset: base counts for about 200 complete genome sequences from a human host with no unknown characters.

Raw datasets are genome sequences retrieved from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov). The sequences were filtered according to the following criteria:

1) Sars-Cov-2 dataset: complete genome sequences from a human host with no unknown characters. The NCBI database contains about 950,000 sequences with these characteristics.
2) Restricted Sars-Cov-2 dataset: complete genome sequences from a human host with no unknown characters and exactly 29903 bases, i.e. the same length as the reference sequence NC_045512.2. We obtained about 5600 such sequences from the NCBI database.
3) Mers dataset: about 200 complete genome sequences from a human host with no unknown characters.
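As a minimal sketch of how the five-column layout can be consumed, the snippet below parses a row, checks that the A, C, G and T counts add up to the total, computes the GC fraction, and keeps only rows with the 29903-base reference length used for the Restricted Sars-Cov-2 dataset. It assumes the columns are whitespace-separated integers in the order total, A, C, G, T (the delimiter is not stated in the description). The names `BaseCounts`, `parse_row`, `is_consistent`, `gc_content` and `restricted` are hypothetical. The consistency check also assumes every base is one of A, C, G or T; IUPAC ambiguity codes such as R or Y, if present, would make the sum fall short of the total.

```python
from typing import Iterable, List, NamedTuple

# Length of the reference sequence NC_045512.2, the criterion for the Restricted Sars-Cov-2 dataset.
REFERENCE_LENGTH = 29903


class BaseCounts(NamedTuple):
    total: int
    a: int
    c: int
    g: int
    t: int


def parse_row(line: str) -> BaseCounts:
    """Parse one row; assumes five whitespace-separated integers: total, A, C, G, T."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    return BaseCounts(*(int(f) for f in fields))


def is_consistent(row: BaseCounts) -> bool:
    """True when A + C + G + T accounts for every base (assumes only A/C/G/T occur)."""
    return row.a + row.c + row.g + row.t == row.total


def gc_content(row: BaseCounts) -> float:
    """Fraction of G and C bases in the sequence."""
    return (row.g + row.c) / row.total


def restricted(rows: Iterable[BaseCounts]) -> List[BaseCounts]:
    """Keep only rows whose length equals the reference length."""
    return [r for r in rows if r.total == REFERENCE_LENGTH]


if __name__ == "__main__":
    # Illustrative rows only: these numbers are made up, not taken from the dataset.
    sample = ["29903 8900 5500 5900 9603", "29850 8880 5490 5880 9600"]
    rows = [parse_row(s) for s in sample]
    for r in rows:
        print(r.total, is_consistent(r), round(gc_content(r), 4))
    print(len(restricted(rows)))  # only the 29903-base row passes the filter
```

Because the Restricted dataset is defined purely by length, the filter only needs the first column; the per-base columns are useful for composition statistics such as GC content.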
Raw data have been processed with a C++ program (provided with the datasets) that reads a dataset of nucleic acid sequences in FASTA format and returns the number of bases in each seq...

# Sars-Cov-2 and Mers sequences from human host with no unknown characters

We uploaded three datasets organized as follows: the first column gives the number of bases in a given sequence; the second, third, fourth and fifth columns give the number of bases of type A, C, G and T, respectively, in the same sequence.

**1) Sars-Cov-2 dataset:** This dataset contains the base counts for the complete genome sequences from a human host with no unknown characters. The NCBI database contains about 950,000 sequences with these characteristics.

**2) Restricted Sars-Cov-2 dataset:** This dataset contains the base counts for the complete sequences from a human host with no unknown characters and exactly 29903 bases, i.e. the same length as the reference sequence NC_045512.2. We obtained about 5800 such sequences from the NCBI database.

**3) Mers dataset:** This dataset contains the base counts for about 250 complete genome sequences from a human host, with no unknown ...
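The processing step described above is performed by the authors' C++ program, which is not reproduced here. The Python sketch below shows one way such per-sequence counts could be computed from a FASTA file and written in the same five-column layout. The function names `read_fasta`, `count_bases` and `to_rows`, the case-insensitive counting, and the space-separated output are assumptions for illustration, not the authors' code.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a sequence may be wrapped over several lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # lines before the first header are discarded
    if header is not None:
        yield header, "".join(chunks)


def count_bases(sequence: str) -> Tuple[int, int, int, int, int]:
    """Return (total, A, C, G, T) for one sequence, ignoring letter case."""
    counts = Counter(sequence.upper())
    return (len(sequence), counts["A"], counts["C"], counts["G"], counts["T"])


def to_rows(lines: Iterable[str]) -> Iterator[str]:
    """Format one space-separated five-column row per FASTA record."""
    for _header, seq in read_fasta(lines):
        yield " ".join(str(n) for n in count_bases(seq))


if __name__ == "__main__":
    # Hypothetical usage: python count_bases.py < sequences.fasta > counts.txt
    for row in to_rows(sys.stdin):
        print(row)
```

Here the total is the full sequence length, so for a sequence containing only A, C, G and T it equals the sum of the four base columns; any other symbol would show up as a gap between the two.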

Created: 2024-01-03
