Whole genome sequencing of in vitro propagated SARS-CoV-2 at the Karolinska Institutet BSL-3 Core Facility

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4722501
Description:
SARS-CoV-2 RNA var. UK B117 210201 #1

Information
This dataset contains raw and processed sequencing data for SARS-CoV-2 strains obtained from the Public Health Agency of Sweden (Folkhälsomyndigheten). These include a UK reference sample (B.1.1.7: "SARS-CoV-2 RNA var. UK B117 210201") and a Wuhan variant (B) that was sampled in Sweden during spring 2020 and propagated in vitro at the Karolinska Institutet BSL-3 core facility (see (1) for more details).

Methods
Libraries were constructed using the EasySeq™ RC-PCR SARS CoV-2 Whole Genome Sequencing kit (Nimagen, SKU: RC-COV096) and sequenced on an Illumina NextSeq550. Raw data were extracted and adapter-trimmed using bcl2fastq v2.20.0.442 [--no-lane-splitting], then quality-trimmed using fastp (2) v0.20.0 [default settings] and aligned against the SARS-CoV-2 Wuhan-Hu-1 reference sequence [ASM985889v3/GCA_009858895.3/MN908947.3] using bowtie2 (3) v2.4.1 [--very-sensitive -N 1].
Genotype likelihoods were calculated using bcftools (4) v1.10.2 [mpileup -C 50 -d 1000 -E] and variants were called using bcftools (4) v1.10.2 [call -mv --ploidy 1]. The resulting SNPs were filtered on quality and excluded if they lay within 5 bp of an indel, using bcftools v1.10.2 [filter -g 5 -i 'QUAL>30 && TYPE="snp"'].
Variant effects were predicted using the Ensembl VEP tool (5) v102.0 [--offline --symbol --hgvs --distance 0 --custom {blacklist},,vcf,exact,0,], and variants at known problematic sites were flagged [{blacklist} = https://raw.githubusercontent.com/W-L/ProblematicSites_SARS-CoV2/master/problematic_sites_sarsCov2.vcf; for more details, see https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473].
Next, problematic sites were excluded, and a consensus genome sequence was constructed for each sample by inserting its variants into the reference genome [bcftools consensus]. The SARS-CoV-2 lineage was confirmed using Pangolin (6) v2.2.1 [lineages version 2021-02-06].

References
1. I. Smyrlaki, M. Ekman, A. Lentini, N. Rufino de Sousa, N. Papanicolaou, M. Vondracek, J. Aarum, H. Safari, S. Muradrasoli, A. G. Rothfuchs, J. Albert, B. Högberg, B. Reinius, Massive and rapid COVID-19 testing is feasible by extraction-free SARS-CoV-2 RT-PCR. Nat. Commun. 11, 4812 (2020).
2. S. Chen, Y. Zhou, Y. Chen, J. Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34, i884–i890 (2018).
3. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9, 357–359 (2012).
4. P. Danecek, J. K. Bonfield, J. Liddle, J. Marshall, V. Ohan, M. O. Pollard, A. Whitwham, T. Keane, S. A. McCarthy, R. M. Davies, H. Li, Twelve years of SAMtools and BCFtools. GigaScience. 10, giab008 (2021).
5. W. McLaren, L. Gil, S. E. Hunt, H. S. Riat, G. R. S. Ritchie, A. Thormann, P. Flicek, F. Cunningham, The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
6. A. Rambaut, E. C. Holmes, Á. O’Toole, V. Hill, J. T. McCrone, C. Ruis, L. du Plessis, O. G. Pybus, A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5, 1403–1407 (2020).
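Illustrative sketch (not part of the original record): the SNP filtering and consensus steps described under Methods can be pictured with a small pure-Python example. This is not the authors' pipeline, which used bcftools; it only mirrors the stated logic: keep SNPs with QUAL above 30, drop SNPs within 5 bp of an indel, skip problematic sites, and insert the remaining variants into the reference. The `Variant` class, the function names, the toy reference and the toy calls are all hypothetical, and the distance to an indel is approximated as the distance to its reference span, which may differ in detail from how bcftools measures it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (1-based position, as in VCF)."""
    pos: int
    ref: str
    alt: str
    qual: float

    @property
    def is_snp(self) -> bool:
        return len(self.ref) == 1 and len(self.alt) == 1


def distance_to_span(pos: int, start: int, end: int) -> int:
    """Distance in bp from pos to the closed interval [start, end]."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def filter_snps(variants, min_qual=30.0, snp_gap=5):
    """Keep SNPs with QUAL > min_qual that are not within snp_gap bp of an indel.

    Assumption: an indel occupies the reference span pos .. pos + len(ref) - 1.
    """
    indels = [v for v in variants if not v.is_snp]
    kept = []
    for v in variants:
        if not v.is_snp or v.qual <= min_qual:
            continue
        near_indel = any(
            distance_to_span(v.pos, i.pos, i.pos + len(i.ref) - 1) <= snp_gap
            for i in indels
        )
        if not near_indel:
            kept.append(v)
    return kept


def build_consensus(reference, snps, excluded_positions=frozenset()):
    """Insert SNPs into the reference, skipping excluded (problematic) sites."""
    seq = list(reference)
    for v in snps:
        if v.pos in excluded_positions:
            continue
        if seq[v.pos - 1] != v.ref:
            raise ValueError(f"REF mismatch at position {v.pos}")
        seq[v.pos - 1] = v.alt
    return "".join(seq)


# Toy data, invented purely for illustration.
reference = "ACGTACGTACGTACGTACGT"
calls = [
    Variant(3, "G", "A", 50.0),    # passes both filters
    Variant(8, "T", "C", 60.0),    # 2 bp from the deletion below
    Variant(10, "CG", "C", 80.0),  # 1 bp deletion (an indel)
    Variant(18, "C", "T", 20.0),   # QUAL too low
    Variant(19, "G", "A", 40.0),   # passes filters, but treated as problematic
]
kept = filter_snps(calls)
consensus = build_consensus(reference, kept, excluded_positions={19})
print([v.pos for v in kept], consensus)
```

Indels are never inserted here because the filter expression in Methods retains only SNPs; a consensus that also carried indels would need coordinate bookkeeping that this sketch leaves out.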
Created:
2021-04-29