Virus+ Sequence Masked Mouse Reference Genome (GRCm38)

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4116248
Description:
A version of the mouse genome (GRCm38) masked for all possible viral sequences. See Virus+ Masked Human Genome for a masked human reference database.

The following commands were used to generate the additional virus sequence masked reference database:

1) Download all RefSeq and Neighbor nucleotide records:
   https://www.ncbi.nlm.nih.gov/nuccore/?term=Viruses[Organism]%20NOT%20cellular%20organisms[ORGN]%20NOT%20wgs[PROP]%20NOT%20gbdiv%20syn[prop]%20AND%20(srcdb_refseq[PROP]%20OR%20nuccore%20genome%20samespecies[Filter])

2) Shred the downloaded viral genomes using shred.sh from the bbtools package:
   shred.sh in=refseq_virus_reformated.fasta out=virus_shred.fasta.gz length=85 minlength=75 overlap=30

3) Map the shredded virus sequences to the GRCm38 genome using bbmap.sh from the bbtools package:
   bbmap.sh ref=GRCm38.fa.gz in=virus_shred.fasta.gz outm=map_mouse_all_viruses.sam minid=0.90

4) Mask the regions of the GRCm38 genome to which virus sequences mapped, using bbmask.sh from the bbtools package:
   bbmask.sh in=GRCm38.fa.gz out=GRCm38_virus_masked.fasta.gz sam=map_mouse_all_viruses.sam

5) Remove all N's to further reduce file size using seqkit:
   seqkit -is replace -p "n" -r "" GRCm38_virus_masked.fasta.gz > mouse_virus_masked.fasta_Ns_removed.gz

Additional References: bbtools, seqkit, NCBI Virus Genome RefSeq
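The effect of the N-removal step (step 5) can be illustrated on a toy FASTA record with standard tools. This is a minimal sketch, not the command used to build the dataset: the function name `strip_n_from_fasta` is hypothetical, and it assumes plain (uncompressed) FASTA on standard input. Unlike seqkit, it does not re-wrap sequence lines, and a record that consists only of N's keeps its header with no sequence.

```shell
#!/bin/sh
# Hypothetical helper: delete every N/n from sequence lines of a FASTA stream.
# Header lines (starting with ">") are passed through untouched.
strip_n_from_fasta() {
  awk '
    /^>/ { print; next }             # keep headers exactly as they are
    {
      gsub(/[Nn]/, "")               # drop masked bases, case-insensitively
      if (length($0) > 0) print      # skip lines that became empty
    }
  '
}

# Toy demonstration: the header contains the letter n and must survive.
printf '>demo contig\nACGTNNnnACGT\n' | strip_n_from_fasta
```

For a compressed file, one possible invocation is `gzip -dc GRCm38_virus_masked.fasta.gz | strip_n_from_fasta | gzip > ns_removed.fasta.gz`, where the output name is only a placeholder.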
Created:
2021-02-09