Calling structural variants with confidence from short-read data in wild bird populations
Source: DataONE. Updated 2024-03-08; indexed 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:9f020d76319059ae3e549b4f51edcd12845e48fa48dfa45846e2b99473fb32e7
Resource description:
Comprehensive characterisation of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation (SV), reproducible and high-confidence SV callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrow (Passer domesticus) individuals. To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of SVs is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analysing short-read discovered SV datasets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by ...

The raw Illumina reads and assembled reference genome from this article are also published and available at NCBI under BioProject number PRJNA255814 (Passer domesticus reference accession number SAMN02929199). Trimmed reads were aligned with BWA-MEM (bwa v0.7.17) to the short-read reference genome assembly for Passer domesticus (Elgvin et al. 2017; NCBI: GCA_001700915.1_Passer_domesticus-1.0), then sorted and indexed with Samtools (samtools v1.9). All unplaced scaffolds were removed, so only mapped chromosomal regions were included in downstream analyses.
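The alignment step described above (BWA-MEM, then Samtools sort and index) could be scripted roughly as follows. This is a minimal sketch, not the authors' script: the file names, the sample ID, the thread count and the `-R` read-group tag are all hypothetical placeholders, and the removal of unplaced scaffolds is not covered.

```python
import shlex
from typing import List


def alignment_commands(reference: str, reads_1: str, reads_2: str,
                       sample: str, threads: int = 4) -> List[List[str]]:
    """Build the align, sort and index commands for one paired-end sample."""
    bam = f"{sample}.sorted.bam"  # hypothetical output naming scheme
    # Assumption: a read group with an SM (sample) field is attached here.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         reference, reads_1, reads_2],
        # "-" tells samtools sort to read the alignments from standard input.
        ["samtools", "sort", "-@", str(threads), "-o", bam, "-"],
        ["samtools", "index", bam],
    ]


def as_shell(commands: List[List[str]]) -> str:
    """Render the three commands as a single quoted shell pipeline."""
    align, sort, index = (
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in commands
    )
    return f"{align} | {sort} && {index}"


# Hypothetical input names, printed for inspection rather than executed.
print(as_shell(alignment_commands(
    "reference.fa", "sample_R1.fq.gz", "sample_R2.fq.gz", "sample01")))
```

Printing the pipeline rather than running it keeps the sketch safe to try on a machine without bwa or samtools installed.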
Larger (>20 bp) structural variants (deletions, duplications, and inversions) were called from the aligned .bam files using LUMPY (Layer et al. 2014), and the resulting calls were genotyped with SVTyper (Chiang et al. 2015), via the smoove pipeline (Pedersen et al. 2020). The resulting VCF file of raw structural variant calls analysed in the study is included in the following file: sparrow_all.smoove.square.anno.vcf.gz
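To illustrate the variant classes and size threshold mentioned above, the following sketch tests whether a single VCF record is a deletion, duplication or inversion longer than 20 bp. It assumes the record carries the standard `SVTYPE` INFO key plus either `SVLEN` or `END`; the function names are hypothetical, and this is not the filtering used in the study.

```python
from typing import Dict, Optional

SV_TYPES = {"DEL", "DUP", "INV"}  # deletions, duplications, inversions
MIN_LEN = 20                      # keep variants strictly longer than 20 bp


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key/value dictionary."""
    parsed = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else "true"  # flag keys have no value
    return parsed


def sv_length(pos: int, info: Dict[str, str]) -> Optional[int]:
    """Return the absolute variant length from SVLEN, else from END - POS."""
    try:
        if "SVLEN" in info:
            return abs(int(info["SVLEN"].split(",")[0]))
        if "END" in info:
            return abs(int(info["END"]) - pos)
    except ValueError:
        return None
    return None


def keep_record(line: str) -> bool:
    """True if the record is a DEL, DUP or INV longer than MIN_LEN."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False
    info = parse_info(fields[7])
    if info.get("SVTYPE") not in SV_TYPES:
        return False
    length = sv_length(int(fields[1]), info)
    return length is not None and length > MIN_LEN
```

Taking the absolute value matters because deletion lengths are commonly written as negative `SVLEN` values.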
Repetitive elements were...

# Calling structural variants with confidence from short-read data in wild bird populations
---
Included files:
* **sparrow_all.smoove.square.anno.vcf.gz** = VCF file of raw structural variant calls analysed in the study
* Scripts and walk-throughs for generating the above VCF file, downstream filtering and plotting in Samplot, and creating a PlotCritic curation project
* **passerDomesticusAnnotatedRepeats.gff** = TE annotation generated in this study
## Description of the data and file structure
Larger (>20 bp) structural variants (deletions, duplications, and inversions) were called from the aligned .bam files using LUMPY (Layer et al. 2014), and the resulting calls were genotyped with SVTyper (Chiang et al. 2015), via the smoove pipeline (Pedersen et al. 2020). The resulting VCF file of raw structural variant calls analysed in the study is included here:
* sparrow_all.smoove.square.anno.vcf.gz
A Unix or Linux command line terminal can be used to open the file. Functions such as "less" ...
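As an alternative to paging through the file in a terminal, the compressed VCF can be read programmatically. The sketch below tallies records per `SVTYPE` using only the Python standard library. It assumes the file is gzip-compatible (as bgzip output is) and that each record's INFO column contains an `SVTYPE` key; the function name `count_sv_types` is hypothetical.

```python
import gzip
from collections import Counter


def count_sv_types(vcf_path: str) -> Counter:
    """Count VCF records per SVTYPE value in a gzip-compressed VCF."""
    counts = Counter()
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta-information and header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:       # not a complete VCF record
                continue
            svtype = "NA"             # label used when SVTYPE is absent
            for item in fields[7].split(";"):
                if item.startswith("SVTYPE="):
                    svtype = item[len("SVTYPE="):]
                    break
            counts[svtype] += 1
    return counts
```

Calling `count_sv_types("sparrow_all.smoove.square.anno.vcf.gz")` from the directory containing the download would return a `Counter` keyed by variant type.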
Created:
2025-07-28



