Local LD index in HSV-1 genome alignments: 7 CSF samples, 11 SWAB samples and 18 combined samples.
Source: DataCite Commons. Updated: 2022-04-27. Indexed: 2024-07-27.
Download link:
https://figshare.com/articles/dataset/Local_LD_index_in_HSV-1_genome_alignments_7_CSF_samples_11_SWAB_samples_and_18_combined_samples_/8088635
Resource description:
To screen for potential localised genomic recombination, we performed a site-by-site linkage disequilibrium (LD) analysis within 2,000 bp and 3,000 bp sliding windows using the "genome-wide_localLD_scan.r" script from the "genomescans" suite. This scan tests all pairwise associations between the polymorphism patterns of a fixed number (20) of evenly spaced biallelic sites within each window using Fisher's exact tests; windows containing fewer than this number of SNPs were excluded. To identify windows with stronger LD than the genome-wide average, it then compares the distribution of pairwise p-values within the window to the distribution obtained by combining all possible windows in the genome, using a Mann-Whitney-Wilcoxon test. The -log<sub>10</sub> of the resulting p-value is reported as the window's score, the local LD index (LDI); only scores > 5 were considered significant. This was performed on an alignment of genomes from all samples in this study, as well as on alignments of genomes from the CSF samples only and from the SWAB samples only, treating them as separate populations. To verify that the estimates of linkage in these data subsets reflected a group-specific population structure and not a bias induced by the difference in dataset sizes, we resampled the genomic datasets, drawing 30 pseudorandom combinations of genomes of the same size as each dataset, sampling equally from the CSF and SWAB (= nonCSF) groups. In this way, we obtained a baseline distribution of local LD at each genome site under the hypothesis of no group-specific population structure; for group-specific subset analyses, only LDI scores falling outside the 95% confidence interval of this simulated distribution were deemed significant.
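The scoring of a single window described above can be illustrated with a minimal Python sketch. It is not the "genomescans" implementation (which is written in R), and several details are assumptions made for illustration: the function names are hypothetical, the Mann-Whitney-Wilcoxon test is taken to be one-sided (window p-values smaller than the genome-wide ones) and is computed with a tie-corrected normal approximation, and the genome-wide pool of p-values is simply passed in by the caller.

```python
import math
from itertools import combinations

LDI_THRESHOLD = 5  # significance cut-off on the LDI stated in the description


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def pairwise_pvalues(sites):
    """Fisher p-values for all pairs of biallelic sites (one allele per genome)."""
    pvals = []
    for s, t in combinations(sites, 2):
        counts = [[0, 0], [0, 0]]
        for x, y in zip(s, t):
            # Encode each allele as 0 (same as the first genome) or 1 (other).
            counts[x != s[0]][y != t[0]] += 1
        pvals.append(fisher_exact_two_sided(*counts[0], *counts[1]))
    return pvals


def mann_whitney_less(x, y):
    """One-sided p-value that x tends to be smaller than y.

    Assumption: normal approximation with tie and continuity corrections.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n_x, n_y, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j for a tied group
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    mean = n_x * n_y / 2
    var = n_x * n_y / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0
    z = (u_x - mean + 0.5) / math.sqrt(var)
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z


def local_ld_index(window_pvals, genome_pvals):
    """LDI: -log10 of the Mann-Whitney p-value (window vs. genome-wide)."""
    p = mann_whitney_less(window_pvals, genome_pvals)
    return -math.log10(p) if p > 0 else math.inf


# Toy usage: two perfectly associated sites across six genomes,
# compared with a made-up genome-wide pool of p-values.
window = pairwise_pvalues(["AAATTT", "CCCGGG"])
background = [0.5, 0.6, 0.7, 0.8, 1.0]
ldi = local_ld_index(window, background)
print(f"LDI = {ldi:.2f}, significant: {ldi > LDI_THRESHOLD}")
```

In a real scan each window would contribute the p-values of all pairs among its 20 sampled sites (190 pairs), and the background would pool the p-values of every window in the genome; a statistics library would normally replace the hand-written tests used here to keep the sketch self-contained.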
This file archive contains a folder for each dataset:
- LDscan2000-all/ the full dataset scanned with 2 kb windows;
- LDscan3000-all/ the full dataset scanned with 3 kb windows;
- LDscan3000-CSF/ the CSF samples-only dataset scanned with 3 kb windows;
- LDscan3000-nonCSF/ the SWAB samples-only dataset scanned with 3 kb windows.
The set of files in each of these folders allows one to reproduce the analysis by running the script "genome-wide_localLD_scan.r" with the option "-o folder_name".

Among the files present in these folders:
- "LD_Fisher.minalfrq1.biallelicsites.max1gaps.whole-matrix.RData" are R-loadable binary data archives that contain the matrix of Fisher test p-values for all pairs of biallelic sites in the dataset's mapped alignment.
- "LD_Fisher.minalfrq1.biallelicsites.max1gaps.whole-matrix.RData" similarly contain the matrix of r^2 values for all pairs of biallelic sites.
- "compare_LD_metrics_LDIsigthresh5.pdf" are plots that compare different metrics measuring the local LD in scanning windows. In all panels, windows identified as significant on the basis of LDI > 5 are shown as red dots.
Provider:
figshare
Created:
2019-05-29



