Distances and their visualization in studies of spatial-temporal genetic variation using single nucleotide polymorphisms (SNPs)
Source: DataONE. Updated 2024-01-16; indexed 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:503cf366e7bc5dfaf44e6c78499ca2f0e00e51157275563279a3f3bf676024f9
Resource description:
Distance measures are widely used for examining genetic structure in datasets that comprise many individuals scored for a very large number of attributes. Genotype datasets composed of single nucleotide polymorphisms (SNPs) typically contain bi-allelic scores for tens of thousands if not hundreds of thousands of loci.
We examine the application of distance measures to SNP genotypes and sequence tag presence-absences (SilicoDArT) and use real datasets and simulated data to illustrate pitfalls in the application of genetic distances and their visualization.
The datasets used to illustrate points in the associated review are provided here together with the R script used to analyse the data. Data are either simulated within this script or are SNP data generated as part of other studies and included as compressed binary files that can be read into R with the base R function readRDS(). Refer to the analysis script for examples.
A dataset was constructed from a SNP matrix generated for the freshwater turtles in the genus Emydura, a recent radiation of Chelidae in Australasia. The dataset (SNP_starting_data.Rdata) includes selected populations that vary in level of divergence to encompass variation within species and variation between closely related species. Sampling localities with evidence of admixture between species were removed. Monomorphic loci were removed, and the data were filtered on call rate (>95%), repeatability (>99.5%) and read depth (5x < read depth < 50x). Where there was more than one SNP per sequence tag, only one was retained at random. The resultant dataset had 18,196 SNP loci scored for 381 individuals from 7 sampling localities or populations: Emydura victoriae [Ord River, NT, n=15], E. tanybaraga [Holroyd River, Qld, n=10], E. subglobosa worrelli [Daly River, NT, n=25], E. subglobosa subglobosa [Fly River, PNG, n=55], E. macquarii macquarii [Murray Darling Basin north, NSW/...
# Data from: Distances and their visualization in studies of spatial-temporal genetic variation using single nucleotide polymorphisms (SNPs)
This Dryad entry contains the datafiles and associated R script to generate the analyses presented in the companion review article. They include SNP datasets for Australian Turtles and the Australian Blue Mountains Skink, and the associated SilicoDArT data (null alleles matrix) for the turtles. They are for illustration purposes, and have been modified to meet the requirements of the analysis being presented.
## Description of the data and file structure
The turtle SNP data comprises a matrix of entities (individuals) versus attributes (loci) taking on the states 0 for homozygous reference allele, 2 for homozygous alternate allele and 1 for the heterozygous state. The data are stored in compressed form as an adegenet genlight object with associated locus metadata (e.g. callrate, reproducibility) and individual metadata (e.g. latitude, longitud...
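The 0/1/2 encoding described above can be illustrated with a small sketch. It is not the authors' R script: the toy matrix, the function names, the use of `None` for missing calls, and the choice of Euclidean distance are all assumptions made for illustration. The call-rate threshold is set to 0.5 only because the toy matrix has three individuals; the dataset itself was filtered at >95%.

```python
from math import sqrt

# Hypothetical toy genotype matrix: rows are individuals, columns are loci.
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
# None marks a missing call (an assumption; the source does not say how
# missing values are encoded).
genotypes = [
    [0, 1, 2, None],
    [0, 2, 2, None],
    [2, 1, 0, 1],
]


def call_rate(matrix, locus):
    """Fraction of individuals with a non-missing score at one locus."""
    called = sum(1 for row in matrix if row[locus] is not None)
    return called / len(matrix)


def filter_by_call_rate(matrix, threshold):
    """Keep only loci whose call rate exceeds the threshold."""
    n_loci = len(matrix[0])
    keep = [j for j in range(n_loci) if call_rate(matrix, j) > threshold]
    return [[row[j] for j in keep] for row in matrix]


def euclidean_distance(a, b):
    """Euclidean distance over loci scored in both individuals."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no loci scored in both individuals")
    return sqrt(sum((x - y) ** 2 for x, y in pairs))


# Drop the poorly called fourth locus, then build the pairwise matrix.
filtered = filter_by_call_rate(genotypes, threshold=0.5)
distances = [[euclidean_distance(a, b) for b in filtered] for a in filtered]
```

Skipping loci that are missing in either individual is one simple way to handle missing data, and it makes distances between different pairs rest on different numbers of loci.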
Created:
2025-07-26



