Twigstats scripts and example dataset

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/13880458
Description:
This repository provides all scripts needed to run Relate and Twigstats on imputed ancient genomes. We also provide a complete, self-contained example dataset, but you should be able to use the exact same scripts on your own datasets as well.

Installation

Please install bcftools if you haven't already (https://samtools.github.io/bcftools/howtos/install.html). Make sure that the executable is added to your PATH and that BCFTOOLS_PLUGINS is set to the correct plugin path (see the bcftools link).

Please download Relate from https://myersgroup.github.io/relate/

Please also install the R package Twigstats from https://leospeidel.github.io/twigstats/

Optional: for plotting and downstream analyses, please install the following R packages:
- relater, from https://github.com/leospeidel/relater/
- ggplot2
- dplyr
- tidyr
- plyr
- umap

Download

To run this on your own dataset, please download scripts.tgz and Relate_input_files.tgz. To run the provided example, please additionally download example_data_chr1.tgz or example_data.tgz. All output files generated by run_wg.sh are stored under results/.

Running the scripts

Please extract the tarballs, e.g. using tar -xzvf scripts.tgz. The script run.sh shows how to run everything in order for chromosome 1. The script run_wg.sh runs everything for the whole genome. You can find the individual scripts that are called under scripts/.

Input files

The directory example_data_chr1 stores files for chromosome 1 only, whereas example_data stores files for the whole genome. Under example_data/ and example_data_chr1/ you will find the following files:
- GLIMPSE-imputed VCF, here named ancients_glimpse2_chr1.bcf.
- Modern VCF (e.g. 1000G), here named 1000GP_sub_chr1.bcf.
- A poplabels file listing population labels for each individual. Individuals have to appear in the same order as in the merged VCF file. The file should contain four columns: ID POP GROUP SEX. The second column is used for population assignment.
- A second poplabels file used for the MDS analysis. Here the second column should list the IDs of all individuals plotted in the MDS (i.e. it should be identical to the first column). The outgroup should be grouped together into one population.
- A file containing sample ages in generations, two lines per sample (diploid), e.g. for 3 samples of ages 0, 10, and 100 generations:

    0
    0
    10
    10
    100
    100

We provide all the other required Relate input files under Relate_input_files/. You can reuse these in your analysis.

In this example, we are using data from the 1000 Genomes Project dataset (Nature 2015). We additionally use low-coverage shotgun genomes from Anglo-Saxon contexts, the British Iron/Roman Age, the Irish Bronze Age, and the Scandinavian Early Iron Age (Cassidy et al., PNAS 2016; Martiniano et al., Nature Communications 2016; Anastasiadou et al., Communications Biology 2023; Schiffels et al., Nature Communications 2016; Gretzinger et al., Nature 2022; Rodriguez-Varela et al., Cell 2023). These were imputed using GLIMPSE (https://odelaneau.github.io/GLIMPSE).

Step by step guide

Please follow run.sh (chromosome 1 only). The script run_wg.sh will run the whole genome. These scripts will:
1. Run scripts/1_prep_vcf.sh to filter the imputed genotypes.
2. Run scripts/2_prep_Relate.sh to prepare the Relate input files.
3. Run scripts/3_run_Relate.sh to estimate genealogies.

We can use these Relate files for various analyses:
- You can run Twigstats and infer admixture proportions using Rscript scripts/4_run_Twigstats.R.
- You can estimate coalescence rates and population sizes using Rscript scripts/5_plot_popsize.R.
- You can run an MDS using Rscript scripts/6_plot_MDS.R.

To see the arguments required by each script, execute it without arguments, e.g. scripts/1_prep_vcf.sh or Rscript scripts/4_run_Twigstats.R.

The expected output is shown in the attached PDF.
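
When preparing your own dataset, the sample ages file and the MDS poplabels file described under the input files can be derived from a sample list. The following is a minimal Python sketch of that idea, not part of the provided scripts. The function names, the "OUTGROUP" label, and the choice to identify outgroup individuals by their original population label are assumptions made for illustration. Check the files in the example data for the exact layout (header line, delimiter) expected by the scripts.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# One poplabels row, following the four columns named in the description:
# ID, POP, GROUP, SEX.
Row = Tuple[str, str, str, str]


def sample_age_lines(ages_in_generations: Sequence[float]) -> List[str]:
    """Return the lines of a sample ages file.

    Each diploid sample contributes two lines (one per haplotype),
    both holding that sample's age in generations.
    """
    lines: List[str] = []
    for age in ages_in_generations:
        text = str(age)
        lines.extend([text, text])
    return lines


def mds_poplabels(
    rows: Iterable[Row],
    outgroup_pops: Set[str],
    outgroup_label: str = "OUTGROUP",  # hypothetical label, pick your own
) -> List[Row]:
    """Derive poplabels rows for the MDS analysis.

    The second column becomes the individual's own ID, except for
    individuals whose original population is in `outgroup_pops`
    (an assumption about how the outgroup is identified); those are
    grouped together under `outgroup_label`.
    """
    result: List[Row] = []
    for sample_id, pop, group, sex in rows:
        new_pop = outgroup_label if pop in outgroup_pops else sample_id
        result.append((sample_id, new_pop, group, sex))
    return result


if __name__ == "__main__":
    # Three samples of ages 0, 10, and 100 generations, as in the text.
    print("\n".join(sample_age_lines([0, 10, 100])))
```

Writing the returned lines and rows to disk is left out on purpose, so that the delimiter and header can be matched to the example files.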
Created:
2024-10-02