Data and scripts for the "Genomic architecture of the clownfish hybrid Amphiprion leucokranos"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14644029
Description:
Data and scripts used for the analyses in the publication "Genomic architecture of the clownfish hybrid Amphiprion leucokranos".
Pictures
The "leucokranos_pictures" directory contains pictures of the hybrid (Amphiprion leucokranos) individuals used for the colour pattern analysis. Each file is named after the individual it shows.
VCF files
filtered_snps_no_outgroup.vcf.gz
Filtered VCF file of 64 individuals, including only A. chrysopterus, A. sandaracinos and A. leucokranos. We hard-filtered the VCF following the GATK recommendations, since no well-curated training resources are available for base recalibration. Variants matching any of the following expressions were removed: QD < 2.0, MQ < 40.0, FS > 60.0, SOR > 3.0, MQRankSum < -12.5, ReadPosRankSum < -8.0. We then performed additional filtering using VCFtools v0.1.16 (Danecek et al. 2011) and kept only biallelic SNPs with a quality above 30 (--minQ 30), genotyped in at least 85% of the individuals (--max-missing 0.85), with a minimum depth of 7 (--minDP 7), a maximum depth of 40 (--maxDP 40) and a minor allele count greater than or equal to 3 (--mac 3). More information about the individuals present in the VCF file is available in the vcf_individuals_info.csv file.
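To make the direction of the hard-filter thresholds explicit, here is a minimal Python sketch that flags a variant when any of the listed expressions is true. It is an illustration only, not the pipeline in scripts.zip: the function names `parse_info` and `failed_filters` are hypothetical, and skipping annotations that are absent from a record is an assumption of this sketch.

```python
import operator

# Thresholds quoted in the description; a variant is removed if ANY expression holds.
HARD_FILTERS = {
    "QD": (operator.lt, 2.0),
    "MQ": (operator.lt, 40.0),
    "FS": (operator.gt, 60.0),
    "SOR": (operator.gt, 3.0),
    "MQRankSum": (operator.lt, -12.5),
    "ReadPosRankSum": (operator.lt, -8.0),
}


def parse_info(info_field):
    """Turn a VCF INFO string such as 'QD=1.5;MQ=60.0;DB' into a dict.

    Flag entries without '=' (e.g. 'DB') are ignored.
    """
    info = {}
    for entry in info_field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value
    return info


def failed_filters(info):
    """Return the annotation names whose hard-filter expression is true.

    Assumption of this sketch: annotations missing from the record are skipped.
    """
    failed = []
    for name, (compare, threshold) in HARD_FILTERS.items():
        value = info.get(name)
        if value is not None and compare(float(value), threshold):
            failed.append(name)
    return failed


# Hypothetical records, for illustration only.
low_qd = parse_info("QD=1.5;MQ=60.0;FS=0.0;SOR=1.0;DB")
clean = parse_info("QD=25.0;MQ=60.0;FS=1.2;SOR=0.8;MQRankSum=0.1;ReadPosRankSum=0.3")
print(failed_filters(low_qd))
print(failed_filters(clean))
```

A record is kept only when `failed_filters` returns an empty list.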
filtered_snps_outgroup.vcf.gz
Filtered VCF file of 74 individuals, which includes the outgroup A. clarkii used for the ABBA-BABA analysis. We hard-filtered the VCF following the GATK recommendations, since no well-curated training resources are available for base recalibration. Variants matching any of the following expressions were removed: QD < 2.0, MQ < 40.0, FS > 60.0, SOR > 3.0, MQRankSum < -12.5, ReadPosRankSum < -8.0. We then performed additional filtering using VCFtools v0.1.16 (Danecek et al. 2011) and kept only biallelic SNPs with a quality above 30 (--minQ 30), genotyped in at least 85% of the individuals (--max-missing 0.85), with a minimum depth of 7 (--minDP 7), a maximum depth of 40 (--maxDP 40) and a minor allele count greater than or equal to 3 (--mac 3). More information about the individuals present in the VCF file is available in the vcf_individuals_info.csv file.
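The VCFtools step described for both VCF files could be assembled roughly as follows. This is a hedged sketch, not the command in 4_2_variant_filtering.sh: the five threshold flags come from the description, whereas the way biallelic SNPs are selected (--remove-indels, --min-alleles 2, --max-alleles 2), the output options and the file names are assumptions made for illustration.

```python
import shlex


def vcftools_filter_command(in_vcf, out_prefix):
    """Build a VCFtools argument list for the site and genotype filters.

    in_vcf and out_prefix are hypothetical names chosen by the caller.
    """
    return [
        "vcftools",
        "--gzvcf", in_vcf,
        # Assumed way of keeping only biallelic SNPs.
        "--remove-indels",
        "--min-alleles", "2",
        "--max-alleles", "2",
        # Thresholds quoted in the description.
        "--minQ", "30",
        "--max-missing", "0.85",
        "--minDP", "7",
        "--maxDP", "40",
        "--mac", "3",
        # Assumed output options: write a new VCF and keep INFO fields.
        "--recode",
        "--recode-INFO-all",
        "--out", out_prefix,
    ]


cmd = vcftools_filter_command("hard_filtered.vcf.gz", "filtered_snps")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where VCFtools is installed. Building it as a list rather than a single string avoids shell quoting issues with file names.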
Scripts
The archive "scripts.zip" contains all the scripts used to generate the VCF files, as well as the scripts used for the analyses in the paper. All the information about the pipelines and the scripts is in the README in the scripts directory. Content of the directory:
1_filtering_reads/
- 1_run_qc_trimming.sh
- TruSeq3-PE-2.fa
2_mapping/
- 2_1_run_bwa_ref_indexing.sh
- 2_2_run_bwa_mapping.sh
3_variant_calling/
- 3_1_gatk_preprocessing.py
- 3_1_run_gatk_preprocessing.sh
- 3_2_index_ref_genome.sh
- 3_3_gatk_haplotype_call.sh
- 3_4_merge_vcfs.sh
- 3_5_genomics_db_import.sh
- 3_6_joint_genotyping.sh
- 3_7_gather_vcfs.sh
4_variants_filtering:
- 4_1_variant_stats.sh
- 4_2_individuals_filtering.sh
- 4_2_variant_filtering.sh
5_analysis:
- 5_1_pca
- 5_2_admix
- 5_3_mitogenomes
- 5_4_elai
- 5_5_pop_gen_stats
README.md
Created:
2025-03-06



