Dataset for "High allelic diversity in Arabidopsis NLRs is associated with distinct genomic features"
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7527904
Description:
This repository contains intermediate data for the publication, "High allelic diversity in Arabidopsis NLRs is associated with distinct genomic features". Detailed descriptions of the methods, analyses, and generation of these files are available there.
This repository contains the following files:
all_gene_multi_tissue_methylation.csv: Arabidopsis Col-0 %CG methylation across cauline leaf, embryo, flower bud, and rosette leaf tissue as described in Williams et al. 2022.
all_gene_multi_tissue_expression.csv: Arabidopsis Col-0 expression across 52 samples as described in Mergner et al. 2020.
all_gene_table.csv: Gene name, HV status (hv, non-hv, or all_genes), log2(TPM+1) and %CG methylation from rosette leaf tissue, distance to nearest TE, Pi, and Tajima's D for all genes in Arabidopsis Col-0
NLR_gene_table.csv: the same information as all_gene_table.csv, subset to NLRs. Additional NLR-specific columns include cluster_type (singleton, major, or minor), cluster (name of the cluster, if one exists), Nterm domain, Clade from the phylogenetic analysis in Prigozhin and Krasileva 2021, PiN, PiS, PiN/PiS, distance to nearest NLR, and Mutation.Probability.Score from Monroe et al. 2020.
Atha_NLR_common_names.csv: common names matched to gene IDs used in this analysis
Athaliana_NLR_Entropy.tsv: Shannon entropy calculated per gene in reference to Col-0 NLRs.
popgen_per_domain.csv: NLR-specific, domain-level statistics. Alignment coordinates of the domain annotations are found in nlr_aa_annotation.csv.
egglib_window_stats.csv: results of the sliding window analysis for NLRs. Statistics were calculated on 300 bp windows with a step size of 75 bp. nuc_midpoint is the midpoint of each window.
nlr_aa_annotation.csv: Codon alignment coordinates of major domains across NLRs, determined by majority vote across accessions. NB-ARC, TIR, and CC annotations were collected from previous work (Van de Weyer et al., 2019), and LRR annotations were determined using LRRpredictor (Martin et al., 2020).
nlrome_IDs.txt: list of NLRome IDs (Van de Weyer et al., 2019) used to make VCF files for the whole genome popgen analysis.
popgen_per_domain.csv: Per domain population genetics statistics calculated from alignments of NLRs using egglib
positive_selection.csv: Sites under pervasive diversifying selection were identified using FEL (Kosakovsky Pond & Frost, 2005) and sites under episodic diversifying selection were identified using MEME (Murrell et al., 2012) using the internal branches of the phylogeny (Pond et al., 2006; Avanzato et al., 2019). Value is the percentage of codons under positive selection at 95% confidence.
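Several of the quantities named in the file descriptions above are simple to state in code. The following is a minimal Python sketch, not the authors' pipeline. All function names are hypothetical. It assumes 0-based, end-exclusive window coordinates and that only full-length windows are kept. It assumes entropy is computed per alignment column in bits, with no special handling of gaps. It also assumes "95% confidence" corresponds to a p-value cutoff of 0.05. The publication is the authority on how each value was actually computed.

```python
import math
from collections import Counter


def window_midpoints(length, window=300, step=75):
    """Return (start, end, midpoint) for each full window along a sequence.

    Coordinates are 0-based and end-exclusive (an assumption; the
    repository's nuc_midpoint column may use a different convention).
    """
    windows = []
    for start in range(0, length - window + 1, step):
        end = start + window
        windows.append((start, end, start + window / 2))
    return windows


def shannon_entropy(column):
    """Shannon entropy, in bits, of the residues in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def percent_sites_selected(p_values, alpha=0.05):
    """Percentage of codons whose per-site p-value is at or below alpha."""
    if not p_values:
        return 0.0
    hits = sum(1 for p in p_values if p <= alpha)
    return 100.0 * hits / len(p_values)


def log2_tpm(tpm):
    """Expression transform used in the gene tables: log2(TPM + 1)."""
    return math.log2(tpm + 1)
```

For example, `window_midpoints(600)` yields five windows starting at 0, 75, 150, 225, and 300, and `shannon_entropy("AACC")` evaluates to 1 bit because two residues occur with equal frequency.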
Created:
2024-04-22



