The rate of amino acid divergence in Arabidopsis lyrata Plech population.
DataCite Commons. Updated 2025-12-24; indexed 2026-05-03.
Download link:
https://figshare.com/articles/dataset/The_rate_of_amino_acid_divergence_in_Arabidopsis_lyrata_Plech_population_/29459012/1
Description:
We retrieved the Arabidopsis thaliana TAIR10 coding sequences (CDS) in FASTA format from the Joint Genome Institute (JGI) portal. For Arabidopsis lyrata NT1, we downloaded the reference genome FASTA and the corresponding GTF annotation (Kolesnikova et al., 2013). We used a custom Python/GFFutils pipeline to extract and concatenate the CDS exon features of each transcript directly from the GFF3 and genome FASTA, writing one CDS FASTA per transcript. We aligned each filtered CDS pair in codon space with a two-step MAFFT + PAL2NAL pipeline. First, protein translations were aligned with MAFFT v7.480 (--auto). Second, PAL2NAL v14 was used to back-translate each protein alignment into a two-sequence codon alignment in FASTA format. Alignments were then filtered to retain only those with 100% coverage (no gaps in either sequence). Pairwise nonsynonymous (Ka) and synonymous (Ks) substitution rates were calculated from the codon alignments using KaKs_Calculator v2.0 with the Yang–Nielsen (YN00) method. We excluded any pair for which Ka or Ks could not be estimated (e.g. no observed synonymous changes or saturated Ks), as well as any alignment with Ks < 0.01 or length < 150 bp. This final QC step produced Ka/Ks estimates for 6419 ortholog pairs, of which 3453 had Ka/Ks significantly different from 1 (P < 0.05, YN00 test). To quantify uncertainty in group-level Ka/Ks means, we organized the 6419 genes into functional or expression-based groups. For each group, we performed 1000 bootstrap replicates: in each replicate, we sampled N genes with replacement (where N is the group size), recalculated the mean Ka and mean Ks, and computed the mean Ka/Ks ratio.
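The QC thresholds and the group-level bootstrap described above could be sketched roughly as follows. This is an illustrative sketch, not the authors' pipeline: the function names, the use of `None` for rates that could not be estimated, the fixed random seed, and the percentile interval are all assumptions. It also assumes that the "mean Ka/Ks ratio" is the ratio of mean Ka to mean Ks within each replicate; the description could equally be read as the mean of per-gene ratios.

```python
import random
from statistics import mean


def passes_qc(ka, ks, aln_len, min_ks=0.01, min_len=150):
    """Apply the stated thresholds: Ks >= 0.01 and alignment length >= 150 bp.

    Assumption: a rate that could not be estimated is passed in as None.
    """
    if ka is None or ks is None:
        return False
    return ks >= min_ks and aln_len >= min_len


def bootstrap_group_ratio(pairs, n_reps=1000, seed=0):
    """Bootstrap one gene group.

    pairs: list of (Ka, Ks) tuples, one per gene that passed QC.
    Returns one value of mean(Ka) / mean(Ks) per replicate
    (assumed reading of the "mean Ka/Ks ratio").
    """
    rng = random.Random(seed)
    n = len(pairs)
    ratios = []
    for _ in range(n_reps):
        # Sample N genes with replacement, where N is the group size.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        mean_ka = mean(ka for ka, _ in sample)
        mean_ks = mean(ks for _, ks in sample)
        ratios.append(mean_ka / mean_ks)
    return ratios


def percentile_interval(values, lower=2.5, upper=97.5):
    """Nearest-rank percentile interval (an assumed way to summarize replicates)."""
    ordered = sorted(values)

    def pick(pct):
        return ordered[round(pct / 100 * (len(ordered) - 1))]

    return pick(lower), pick(upper)


if __name__ == "__main__":
    # Made-up (Ka, Ks, alignment length) records, for illustration only.
    records = [(0.02, 0.15, 900), (0.05, 0.20, 450), (0.01, 0.005, 600),
               (None, 0.30, 300), (0.03, 0.12, 120), (0.04, 0.25, 1500)]
    group = [(ka, ks) for ka, ks, length in records if passes_qc(ka, ks, length)]
    replicates = bootstrap_group_ratio(group, n_reps=1000)
    print(len(group), percentile_interval(replicates))
```

Because every retained pair has Ks >= 0.01, the mean Ks in a replicate cannot be zero, so the ratio is always defined.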
Provider:
figshare
Created:
2025-12-24



