Additional file 1 of Extensive genomic rearrangements mediated by repetitive sequences in plastomes of Medicago and its relatives
Source: DataCite Commons. Updated: 2024-02-20. Indexed: 2024-07-28.
Download link:
https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Extensive_genomic_rearrangements_mediated_by_repetitive_sequences_in_plastomes_of_Medicago_and_its_relatives/16620913/1
Resource description:
Additional file 1:
Table S1. Plastome assembly, annotation information, and distributions of genomic rearrangements for the 75 individuals.
Table S2. Detailed repeat content for the 75 individuals. Repeat content for the three IR-regained plastomes was calculated using only one IR copy.
Table S3. Repeat-mediated tRNA duplicates. For dispersed repeats, F: forward repeat; C: complement repeat; P: palindromic repeat; R: reverse repeat; the number after the colon is the length of the dispersed repeat. For tandem repeats, the number before the colon is the length of the tandem repeat, and the content after the colon is unit size × copy number.
Table S4. Repeats around endpoints of inversions. Dispersed and tandem repeats use the same notation as in Table S3. Palindromic repeats (P) are marked in red.
Table S5. Palindromic repeat sequences around endpoints of inversions.
Table S6. Repetitive DNA in the acquired introns.
Table S7. Nucleotide diversity (π) for different genes, intergenic regions, and datasets. PC, plastid coding regions; PN, plastid noncoding regions; PCN, the whole plastome. Eight highly divergent coding regions (π > 0.04) and 16 highly divergent noncoding regions (π > 0.1) are marked in red.
Table S8. Sequence divergence in accD among the IRLC species.
Table S9. Sequence divergence in clpP among the IRLC species.
Table S10. Sequence divergence in ycf1 among the IRLC species.
Table S11. Sequence divergence in matK among the IRLC species.
Table S12. Sequence divergence in rbcL among the IRLC species.
Table S13. Repetitive elements in the coding sequences (CDS) of three genes with accelerated substitution rates (accD, clpP, and ycf1) and two relatively conserved genes (matK and rbcL). M. polymorpha has no accD coding sequence because the gene is a truncated pseudogene.
Table S14. Percentage of repetitive DNA in the three localized hypermutation regions around the genes with accelerated substitution rates (accD, clpP, and ycf1) and in the remaining plastome sequence.
Table S15. Locations of the 75 individuals representing 20 Medicago, Trigonella, and Melilotus species. Individuals that were grown in the laboratory are marked with asterisks. Plastomes of Medicago truncatula were assembled from whole-genome resequencing data downloaded from NCBI (SRR1524305 and SRR965443). The outgroup was downloaded from NCBI (NC_011828.1). Individuals chosen as the representative of each species are marked in red.
Table S16. The 73 protein-coding genes (CDS) shared across the 21 taxa included in the phylogenetic analysis.
Table S17. Taxa included in the synonymous and nonsynonymous divergence analyses of accD, clpP, ycf1, matK, and rbcL. (√) included in the analysis; (−) not available in NCBI and not included in the analysis.
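The repeat notation described for Tables S3 and S4 could be read programmatically along the following lines. This is only a minimal sketch: the exact cell layout in the spreadsheet has not been checked, so the formats assumed here ("P:30" for a dispersed repeat, "60:20×3" for a tandem repeat) and the names parse_repeat, DispersedRepeat, and TandemRepeat are hypothetical and introduced for illustration.

```python
import re
from dataclasses import dataclass

# Assumed cell formats (NOT verified against the actual tables):
#   dispersed repeat: "<type>:<length>"                     e.g. "P:30"
#   tandem repeat:    "<length>:<unit size>×<copy number>"  e.g. "60:20×3"
DISPERSED_TYPES = {
    "F": "forward",
    "C": "complement",
    "P": "palindromic",
    "R": "reverse",
}

DISPERSED_RE = re.compile(r"^\s*([FCPR])\s*:\s*(\d+)\s*$")
# Accept "×" as well as "x", "X" or "*" as the multiplication sign.
TANDEM_RE = re.compile(r"^\s*(\d+)\s*:\s*(\d+)\s*[×xX*]\s*(\d+(?:\.\d+)?)\s*$")


@dataclass
class DispersedRepeat:
    kind: str      # forward, complement, palindromic, or reverse
    length: int    # number after the colon


@dataclass
class TandemRepeat:
    length: int         # number before the colon
    unit_size: int      # first factor after the colon
    copy_number: float  # second factor after the colon


def parse_repeat(cell: str):
    """Parse one repeat annotation under the assumed formats above."""
    m = DISPERSED_RE.match(cell)
    if m:
        return DispersedRepeat(DISPERSED_TYPES[m.group(1)], int(m.group(2)))
    m = TANDEM_RE.match(cell)
    if m:
        return TandemRepeat(int(m.group(1)), int(m.group(2)), float(m.group(3)))
    raise ValueError(f"unrecognized repeat notation: {cell!r}")
```

Under these assumptions, parse_repeat("P:30") yields a palindromic dispersed repeat of length 30, and parse_repeat("60:20×3") yields a tandem repeat of length 60 with unit size 20 and copy number 3. Cells that follow a different layout raise ValueError rather than being guessed at.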
Provider:
figshare
Created:
2021-09-15



