A systematic screen for genetic factors underpinning transposon defense systems across the fungal kingdom

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14809875

Resource description:

Supporting information

S1 Table. Metadata for all 1,239 genome assemblies analyzed in the study.

S2 Table. Repeat summaries for all 1,239 genome assemblies analyzed in the study.

S3 Table. Number of identified sequences sharing 80-95% or 95-100% identity across the 1,239 genome assemblies analyzed in the study. Columns denote the number of sequences in each sequence length class (<100 bp, <1 kb, <5 kb or >10 kb).

S4 Table. Local Moran's I metric estimated for the different genome assembly metrics across the phylogeny of the 1,239 genomes. The p-values are computed from 1,000 permutations using the lipaMoran function from the phylosignal R package.

S5 Table. Positions across the phylogeny (edge_num) where shifts in a genome assembly metric (var column) were identified. The shift column denotes the direction (sign) and intensity of the shift as calculated by the OUshifts function from the phylolm R package. The is_tip column indicates whether the shift is located at a terminal edge. The coshift column indicates whether multiple shifts were mapped to the same edge.

S6 Table. Relative synteny for a list of 4,666 orthogroups that are mostly single-copy across the 1,239 genome assemblies (>80% of the species).

S7 Table. Conserved orthogroups in the 1,239 genome assemblies and their associated protein domains.

S8 Table. Summary of the number of proteins assigned per orthogroup.

S9 Table. Number of species with unique or paralog proteins assigned to each orthogroup.

S10 Table. List of genes and Pfams used to estimate the presence/absence of DNA biology-related functions across the 1,239 genome assemblies.

S11 Table. K-mer frequency calculated at coding, non-coding and repeated sequences in each assembly, filtered for scaffolds larger than 50 kb. Non-coding enrichment is calculated as the ratio of the k-mer frequency at non-coding sequences over its frequency at coding sequences. Repeat enrichment is calculated as the ratio of the k-mer frequency at repeats over its frequency at non-coding sequences.

S12 Table. Number of k-mers >2-fold enriched at non-coding sequences (n_noncoding), repeats (n_repeat) or both (n_both) across the 1,239 genome assemblies. The ratio of the number of k-mers >2-fold enriched at both non-coding sequences and repeats over the total number of k-mers enriched >2-fold is used to estimate recent repeat-induced point mutation activity (Figure 2E).

S13 Table. List of orthogroups associated with one of the genome assembly metrics (variable column) under the simultaneous, subsequent or terminal model implemented in the treeWAS R package (mode column). The number of models for which an association is found is given in the n_association column (1 to 3).

S14 Table. List of the protein domains most commonly associated (top 1 Pfam) with proteins assigned to orthogroups associated with variation in one of the genome assembly metrics (variable column). The number of proteins with the given Pfam domain is given in the n_protein column.

S15 Table. Values assigned for the different variables used for association mapping across the 1,239 genome assemblies.
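The enrichment ratios described for S11 Table and the summary ratio described for S12 Table can be illustrated with a short Python sketch. This is not the authors' code. The input layout (a dictionary mapping each k-mer to its coding, non-coding and repeat frequencies), the function names, the handling of zero frequencies, and the choice to count the "total" as the union of k-mers enriched in either category are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical input layout: k-mer -> (freq_coding, freq_noncoding, freq_repeat)
KmerFreqs = Dict[str, Tuple[float, float, float]]


def enrichment_ratios(freqs: KmerFreqs) -> Dict[str, Tuple[float, float]]:
    """Return k-mer -> (non-coding enrichment, repeat enrichment).

    Non-coding enrichment = freq at non-coding / freq at coding.
    Repeat enrichment     = freq at repeats    / freq at non-coding.
    """
    ratios = {}
    for kmer, (coding, noncoding, repeat) in freqs.items():
        if coding <= 0 or noncoding <= 0:
            # Ratios are undefined here; skipping is an assumption of this sketch.
            continue
        ratios[kmer] = (noncoding / coding, repeat / noncoding)
    return ratios


def both_enriched_ratio(freqs: KmerFreqs, threshold: float = 2.0) -> float:
    """Share of enriched k-mers that are enriched at both non-coding and repeats.

    Assumption: the denominator is the number of k-mers enriched above the
    threshold in at least one of the two categories.
    """
    ratios = enrichment_ratios(freqs)
    noncoding_hits = {k for k, (nc, _) in ratios.items() if nc > threshold}
    repeat_hits = {k for k, (_, rp) in ratios.items() if rp > threshold}
    n_both = len(noncoding_hits & repeat_hits)
    n_total = len(noncoding_hits | repeat_hits)
    return n_both / n_total if n_total else 0.0


# Made-up frequencies, for illustration only.
example: KmerFreqs = {
    "kmer_a": (1.0, 3.0, 9.0),  # enriched at non-coding and at repeats
    "kmer_b": (2.0, 1.0, 0.5),  # enriched at neither
    "kmer_c": (1.0, 5.0, 5.0),  # enriched at non-coding only
    "kmer_d": (2.0, 2.0, 8.0),  # enriched at repeats only
}
ratio = both_enriched_ratio(example)
```

With the made-up values above, one k-mer is enriched in both categories out of three enriched in at least one, so the sketch returns one third. The actual values in S12 Table should be read from the n_noncoding, n_repeat and n_both columns rather than recomputed this way.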

Created:

2025-04-12