Reconstructing NOD-like receptor alleles with high internal conservation in Podospora anserina using long-read sequencing

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.h18931zww

Description:

NOD-like receptors (NLRs) are intracellular immune receptors that detect pathogen-associated cues and trigger defense mechanisms, including regulated cell death. In filamentous fungi, some NLRs mediate heterokaryon incompatibility, a self/non-self recognition process that prevents the vegetative fusion of genetically distinct individuals, reducing the risk of parasitism. The het-d and het-e NLRs in Podospora anserina are highly polymorphic incompatibility genes (het genes) whose products recognize different alleles of the het-c gene via a sensor domain composed of WD40 repeats. These repeats display unusually high sequence identity, maintained by concerted evolution. However, some sites within individual repeats are hypervariable and under diversifying selection. Despite extensive genetic studies, inconsistencies in the reported WD40 domain sequence have hindered functional and evolutionary analyses. Here we demonstrate that the WD40 domain can be accurately reconstructed from long-read sequencing data (Oxford Nanopore and PacBio), but not from Illumina-based assemblies. Functional alleles are usually formed by 11 highly conserved repeats, with different repeat combinations underlying the same phenotypic het-d and het-e incompatibility reactions. Protein structure models suggest that their WD40 domain folds into two 7-blade β-propellers composed of the highly conserved repeats, as well as three cryptic divergent repeats at the C-terminus. We additionally show that one particular het-e allele does not have an incompatibility reaction with common het-c alleles, despite being 11 repeats long. Our findings provide a robust foundation for future research into the molecular mechanisms and evolutionary dynamics of het NLRs, while also highlighting both the fragility and the flexibility of β-propellers as immune sensor domains.
Methods

This dataset consists of genome assemblies of three wild-type strains (Y+, Z+, and Wa63+) and nine lab strains produced by backcrossing different het-e, het-d, and het-c alleles into the genomic background of strain s ("little s"). Whole-genome DNA of most strains was extracted with the Zymo Quick-DNA Fungal/Bacterial Miniprep Kit D6005 (Zymo Research; https://zymoresearch.eu/). For the strain CmEm-, ~800 mg of mycelia were used for high-molecular-weight DNA extraction with the QIAGEN Genomic-tip 100/G kit (Qiagen).

Oxford Nanopore Technologies (ONT) sequencing was performed in-house using the Native Barcoding Kit 24 V14 (SQK-NBD114.24) and a MinION Mk1C machine, following the standard protocol. In total, 12 strains were barcoded into two pools (pool1: CmEm-, CoEc+, CoEc-, Y+, Z+, and Wa63+ with barcodes 1 to 6; pool2: CoEf+, ChEhDa+, ChEhDa-, CaDa-, CsDf+, and CsDf- with barcodes 7 to 12). Each pool was sequenced on two separate R10.4.1 flow cells (FLO-MIN114).

Basecalling was performed using Dorado v. 0.5.3 (https://github.com/nanoporetech/dorado/) with the dna_r10.4.1_e8.2_400bps_sup@v4.3.0 model. The resulting BAM files were converted to FASTQ files with the bam2fq program of SAMtools v. 1.19.2 (Danecek et al. 2021). Reads corresponding to the DNA Control Sample (DNA CS) introduced during library preparation were removed using chopper v. 0.7.0 (De Coster and Rademakers 2023). For each sample, we removed reads that contained perfect matches to ONT native barcodes assigned to other samples. We then removed barcodes and performed minimal quality control with fastplong v. 0.2.2 (Chen 2023) using the parameters --trimming_extension 20 -l 50 -q 15 -d 0.1 (hereafter, cleaned ONT reads).

The cleaned ONT reads of each sample were used as input for Flye v. 2.9.3 (Kolmogorov et al. 2019), with parameters --nano-hq --iterations 2.
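
The cross-sample barcode filter described above (dropping reads with a perfect match to a native barcode assigned to another sample) can be illustrated with a minimal Python sketch. This is not the authors' code and may differ from it: the function names are hypothetical, the barcode sequences are short placeholders rather than the real ONT native barcodes, and checking both strand orientations is an assumption.

```python
from typing import Dict, Iterable, Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_foreign_barcode_reads(
    reads: Iterable[Tuple[str, str]],
    barcodes: Dict[str, str],
    own_barcode: str,
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) reads lacking an exact match to any other sample's barcode.

    Assumption: each foreign barcode is searched in both orientations.
    """
    foreign = []
    for name, seq in barcodes.items():
        if name != own_barcode:
            foreign.append(seq.upper())
            foreign.append(reverse_complement(seq.upper()))
    for read_name, read_seq in reads:
        upper = read_seq.upper()
        if not any(bc in upper for bc in foreign):
            yield read_name, read_seq


# Placeholder barcodes for illustration only, NOT real ONT sequences.
toy_barcodes = {"barcode01": "AAGGTTCC", "barcode02": "CCATGGTA"}
toy_reads = [
    ("read1", "TTTAAGGTTCCTTT"),  # carries the sample's own barcode01: kept
    ("read2", "GGGCCATGGTAGGG"),  # carries barcode02: dropped
    ("read3", "GGGTACCATGGGGG"),  # carries reverse complement of barcode02: dropped
    ("read4", "ACGTACGTACGT"),    # no barcode at all: kept
]
kept = [
    name
    for name, _ in drop_foreign_barcode_reads(toy_reads, toy_barcodes, "barcode01")
]
print(kept)
```

In this toy run only read1 and read4 survive for the sample assigned barcode01. Exact substring matching mirrors the "perfect matches" wording; it deliberately ignores barcodes that carry sequencing errors.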

In addition, the paired-end Illumina reads of the strains Wa63+, Y+, Z+, and Wa137- were retrieved from NCBI’s Sequence Read Archive (accession numbers SRX5458088, SRX5458091, SRX11405146, and SRX8537866) and assembled with SPAdes v. 4.0.0 (Prjibelski et al. 2020) using the --careful parameter and either the default k-mer setting (Wa63-, Z+, and Y-) or the k-mer sizes 21, 33, 55, and 77 (all strains, "allkmers"). From all these assemblies, the nucleotide sequences of the het-e, het-d, and het-r genes were extracted and aligned manually (file 2024.09.10_hnwd_master_het_reAl_Illu_noGuides_noemptycols.fa).

Associated code can be found in the repository https://github.com/SLAment/FixingHetDE. New sequencing data were deposited in NCBI’s Sequence Read Archive under BioProject PRJNA1216259.

References

Chen S. 2023. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107.
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. 2021. Twelve years of SAMtools and BCFtools. GigaScience 10: giab008.
De Coster W, Rademakers R. 2023. NanoPack2: population-scale evaluation of long-read sequencing data. Bioinformatics 39: btad311.
Kolmogorov M, Yuan J, Lin Y, Pevzner PA. 2019. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37: 540–546.
Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. 2020. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics 70: e102.
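
Note on the Illumina assemblies: the two SPAdes configurations described in Methods (--careful with either the default k-mer setting or k-mer sizes 21, 33, 55, and 77) could be expressed as in the following Python sketch. The helper name, read file names, and output directories are hypothetical; only the spades.py options --careful, -1, -2, -k, and -o are taken from SPAdes itself, and the authors' actual invocation may differ.

```python
from typing import List, Optional, Sequence


def spades_command(
    reads_1: str,
    reads_2: str,
    out_dir: str,
    kmers: Optional[Sequence[int]] = None,
) -> List[str]:
    """Build a spades.py argument list for one paired-end library.

    When kmers is None, no -k option is passed and SPAdes picks its defaults.
    """
    cmd = ["spades.py", "--careful", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    if kmers is not None:
        cmd += ["-k", ",".join(str(k) for k in kmers)]
    return cmd


ALL_KMERS = (21, 33, 55, 77)  # the "allkmers" setting from Methods

# Hypothetical file and directory names, for illustration only.
default_cmd = spades_command("strain_R1.fastq.gz", "strain_R2.fastq.gz", "spades_default")
allkmers_cmd = spades_command(
    "strain_R1.fastq.gz", "strain_R2.fastq.gz", "spades_allkmers", ALL_KMERS
)
print(" ".join(default_cmd))
print(" ".join(allkmers_cmd))
```

Either list can be passed to subprocess.run(cmd, check=True) on a machine where SPAdes is installed; the sketch only builds and prints the commands.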

Created:

2025-02-05