16S V4 raw read count data; 16S reads metadata; new MHC class II allele sequences

Indexed in NIAID Data Ecosystem: 2026-03-13

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.4f4qrfj9t

Resource description:

Pathogen-mediated selection at the major histocompatibility complex (MHC) is thought to promote MHC-based mate choice in vertebrates. Mounting evidence implicates odour in conveying MHC genotype, but the underlying mechanisms remain uncertain. MHC effects on odour may be mediated by odour-producing symbiotic microbes whose community structure is shaped by MHC genotype. In birds, preen oil is the primary source of body odour, and similarity at MHC predicts similarity in preen oil composition. Hypothesizing that this relationship is mediated by symbiotic microbes, we characterized the MHC genotypes, preen gland microbial communities, and preen oil chemistry of song sparrows (Melospiza melodia). Consistent with the microbial mediation hypothesis, pairwise similarity at MHC predicted similarity in preen gland microbiota. Overall microbial similarity did not predict chemical similarity of preen oil, counter to this hypothesis. However, permutation testing identified a maximally predictive set of microbial taxa that best reflect MHC genotype, and another set of taxa that best predict preen oil chemical composition. The relative strengths of the relationships between MHC and microbes, microbes and preen oil, and MHC and preen oil suggest that MHC may affect host odour both directly and indirectly. Thus, birds may assess MHC genotypes based on both host-associated and microbially mediated odours.
Methods
We extracted bacterial DNA from swabs using DNeasy PowerSoil DNA isolation kits (Qiagen), with modifications to the manufacturer’s recommended protocol (see supplementary material for the detailed protocol). We amplified the V4 region of the bacterial 16S rRNA gene using the universal primers F518 [20] and R806 [21]. In addition to the priming sequence, each primer included an Illumina MiSeq adaptor sequence, four randomized nucleotides, and a unique ‘barcode’ of eight nucleotides.
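Because each primer carries four random nucleotides and an 8-nt barcode ahead of the priming sequence, reads from the pooled library can be assigned back to individual birds. The Python sketch below illustrates that idea only under stated assumptions: the read layout (random nucleotides, then barcode, then primer), the barcode-to-sample table, and the placeholder primer are all hypothetical, and a real pipeline would also tolerate sequencing errors and handle both reads of a pair.

```python
# Hypothetical read layout (an assumption, not taken from the source):
#   4 random nt | 8-nt barcode | gene-specific primer | amplicon insert
N_RANDOM = 4
BARCODE_LEN = 8


def demultiplex(reads, barcode_to_sample, primer):
    """Assign reads to samples by exact barcode and primer match.

    Returns (assigned, n_unassigned), where `assigned` maps each sample
    name to the list of amplicon inserts (prefix and primer removed).
    """
    assigned = {}
    n_unassigned = 0
    primer_start = N_RANDOM + BARCODE_LEN
    for read in reads:
        # Skip the random nucleotides, then read the barcode.
        barcode = read[N_RANDOM:primer_start]
        sample = barcode_to_sample.get(barcode)
        # Require a known barcode followed immediately by the primer.
        if sample is None or not read.startswith(primer, primer_start):
            n_unassigned += 1
            continue
        insert = read[primer_start + len(primer):]
        assigned.setdefault(sample, []).append(insert)
    return assigned, n_unassigned


if __name__ == "__main__":
    primer = "ACGTACGT"  # placeholder, NOT the real F518 sequence
    barcodes = {"AAAACCCC": "sparrow_01", "GGGGTTTT": "sparrow_02"}  # hypothetical
    reads = [
        "TGCA" + "AAAACCCC" + primer + "TTTTGGGG",
        "GATC" + "GGGGTTTT" + primer + "CCCC",
        "ATAT" + "CCCCCCCC" + primer + "AAAA",  # unknown barcode
    ]
    print(demultiplex(reads, barcodes, primer))
```

The random nucleotides are simply skipped here; exact matching keeps the sketch short, whereas production demultiplexers usually allow one or two mismatches in the barcode.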
We performed PCR in a total volume of 25 µL containing 10 µL of 5PRIME HotMasterMix (Quantabio), 0.2 µM of each primer, and 2 µL of DNA template (mean concentration = 0.1 ng/µL). The thermocycling profile consisted of 2 min at 94 °C; 35 cycles of 45 s at 94 °C, 60 s at 50 °C and 90 s at 72 °C; and a 10 min final extension at 72 °C. We confirmed amplification by running samples on a 2% agarose gel.
Sequencing and pipeline
We pooled the PCR products into a library and sequenced it with 250-nt paired-end reads on an Illumina MiSeq at the London Regional Genomics Centre. We used a pipeline [22] to collapse sequences into clusters of identical reads and assign sequences to individuals. We used a second pipeline [23] and the R package dada2 [24] to merge overlapping paired reads, remove ambiguous reads, and filter out chimeras, singletons (sequences appearing only once in the dataset), and sequences rarer than 0.1% in any sample. We assigned each unique sequence variant (SV) to a bacterial taxon by clustering at ≥ 97% sequence identity (following [22]) using the naïve Bayesian Ribosomal Database Project (RDP) Classifier [24,25]. We used a compositional data analysis approach [26] that examines the ratios of read counts between sequences. Not every component is observed in every sample: small values, including those below the detection limit of an instrument such as the Illumina MiSeq, are often recorded as zero. Such zero counts reflect sampling or instrument limitations rather than true absence [27]. Accordingly, following [23], we imputed values for zero-count sequences by Bayesian-multiplicative replacement using the R package zCompositions [27]. We then applied a centred log-ratio (clr) transformation to the zero-replaced dataset, which makes Euclidean distances meaningful in subsequent analyses [23,28].
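The zero-replacement and clr steps described above can be illustrated in plain Python. This is a sketch under stated assumptions: it uses simple multiplicative replacement with a fixed `delta` (here 0.65 of one read, a hypothetical default) in place of the Bayesian-multiplicative method in zCompositions, which estimates replacement values from the data and will give different numbers. The clr step follows the standard definition: the log of each part minus the mean log of all parts in that sample. The function names are hypothetical.

```python
import math


def multiplicative_replacement(counts, delta=None):
    """Turn one sample's read counts into a zero-free composition.

    Simplified stand-in for Bayesian-multiplicative replacement: each zero
    becomes the small proportion `delta`, and non-zero proportions are
    shrunk so the composition still sums to 1.
    """
    total = sum(counts)
    if delta is None:
        delta = 0.65 / total  # hypothetical default: 0.65 of one read
    props = [c / total for c in counts]
    n_zero = sum(1 for p in props if p == 0)
    scale = 1 - n_zero * delta
    return [delta if p == 0 else p * scale for p in props]


def clr(composition):
    """Centred log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(counts_a, counts_b):
    """Euclidean distance between two samples after zero replacement and clr."""
    a = clr(multiplicative_replacement(counts_a))
    b = clr(multiplicative_replacement(counts_b))
    return math.dist(a, b)


if __name__ == "__main__":
    # Hypothetical read counts for four SVs in two birds.
    bird_1 = [120, 0, 30, 850]
    bird_2 = [100, 5, 45, 850]
    print(clr(multiplicative_replacement(bird_1)))
    print(aitchison_distance(bird_1, bird_2))
```

For samples with no zeros to replace, multiplying every count by a constant leaves the clr values unchanged, which is why Euclidean distances between clr vectors are not driven by differences in sequencing depth.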
Next, because rare sequences that occur in only a few samples are generally uninformative, and because samples with very low read counts are more likely to be undersampled, we filtered the data by minimum proportion of reads, minimum occurrence across samples, and minimum read count per sample. Specifically, we removed sequences accounting for less than 0.5% of reads, sequences found in fewer than 10% of samples, and samples with fewer than 5000 reads. The filtered dataset contained 47 SVs across the 31 samples (Table S1). Detailed methods and quality control procedures are outlined in the supplementary materials.
MHC genetic analysis
We amplified the hypervariable second exon of MHC class II (~350 nt) using primers SospMHCint1f [15] and Int2r.1 [29]. In addition to the priming sequence, each primer included an Illumina MiSeq adaptor sequence, four randomized nucleotides, and a unique ‘barcode’ of eight nucleotides. Detailed PCR conditions and MHC sequencing methods are described elsewhere [6]. Briefly, we aligned amino acid sequences in MEGA v. 7.0 [30] and trimmed them by comparison with conspecific sequences in GenBank [31]. Trimming resulted in alleles of 73–86 amino acids, corresponding to most of exon 2. Next, we used a pipeline [22] to collapse sequences into clusters of identical reads and assign sequences to individuals, retaining sequences that comprised at least 1% of an individual’s reads (mean ± SE retained reads per individual = 20 736 ± 1939). We assigned each retained sequence to its corresponding protein sequence, removed any putative pseudogenes following [15], and applied Bayesian-multiplicative replacement and clr transformation to the data to allow comparison with the microbial dataset. In some cases, different DNA sequences translated to the same amino acid sequence; for these, we averaged their log-ratio values so that only unique protein sequences were included in further analyses.
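The microbial filtering thresholds and the MHC steps above (the 1% per-individual threshold and averaging of synonymous variants) can be sketched together. Several assumptions apply: the table layout (a dict of per-sample SV counts) and all function names are hypothetical; "less than 0.5% of reads" is interpreted here as an SV whose largest proportion in any retained sample is below 0.5%, since the text does not say whether the proportion is per sample or pooled; and the `translate` lookup stands in for whatever translation and pseudogene screening was actually used.

```python
def filter_table(table, min_reads=5000, min_prop=0.005, min_occurrence=0.10):
    """Drop low-depth samples, then rare or infrequent sequence variants.

    `table` maps sample -> {sv: read count}. An SV is kept if its largest
    within-sample proportion is >= min_prop (an interpretation) and it is
    present in >= min_occurrence of the retained samples.
    """
    kept = {s: c for s, c in table.items() if sum(c.values()) >= min_reads}
    if not kept:
        return {}
    totals = {s: sum(c.values()) for s, c in kept.items()}
    all_svs = {sv for c in kept.values() for sv in c}
    keep_svs = set()
    for sv in all_svs:
        max_prop = max(c.get(sv, 0) / totals[s] for s, c in kept.items())
        present = sum(1 for c in kept.values() if c.get(sv, 0) > 0)
        if max_prop >= min_prop and present / len(kept) >= min_occurrence:
            keep_svs.add(sv)
    return {s: {sv: n for sv, n in c.items() if sv in keep_svs}
            for s, c in kept.items()}


def retain_common(counts, min_frac=0.01):
    """Keep MHC sequences making up at least `min_frac` of one bird's reads."""
    total = sum(counts.values())
    return {seq: n for seq, n in counts.items() if n / total >= min_frac}


def collapse_synonymous(clr_values, translate):
    """Average clr values of DNA variants that encode the same protein.

    `translate` is a hypothetical lookup from DNA variant to protein allele.
    """
    groups = {}
    for dna, value in clr_values.items():
        groups.setdefault(translate[dna], []).append(value)
    return {protein: sum(v) / len(v) for protein, v in groups.items()}


if __name__ == "__main__":
    table = {  # hypothetical counts
        "bird_1": {"sv1": 6000, "sv2": 10, "sv3": 0},
        "bird_2": {"sv1": 4000, "sv2": 1000, "sv3": 20},
        "bird_3": {"sv1": 100, "sv2": 50},  # fewer than 5000 reads
    }
    print(filter_table(table))
    print(collapse_synonymous({"d1": 1.0, "d2": 3.0, "d3": -2.0},
                              {"d1": "P1", "d2": "P1", "d3": "P2"}))
```

Here `sv3` is dropped because its largest within-sample share (20 of 5020 reads) is below 0.5%, and `bird_3` is dropped for having too few reads.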
Across all 31 birds, we detected 151 unique amino acid alleles (mean ± SE alleles per individual = 16.23 ± 0.61).
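The abstract reports that pairwise MHC similarity predicted pairwise similarity of preen gland microbiota, but these methods do not name the statistical test used. One common permutation approach for comparing two distance matrices is a Mantel test; the sketch below illustrates that technique only and is not the authors' analysis. It assumes symmetric distance matrices (for example, Euclidean distances between birds' clr vectors) given as lists of lists, uses the Pearson correlation of their upper triangles, and reports a one-sided p-value for a positive association. The function names and toy data are hypothetical.

```python
import math
import random


def _upper_triangle(m):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=1):
    """One-sided Mantel test: is d1 positively correlated with d2?

    Rows and columns of d1 are permuted together, which keeps each
    permuted matrix a valid distance matrix.
    """
    n = len(d1)
    y = _upper_triangle(d2)
    r_obs = _pearson(_upper_triangle(d1), y)
    rng = random.Random(seed)
    order = list(range(n))
    n_as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d1[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(_upper_triangle(permuted), y) >= r_obs:
            n_as_extreme += 1
    # Count the observed arrangement as one of the permutations.
    p_value = (n_as_extreme + 1) / (permutations + 1)
    return r_obs, p_value


if __name__ == "__main__":
    # Toy data: six hypothetical birds placed along a line.
    x = [0, 1, 3, 6, 10, 15]
    d_mhc = [[abs(a - b) for b in x] for a in x]
    d_microbe = [[2 * v for v in row] for row in d_mhc]
    print(mantel(d_mhc, d_microbe))
```

With 31 birds there are far more possible orderings than permutations drawn, so the p-value is a Monte Carlo estimate; increasing `permutations` tightens it.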

Created:

2021-10-11