Genotype-by-environment interactions influence the composition of the Drosophila seminal proteome
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.djh9w0w5m
Description:
Ejaculate proteins are key mediators of post-mating sexual selection and sexual conflict, as they can influence both male fertilization success and female reproductive physiology. However, the extent and sources of genetic variation and condition dependence of the ejaculate proteome are largely unknown. Such knowledge could reveal the targets and mechanisms of post-mating selection and inform about the relative costs and allocation of different ejaculate components, each with its own potential fitness consequences. Here, we used liquid chromatography coupled with tandem mass spectrometry to characterize the whole-ejaculate protein composition across twelve isogenic lines of Drosophila melanogaster that were reared on a high- or low-quality diet. We discovered new proteins in the transferred ejaculate and inferred their origin in the male reproductive system. We further found that the ejaculate composition was mainly determined by genotype identity and genotype-specific responses to larval diet, with no clear overall diet effect. Nutrient restriction increased proteolytic protein activity and shifted the balance between reproductive function and RNA metabolism. Our results open new avenues for exploring the intricate role of genotypes and their environment in shaping ejaculate composition, or for studying the functional dynamics and evolutionary potential of the ejaculate in its multivariate complexity.
Methods
We derived all individuals from 12 independent isogenic lines (hereafter: “isolines”) of D. melanogaster, created through 15 generations of full-sibling inbreeding from an outbred population (approx. 1,000 adults with overlapping generations). Previous studies have reported significant genetic variation in ejaculate traits across these isolines [11,32,41]. Using isogenic lines allowed us to subject multiple individuals of each genotype simultaneously to different treatments, thereby separating genotypic from treatment effects. To establish the different developmental treatments, we transferred groups of 40 first-instar larvae to food vials containing either standard cornmeal medium (75 g glucose, 100 g fresh yeast, 55 g cornmeal, 8 g agar, 10 g flour, 15 ml Nipagin antimicrobial agent per litre of medium) or a nutrient-restricted, less favourable version. For the latter, we diluted the standard medium 9-fold in water and agar to a final agar concentration of 15 g/L. The standard females used as mating partners developed on the standard diet at around 200 individuals per culture bottle to minimize possible female size-dependent strategic allocation of sperm or seminal fluid proteins (SFPs) [42,43]. We collected virgin females within 8 hours of emergence, and focal males once a day, and housed them in vials of 20 individuals, separated by sex, isoline, treatment, and day of emergence. We maintained all larvae and flies at 24°C, 60% humidity, and a 14:10 light:dark cycle.
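As a rough check on the nutrient-restricted recipe, the Python sketch below scales the standard medium by the stated 9-fold dilution. It assumes that every ingredient, including Nipagin, is diluted by the same factor and that agar is then topped up to the stated 15 g/L; the text does not spell out either detail, and all function and variable names are illustrative.

```python
# Standard cornmeal medium from the text, per litre of medium.
STANDARD_MEDIUM = {
    "glucose_g": 75.0,
    "fresh_yeast_g": 100.0,
    "cornmeal_g": 55.0,
    "agar_g": 8.0,
    "flour_g": 10.0,
    "nipagin_ml": 15.0,
}
DILUTION_FOLD = 9.0          # stated in the text
FINAL_AGAR_G_PER_L = 15.0    # stated in the text


def restricted_medium(standard, fold, final_agar):
    """Dilute every ingredient by `fold`, then restore agar to `final_agar`.

    Assumption: uniform dilution of all ingredients (not confirmed by the text).
    Returns the per-litre recipe and the amount of agar that must be added.
    """
    diluted = {name: amount / fold for name, amount in standard.items()}
    agar_added = final_agar - diluted["agar_g"]
    diluted["agar_g"] = final_agar
    return diluted, agar_added


recipe, agar_added = restricted_medium(
    STANDARD_MEDIUM, DILUTION_FOLD, FINAL_AGAR_G_PER_L
)
for name, amount in recipe.items():
    print(f"{name}: {amount:.2f} per litre")
print(f"agar added on top of the diluted medium: {agar_added:.2f} g per litre")
```

Under these assumptions the restricted diet contains about one ninth of each nutrient while being firmer than the standard medium (15 g/L versus 8 g/L agar).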
Mating assays
For all experimental matings, focal males were 4-5 days old, an age at which the accessory glands are almost fully developed [44]. Each male had mated once on the previous day with a non-experimental virgin female to avoid potential “virgin effects” [45]. We paired each male with a 3-day-old standard female and, within 10 min of the end of copulation, transferred the pair together to a microcentrifuge tube, snap-froze them in liquid N2, and stored them at -80°C for later dissection. We completed at least nine matings per isoline-treatment combination on each of 10 consecutive days.
Dissections and tissue isolation
For each of the 24 isoline-treatment combinations, we dissected 90 mated females (total n = 2,160) on ice under a Leica MS5 stereomicroscope (Leica Microsystems, Heerbrugg, Switzerland) at 40× magnification, in PBS supplemented with 2% SDS and 5% DTT (w/v). After extracting the lower reproductive tract, we retained only the bursa copulatrix, cleared of the seminal receptacle, spermathecae, parovaria, and associated fat body, to minimize contamination by secretions from these female tissues [46]. Since sperm enter the different storage organs only after the end of copulation and D. melanogaster females do not typically eject excess sperm for several hours [47,48], the bursa copulatrix immediately after mating provided a representative ejaculate sample. We pooled groups of 10 bursae per combination in microcentrifuge tubes and stored them at -80°C until further processing. Additionally, to determine the diet effect on male phenotype and to rule out possible female size-mediated ejaculate tailoring [42,43], we measured the thorax length of 35 focal males and standard females per isoline-treatment combination to the nearest 0.025 mm using a reticular eyepiece (see supplemental materials for details about thorax length analysis).
Sample preparation and TMT labelling
We distributed the 90 bursae per combination across three biological replicates, resulting in a total of n = 72 samples, each containing 30 bursae, which we analysed across five quantification experiments using tandem mass spectrometry (MS/MS) with 16-plex Tandem Mass Tag (TMT, Thermo Fisher) labelling (Fig. S1).
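The sample numbers reported so far follow from the design by simple multiplication. The Python sketch below reproduces that bookkeeping; the constants come from the text, while the variable names are illustrative. The text does not say how the TMT channels left over after placing the 72 samples were used, so the last value is only a derived count.

```python
# Design constants taken from the text.
N_ISOLINES = 12
N_DIETS = 2
FEMALES_PER_COMBINATION = 90
BURSAE_PER_POOL = 10
REPLICATES_PER_COMBINATION = 3
TMT_EXPERIMENTS = 5
CHANNELS_PER_EXPERIMENT = 16  # 16-plex TMT

combinations = N_ISOLINES * N_DIETS
total_females = combinations * FEMALES_PER_COMBINATION
pools_per_combination = FEMALES_PER_COMBINATION // BURSAE_PER_POOL
bursae_per_sample = FEMALES_PER_COMBINATION // REPLICATES_PER_COMBINATION
pools_per_sample = bursae_per_sample // BURSAE_PER_POOL
total_samples = combinations * REPLICATES_PER_COMBINATION
total_channels = TMT_EXPERIMENTS * CHANNELS_PER_EXPERIMENT
# Channels not occupied by the 72 samples; their use is not described here.
remaining_channels = total_channels - total_samples

print(f"isoline-treatment combinations: {combinations}")
print(f"dissected females: {total_females}")
print(f"pools of 10 bursae per combination: {pools_per_combination}")
print(f"bursae per sample: {bursae_per_sample} ({pools_per_sample} pools)")
print(f"samples: {total_samples}")
print(f"TMT channels available: {total_channels}")
print(f"channels beyond the samples: {remaining_channels}")
```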
For protein extraction, samples were mechanically solubilized with a pestle in PBS (2% SDS, 5% DTT), and proteins were acetone-precipitated and resuspended in iST-NHS buffer (iST-NHS kit, PreOmics). After protein digestion, peptides were labelled with TMT16 tags at a 1:2.5 peptide:label ratio before resuspension in 3% acetonitrile / 0.1% formic acid. We combined the individual samples in equal amounts and fractionated the TMT pools offline using high-pH reverse-phase chromatography. Each TMT experiment consisted of 20 liquid chromatography–tandem mass spectrometry (LC-MS/MS) runs on a Thermo Scientific Fusion Lumos mass spectrometer equipped with a Waters M-Class LC system. We followed each scan with a data-dependent MS/MS scan and isolated the most abundant fragments. We assigned reporter ions to samples in a way that reduced the effect of cross-population interference (i.e., channel leakage; supplemental Fig. S1; [49]) and increased the quantitative accuracy of proteins across isobaric labels [50]. Detailed information can be found in the supplementary material.
Protein identification and quantification
We processed the raw data in Proteome Discoverer v2.5 (Thermo Fisher Scientific), searching against the D. melanogaster protein database containing only the longest isoform of each protein (r6.32; [51], n = 13,813 entries). To increase the reliability of protein identification for quantification, we considered only proteins for which at least two proteotypic peptides were detected. We assigned spectra with the Sequest algorithm [52], using a precursor mass tolerance of 20 ppm and a fragment ion tolerance of 0.5 Da. Static modifications included a specific cysteine modification (Acetylhypusine, +113.084 Da) and TMT (+304.207 Da) on the peptide N-terminus and lysine residues. Variable modifications included acetylation of the protein N-terminus (+42.011 Da), methionine oxidation (+15.995 Da), and methionine loss without (-131.040 Da) and with acetylation (-89.030 Da). Enzyme specificity was set to trypsin, allowing a minimal peptide length of six amino acids and a maximum of two missed cleavages. The maximum false discovery rate (FDR) for peptides was set to 0.01. For reporter ion quantification, the integration tolerance was 20 ppm for the most confident peak. Protein fold changes were computed from the Intensity values reported in the Protein output.
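To make the search tolerances concrete, the Python sketch below converts a relative tolerance in ppm into an absolute mass window and checks that the mass shift for methionine loss combined with acetylation is the sum of the two individual shifts. It illustrates the arithmetic only and is not how Proteome Discoverer or Sequest implement matching; the example m/z values and function names are hypothetical.

```python
def ppm_window(theoretical_mz, tolerance_ppm):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = theoretical_mz * tolerance_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


def within_ppm(observed_mz, theoretical_mz, tolerance_ppm):
    """True if the observed value deviates by at most `tolerance_ppm`."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tolerance_ppm


PRECURSOR_TOLERANCE_PPM = 20.0  # from the text

# Hypothetical precursor at m/z 1000: 20 ppm corresponds to +/- 0.02.
low, high = ppm_window(1000.0, PRECURSOR_TOLERANCE_PPM)
print(f"20 ppm window at m/z 1000: {low:.3f} to {high:.3f}")
print(within_ppm(1000.015, 1000.0, PRECURSOR_TOLERANCE_PPM))  # 15 ppm off
print(within_ppm(1000.030, 1000.0, PRECURSOR_TOLERANCE_PPM))  # 30 ppm off

# Mass shifts quoted in the text (Da).
MET_LOSS = -131.040
N_TERMINAL_ACETYL = 42.011
MET_LOSS_PLUS_ACETYL = -89.030

# The combined shift equals the sum of the two parts, up to rounding.
combined = MET_LOSS + N_TERMINAL_ACETYL
print(f"Met-loss + acetyl: {combined:.3f} Da (quoted: {MET_LOSS_PLUS_ACETYL} Da)")
```

Because the tolerance is relative, the absolute window widens with mass, whereas the 0.5 Da fragment tolerance is a fixed absolute window.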
Proteome Discoverer identified n = 4,328 unique proteins across the five TMT experiments (supplemental Table S1, Fig. S2), which were exported to R v.4.1.2 (R Core Team, 2021) for all further analyses. On average across all TMT runs, proteins of male origin constituted 40.44% and those of female origin 51.51% of the detected proteins, whilst 18.98% were shared between sexes (Fig. S2). To maximize coverage across TMT experiments, for any protein with a single missing identification (n = 445) we imputed the mean diet-specific abundance from the other four TMT experiments. To be conservative, we excluded all proteins with missing identifications in more than one TMT experiment (n = 1,336 proteins). This resulted in a final dataset of n = 2,992 unique proteins (supplemental Table S1). Batch effects between TMT experiments were removed using a ComBat/SL/TMM/log2 normalization procedure (detailed in the supplemental material, Fig. S3, supplemental Table S1). We then selected those proteins that overlapped with the D. melanogaster sperm proteome (sperm proteins, SpPs; n = 2,288; [53]), or with either the high-confidence (n = 311) or candidate set of SFPs (n = 314; [4]). Note that n = 188 of these proteins had shared IDs between SpPs and SFPs. Of the n = 1,284 proteins that overlapped with the previously published lists of ejaculate proteins, n = 636 have also been recorded in the female reproductive tract [54], leaving their origin unclear. We restricted our dataset to those n = 648 (i.e., n = 1,284 - 636) proteins that, to our knowledge, were uniquely male-derived and transferred during copulation (henceforth referred to as the ‘working dataset’). The working dataset was complemented with 24 candidate SpPs and 27 candidate SFPs based on gene expression profiles in FlyAtlas 2 [55].
To be conservative, we considered proteins as candidates if their genes were strongly overexpressed (top 10%) in males relative to females and were additionally strongly biased towards either accessory glands or testes among the male tissues (i.e., most extreme 10% on either side; for details on the procedure see the supplemental material and Fig. S4). Adding these 51 candidate proteins to our working dataset resulted in an ‘extended working dataset’ of n = 699 proteins, which consisted of 140 SFPs, 433 sperm proteins, 75 proteins with dual annotation and 51 candidate ejaculate proteins (Table S1). All analyses are based on the ‘extended working dataset’.
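The missing-value rule described earlier in this section (impute when a protein is missing in exactly one of the five TMT experiments, drop it when it is missing in more than one) can be sketched in Python as follows. This is a simplified illustration rather than the R code used for the analysis: it assumes one abundance value per protein and TMT experiment within a single diet, and the protein names and values are made up. The final lines re-check the protein counts quoted in the text.

```python
from statistics import mean


def impute_or_drop(abundances, max_missing=1):
    """Keep proteins with at most `max_missing` missing values (None).

    A missing value is replaced by the mean of that protein's remaining
    values; proteins with more missing values are excluded.
    """
    kept = {}
    for protein, values in abundances.items():
        n_missing = sum(value is None for value in values)
        if n_missing > max_missing:
            continue  # conservative exclusion
        observed = [value for value in values if value is not None]
        fill = mean(observed)
        kept[protein] = [fill if value is None else value for value in values]
    return kept


# Hypothetical abundances for one diet across five TMT experiments.
toy = {
    "protA": [10.0, 11.0, 12.0, 13.0, 14.0],   # complete
    "protB": [8.0, 9.0, None, 10.0, 9.0],      # one gap: imputed
    "protC": [None, None, 5.0, 5.0, 5.0],      # two gaps: excluded
}
result = impute_or_drop(toy)
print(result)

# Counts quoted in the text.
retained = 4328 - 1336            # proteins left after exclusion
working = 1284 - 636              # uniquely male-derived proteins
extended = working + 24 + 27      # plus candidate SpPs and SFPs
by_class = 140 + 433 + 75 + 51    # SFPs, SpPs, dual, candidates
print(retained, working, extended, by_class)
```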
Effects of diet and genotype
To investigate differentially expressed proteins for the high−low diet contrast, we fitted a mixed-effects model with the normalized abundances as the response variable, diet and isoline as fixed effects, and triplicates as a random effect, using the build_model function (prolfqua package [56]). To test for diet-dependent enrichment of genes that encode ejaculate proteins, we ranked proteins by the t-statistics obtained from the high−low diet contrasts. We then subjected these proteins to a gene set enrichment analysis (GSEA) using the gseGO function (clusterProfiler package [57]), with the full D. melanogaster proteome as the background (org.Dm.eg.db package), considering categories as enriched if their Benjamini-Hochberg adjusted p-value was <0.05.
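For readers unfamiliar with the Benjamini-Hochberg adjustment used as the enrichment cut-off, the Python sketch below implements the standard step-up procedure on a handful of made-up p-values. It is an illustration of the adjustment itself, not of the gseGO internals, and the category labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda index: pvalues[index])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four hypothetical categories.
categories = ["term_1", "term_2", "term_3", "term_4"]
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)

ALPHA = 0.05  # threshold stated in the text
for name, p_raw, p_adj in zip(categories, raw, adjusted):
    status = "enriched" if p_adj < ALPHA else "not enriched"
    print(f"{name}: raw={p_raw:.3f} adjusted={p_adj:.3f} -> {status}")
```

The adjustment multiplies each p-value by the number of tests divided by its rank and then takes a running minimum from the largest p-value downwards, so adjusted values never decrease with rank.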
We assessed relationships between normalized samples in a principal component analysis (PCA) using the prcomp function (stats package; R Core Team, 2021). Given their contribution to total variance (see Results), we used only the first two principal components (i.e., PC1 and PC2) as response variables in further analyses. We investigated the variance in PC1 and PC2 explained by the experimental factors using type III ANOVAs (which are suitable for models with interaction terms) as implemented in the Anova function (car package [58]), with either PC1 or PC2 as the response variable and diet, isoline and the isoline × diet interaction as fixed effects. We transformed the resulting F-values to standardized effect sizes (partial ε²) with 95% confidence intervals using the epsilon_squared function (effectsize package; [59]). We used Cohen’s [60] benchmarks of ε² = 0.01, 0.06 and 0.14 to define effects as small, medium and large, respectively. We quantified the contribution of single proteins to either PC1 or PC2 using the fviz_contrib function (factoextra package [61]). Based on the STRING database (v.11.5 [62]), we then explored pairwise interactions among the proteins above the expected mean contribution in the working dataset, separately for SpPs and SFPs, retaining only interaction scores >0.9. To further visualize the ANOVA results, we illustrated the relationships between samples in a heatmap using the pheatmap function (pheatmap package [63]) with the ward.D clustering method [64]. Since a previous study [65] estimated the mode of evolution (constrained, positive, relaxed) for the SFPs (but not SpPs), we further mapped these categories onto our heatmaps and tested for links between mode of evolution and differential protein abundances.
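The conversion from an F-value to partial ε² and its classification against the benchmarks above can be sketched in Python as follows. The formula is a commonly used definition, ε²p = (F − 1)·df_effect / (F·df_effect + df_error); whether it matches the exact computation of the effectsize version used in the study is an assumption. The degrees of freedom assume a full 12 × 2 factorial with all 72 samples, and the F-values are invented purely for illustration.

```python
def partial_epsilon_squared(f_value, df_effect, df_error):
    """Partial epsilon squared from an F statistic.

    Can be slightly negative when F < 1; no truncation is applied here.
    """
    return ((f_value - 1.0) * df_effect) / (f_value * df_effect + df_error)


def benchmark_label(effect_size):
    """Classify an effect size with the 0.01 / 0.06 / 0.14 benchmarks."""
    if effect_size >= 0.14:
        return "large"
    if effect_size >= 0.06:
        return "medium"
    if effect_size >= 0.01:
        return "small"
    return "below small"


# Degrees of freedom implied by the design (assumes all 72 samples are used).
N_SAMPLES, N_ISOLINES, N_DIETS = 72, 12, 2
DF_EFFECT = {
    "diet": N_DIETS - 1,
    "isoline": N_ISOLINES - 1,
    "isoline x diet": (N_DIETS - 1) * (N_ISOLINES - 1),
}
DF_ERROR = N_SAMPLES - N_ISOLINES * N_DIETS

# Invented F-values, NOT results from the study.
HYPOTHETICAL_F = {"diet": 1.0, "isoline": 3.0, "isoline x diet": 2.0}

effects = {}
for term, f_value in HYPOTHETICAL_F.items():
    effects[term] = partial_epsilon_squared(f_value, DF_EFFECT[term], DF_ERROR)
    print(f"{term}: epsilon2_p = {effects[term]:.3f} "
          f"({benchmark_label(effects[term])})")
```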
Created:
2023-08-07



