Neural, genetic, and developmental changes underlie evolving butterfly mate preference

Mendeley Data2024-04-13 更新2024-06-30 收录

下载链接：

https://datadryad.org/stash/dataset/doi:10.5061/dryad.z8w9ghxjz

下载链接

链接失效反馈

官方服务：

资源简介：

NOTE: References and supplementary figures and tables can be found in the associated publication. Heliconius cydno alithea genome assembly and annotation We isolated DNA from thorax of a single adult H. cydno alithea female using the QIAGEN Genomic-tip 20/G following the manufacturer’s instructions with the following modifications: tissue was incubated at 50oC shaking at 800 rpm overnight in lysis buffer. We used 4 ug of this high molecular weight DNA as input to Oxford Nanopore Technologies (ONT) ligation library preparation kit SQK-LSK 110. We prepared libraries following the manufacturer’s instructions with modifications based on this protocol (https://www.protocols.io/view/dna-extraction-and-nanopore-library-prep-from-15-3-bp2l6n3kzgqe/v1). End-repair was performed at 20oC for 1 hour, dA-tailing was performed for 30 minutes, ligation was performed for 1 hour at room temperature, and all bead elution steps were allowed to proceed for one hour. We also used PacBio SRE XS kit to remove <10kb fragments in the final libraries. Final libraries were sequenced on an ONT minION with version 9.4.1 flow cell. We performed basecalling using Guppy and the super accurate basecalling model in dna_r9.4.1_450bps_sup.cfg supplied with the basecaller. For the genome assembly, we adopted a similar strategy as (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8085124/). We then generated the initial draft assembly using Flye 2.9 (4) with estimated genome size of 282Mb (based on GenomeScope estimate https://github.com/schatzlab/genomescope). and Shasta 0.10.0 (https://github.com/chanzuckerberg/shasta) with default parameter. The flye assembly and shasta assembly were first polished with two rounds of racon 1.5.0 https://github.com/isovic/racon and one round of medaka 1.8.1 with ONT reads, and then purged to remove duplicate scaffolds (typically uncollapsed allelic variation) using purge_dups (https://github.com/dfguan/purge_dups) and merged together with quickmerge (https://github.com/mahulchak/quickmerge). The merged assembly was further purged using purge_dups. Finally, we scaffolded H. c. alithea contigs to the Heliconius melpomene v2.5 assembly using RagTag (6) and renamed H. c. alithea scaffolds to match. Sequencing accession and genome assembly statistics can be found in tables S7 and S8. We annotated H. c. alithea scaffolds using EvidenceModeler 1.1.1 (7). We first assembled the H. cydno transcriptome de novo using RNA-seq data generated by Walters et al. (8), Nallu et al. (9), and Rossi et al. (10) using Trinity v2.10.0 (11). RNA-seq data was also mapped to using STAR 2.6.1d (12), and the resulting alignments used to generate genome-guided assemblies using Trinity and StringTie 1.3.1 (13). We combined de novo and genome-guided assemblies using PASA (14). Evidence for protein-coding regions came from mapping the UniProt/Swiss-Prot (2020_06) database and all Papilionoidea proteins available in NCBI’s GenBank nr protein database (downloaded 6/2020) using exonerate (15). We identified high-quality multi-exon protein-coding PASA transcripts using TransDecoder (transdecoder.github.io), then used these models to train and run Genemark-ET 4 (16) and GlimmerHMM 3.0.4 (17). We also predicted gene models using Augustus 3.3.2 (18), the supplied heliconius_melpomene1 parameter set, and hints derived from RNA-seq and protein mapping above. Augustus predictions with >90% of their length covered by hints were considered high-quality models. Transcript, protein, and ab initio data were integrated using EVM with the weights in table S9. Raw EVM models were then updated twice using PASA to add UTRs and identify alternative transcripts. Gene models derived from transposable element proteins were identified using BLASTp and removed from the annotation set. Functional annotations were applied to this annotation set using eggNOG mapper v5 (19). The final annotation comprises 18,763 protein-coding genes and 30,325 transcripts. Heliconius RNA-sequencing We aimed to collect RNA-sequencing data from retina, optic lobe, and brain tissue at seven developmental stages in H. c. galanthus, white H. c. alithea, and yellow H. c. alithea males and females in triplicate (see Fig 3). We set up crosses between multiple males and females in the UChicago greenhouse and provided ample host plants for egg lay. Caterpillars and pupae were maintained in separate small cages for each cross, and individuals were labeled upon pupation to track developmental timing. We collected tissues from one larval stage (final instar purple crawler, ~36h before pupation), five pupal stages (p0: 12–24 hours after pupation, p2: 48–60 hap, p4: 96–108 hap, p6: 144–156 hap, and p7: 168–180 hap), and one adult stage (ad: 24–48 hours after emergence). We collected head tissue from purple crawler and p0 pupae because the main neural tissues are small and difficult to separate. We collected retina, optic lobe, and central brain separately for all remaining stages. We dissected individuals in cold PBS and immediately placed dissected tissues into RNAlater (Ambion, USA). Tissues were stored in RNAlater at -80oC until RNA extraction using TRIzol (Ambion, USA). High-quality (RIN > 7) RNA samples were treated with Turbo DNAse (Invitrogen, USA), then 1 ug was used as input for poly-A selection and RNA-seq library preparation using the NEBNext Poly(A) mRNA Magnetic Isolation Module and NEBNext UltraII Directional RNA Prep Kit following the manufacturer’s instructions with minor modifications. RNA fragmentation was performed for 10 min at 94oC. We used theNEBNext Multiplex Oligos for Illumina dual-index adapters. Double-sided selection was performed after adapter ligation to enrich for ~300 bp - 500 bp fragments. Final libraries were PCR amplified for 11 cycles. RNA-seq libraries were pooled and sequenced 2x100 bp on a NovaSeq 6000 at the University of Chicago Functional Genomics Facility. All sample information can be found in table S3 of the publication and downloaded from NCBI BioProject PRJNA1019262. RNA-seq analysis Quantification and filtering We quantified gene expression in each sample using the raw reads, the H. c. alithea transcriptome, and salmon v1.9.0 (33). The whole genome sequence was included as the decoy, and sequence composition, GC, and positional bias corrections were used during quantification. Indexes and quantification used k-mer size 31. Gene-level quantifications for all samples were loaded into R 4.2.3 using tximport 1.26.1 (34). Quantifications were then loaded into a DESeq2 object and library size normalization factors calculated using DESeq2 (35). Genes with mean expression values less than 50 were considered lowly-expressed and excluded from all analyses (fig S14). We used robust PCA on variance stabilized quantifications to identify outlier samples, analyzing each developmental stage / tissue separately, following recommendations in Chen et al. (36). Specifically, we used the PcaGrid function from the rrcov R package to reduce the data, then excluded samples with score or orthogonal distances greater than 97.5th percentiles of those distances with the sample group being analyzed (37). These analyses removed 23 outlier samples. After removing these outliers, we visually inspected sample clustering using rPCA and removed ten additional samples that aberrantly clustered with divergent tissue/stage groups. The final dataset included 260 samples and 10,409 genes. Stage-specific differential expression We identified genes that were differentially expressed between white and yellow H. c. alithea males at each stage and in each tissue using DESeq2 (35). The full expression dataset was filtered to remove H. c. galanthus samples. We were specifically interested in identifying genes that may underlie the physiological differences observed in the intracellular electrophysiology experiments displayed in Figs 1 and 2. We therefore modeled gene expression by group/sex (i.e. yellow females, white females, yellow males, white males), then tested for DEGs between yellow and white male groups. This is equivalent to fitting a complex model accounting for group, sex, and their interaction. We considered significantly DE genes to be those with FDR < 0.01 after controlling false discovery rate across the 16 stage/tissue comparisons using the Benjamini-Hochberg method (38). maSigPro analysis We identified genes with significantly different expression profiles between white and yellow males within each tissue using maSigPro 1.70.0 (39, 40). We used a dispersion parameter (theta) value of 6.0 and modeled gene expression by group and up to a fourth-degree polynomial. Genes with significant profiles were identified using p.vector and an FDR cutoff of 0.05, and variable selection was performed using T.fit and a p-value cutoff of 0.01. To limit false positive rates, only significant genes with fit r2 values > 0.8 were analyzed further. DEGs between yellow and white males were identified as those that have significantly different expression profiles specifically between those two groups. WGCNA and enrichment analyses We reconstructed the gene co-expression network using the WGCNA package and the final normalized H. c. alithea RNA-seq quantification data (41, 42). Key parameters included the soft power threshold (determined using in-built functions) of 16, we used signed adjacency matrices, and we used the signed Nowick method to create TOMs. Minimum module sizes were set to 20. We assessed module gene ontology (GO) enrichment using the eggNOG annotations and the topGO R package (43). We used Fisher Exact Tests to determine significance of GO enrichment. We further visualized GO enrichment using the simplifyEnrichment 1.8.0 (44) with the top 50 GO terms associated with each module (summarized for three modules in Fig 4). We assessed module enrichment with DEGs using Fisher Exact Tests and the combined DESeq2 and maSigPro DEG gene sets.

注：参考文献及补充图表可在相关出版物中获取。红带袖蝶（Heliconius cydno alithea）基因组组装与注释我们按照制造商说明书并进行如下修改，从单只成年雌性红带袖蝶（Heliconius cydno alithea）的胸部提取DNA：将组织置于裂解缓冲液中，于50℃、800 rpm振荡孵育过夜。我们取4 μg该高分子量DNA作为输入，使用牛津纳米孔技术公司（Oxford Nanopore Technologies, ONT）的连接建库试剂盒SQK-LSK 110进行建库。我们参照制造商说明书，并基于该方案（https://www.protocols.io/view/dna-extraction-and-nanopore-library-prep-from-15-3-bp2l6n3kzgqe/v1）进行修改后完成建库。末端修复于20℃进行1小时，dA尾加尾反应持续30分钟，连接反应在室温下进行1小时，所有磁珠洗脱步骤均静置1小时。我们还使用PacBio SRE XS试剂盒去除最终建库产物中小于10 kb的片段。最终建库产物在搭载9.4.1版本流通池的ONT MinION测序仪上完成测序。我们使用Guppy软件及其自带的dna_r9.4.1_450bps_sup.cfg高精度碱基识别模型完成碱基识别。在基因组组装方面，我们采用与文献（https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8085124/）相似的策略。随后，我们分别使用Flye 2.9（4）及默认参数的Shasta 0.10.0（https://github.com/chanzuckerberg/shasta）生成初始草稿组装，其中Flye组装基于GenomeScope预估的282 Mb基因组大小（https://github.com/schatzlab/genomescope）。我们首先使用ONT reads通过两轮racon 1.5.0（https://github.com/isovic/racon）及一轮medaka 1.8.1对Flye组装结果和Shasta组装结果进行抛光，随后使用purge_dups（https://github.com/dfguan/purge_dups）去除重复的scaffold（通常为未解折叠的等位基因变异），再通过quickmerge（https://github.com/mahulchak/quickmerge）将两个组装结果合并。合并后的组装结果再次通过purge_dups进行去重。最终，我们使用RagTag（6）将红带袖蝶（H. c. alithea）的contig挂载至Heliconius melpomene v2.5参考基因组组装，并按该参考基因组对红带袖蝶的scaffold进行重新命名。测序登录信息及基因组组装统计数据详见附表S7和S8。我们使用EvidenceModeler 1.1.1（7）对红带袖蝶的scaffold进行基因注释。我们首先使用Trinity v2.10.0（11），基于Walters等人（8）、Nallu等人（9）及Rossi等人（10）发表的RNA-seq数据对H. cydno转录组进行从头组装。我们还使用STAR 2.6.1d（12）将RNA-seq数据比对至参考基因组，再通过Trinity和StringTie 1.3.1（13）基于比对结果生成基因组引导的转录组组装。我们通过PASA（14）将从头组装与基因组引导的转录组组装结果进行合并。我们使用exonerate（15）将UniProt/Swiss-Prot（2020_06）数据库及NCBI GenBank非冗余蛋白数据库（NCBI’s GenBank nr，2020年6月下载）中所有凤蝶总科（Papilionoidea）蛋白序列比对至基因组，以此作为蛋白编码区域的注释证据。我们通过TransDecoder（transdecoder.github.io）筛选得到高质量的多外显子蛋白编码PASA转录本，以此为模型训练并运行Genemark-ET 4（16）与GlimmerHMM 3.0.4（17）。我们还使用Augustus 3.3.2（18），结合预设的heliconius_melpomene1参数集以及上述RNA-seq和蛋白比对得到的提示信息进行基因模型预测。若Augustus预测的基因模型有超过90%的序列长度被提示信息覆盖，则将其视为高质量模型。我们通过EVM结合附表S9中的权重参数，整合转录组、蛋白组及从头预测得到的基因证据。随后我们使用PASA对原始EVM模型进行两轮更新，以添加UTR（非翻译区，Untranslated Region）区域并识别可变剪接转录本。我们通过BLASTp筛选得到源自转座子蛋白的基因模型，并将其从注释集合中移除。我们使用eggNOG mapper v5（19）对该注释集合进行功能注释。最终的注释集合包含18763个蛋白编码基因及30325条转录本。红带袖蝶RNA测序我们计划采集三个类群的样本：红带袖蝶加拉索亚种（H. c. galanthus）、白色型红带袖蝶（H. c. alithea）及黄色型红带袖蝶（H. c. alithea）的雌雄个体，涵盖7个发育阶段的视网膜、视神经叶及脑组织，每个组设置3次生物学重复（详见图3）。我们在芝加哥大学温室中开展多组雌雄杂交实验，并提供充足的寄主植物供成虫产卵。每个杂交组合的幼虫和蛹分别饲养于独立的小笼中，并在化蛹时对个体进行标记以追踪发育时间。我们采集了1个幼虫阶段（末龄紫爬虫，化蛹前约36小时）、5个蛹阶段（p0：化蛹后12~24小时，p2：化蛹后48~60小时，p4：化蛹后96~108小时，p6：化蛹后144~156小时，p7：化蛹后168~180小时）及1个成虫阶段（ad：羽化后24~48小时）的组织。由于紫爬虫及p0蛹的主要神经组织体积较小难以分离，我们采集了这两个阶段的头部组织。对于其余所有发育阶段，我们分别采集视网膜、视神经叶及中枢脑组织。我们在预冷的PBS中解剖个体，并将解剖得到的组织立即置于RNAlater（Ambion，美国）中。组织样品在RNAlater中保存于-80℃，直至使用TRIzol（Ambion，美国）提取RNA。我们使用Turbo DNA酶（Invitrogen，美国）处理质量合格的RNA样品（RNA完整性指数RNA Integrity Number, RIN>7），随后取1 μg RNA作为模板，使用NEBNext Poly(A) mRNA磁珠分离模块及NEBNext UltraII定向RNA建库试剂盒，参照制造商说明书并进行小幅修改后完成poly(A)富集与RNA-seq建库。 RNA片段化反应在94℃下进行10分钟。我们使用适配Illumina的NEBNext多重寡核苷酸引物作为双端索引接头。接头连接完成后，我们通过两轮磁珠筛选富集300~500 bp的片段。最终建库产物通过11轮PCR进行扩增。将RNA-seq建库产物混合后，在芝加哥大学功能基因组学中心的NovaSeq 6000测序仪上完成2×100 bp双端测序。所有样品信息详见该出版物的附表S3，亦可从NCBI BioProject PRJNA1019262下载获取。 RNA测序数据分析定量与过滤我们使用salmon v1.9.0（33），基于原始测序reads与红带袖蝶（H. c. alithea）转录组对每个样品的基因表达量进行定量。我们将全基因组序列作为诱饵序列加入分析，并在定量过程中校正序列组成、GC含量及位置偏好性。构建索引与定量过程均采用31 bp的k-mer长度。我们使用tximport 1.26.1（34）将所有样品的基因水平定量结果导入R 4.2.3环境中。随后将定量结果导入DESeq2对象，并通过DESeq2（35）计算文库大小标准化因子。平均表达量低于50的基因被视为低表达基因，将其从所有分析中排除（详见图S14）。我们参照Chen等人（36）的建议，对方差稳定化后的定量结果进行鲁棒主成分分析（PCA），并按每个发育阶段/组织分别进行异常样品识别。具体而言，我们使用rrcov R包中的PcaGrid函数对数据进行降维，随后排除得分或正交距离大于所分析样品组对应距离的97.5%分位数的样品（37）。该步骤共移除23个异常样品。移除这些异常样品后，我们通过鲁棒PCA对样品聚类情况进行可视化检查，并移除了10个与其他组织/发育阶段组异常聚类的样品。最终的分析数据集包含260个样品及10409个基因。阶段特异性差异表达分析我们使用DESeq2（35）鉴定每个发育阶段、每个组织中白色型与黄色型红带袖蝶雄虫之间的差异表达基因（DEG）。我们对完整表达数据集进行过滤，移除红带袖蝶加拉索亚种（H. c. galanthus）的样品。我们的核心目标是鉴定可能与图1、图2中胞内电生理实验所观察到的生理差异相关的基因。因此，我们按组/性别（即黄色型雌虫、白色型雌虫、黄色型雄虫、白色型雄虫）对基因表达进行建模，随后检验黄色型与白色型雄虫组之间的DEG。该分析等价于构建包含组、性别及其交互作用的复杂模型。我们通过Benjamini-Hochberg方法（38）对16个阶段/组织的比较结果控制错误发现率（False Discovery Rate, FDR），将FDR<0.01的基因视为显著差异表达基因。 maSigPro分析我们使用maSigPro 1.70.0（39,40）鉴定每个组织中白色型与黄色型雄虫之间表达谱存在显著差异的基因。我们设置离散参数（theta）为6.0，按组对基因表达进行建模，并采用至多四次多项式拟合表达趋势。我们通过p.vector函数及FDR cutoff=0.05鉴定表达谱显著差异的基因，并通过T.fit函数及p-value cutoff=0.01进行变量筛选。为控制假阳性率，我们仅对拟合优度r²>0.8的显著差异基因进行后续分析。黄色型与白色型雄虫之间的DEG被定义为仅在这两个组之间表达谱存在显著差异的基因。加权基因共表达网络分析（WGCNA）与富集分析我们使用WGCNA R包及最终标准化后的红带袖蝶RNA-seq定量数据构建基因共表达网络（41,42）。关键参数设置如下：软阈值幂为16（通过内置函数确定），采用符号邻接矩阵，使用符号Nowick方法构建拓扑重叠矩阵（Topological Overlap Matrix, TOM）。最小模块大小设置为20。我们使用eggNOG注释结果与topGO R包（43）对模块进行基因本体（Gene Ontology, GO）富集分析。我们使用Fisher精确检验判断GO富集的显著性。我们使用simplifyEnrichment 1.8.0（44）对每个模块的前50个富集GO条目进行可视化（图4展示了3个模块的汇总结果）。我们使用Fisher精确检验，结合DESeq2与maSigPro得到的DEG基因集，对模块进行差异基因富集分析。

创建时间：

2023-10-02

5,000+

优质数据集

54 个

任务类型

进入经典数据集