Genetic, developmental and neural changes underlying evolving butterfly mate preference
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.z8w9ghxjz
下载链接
链接失效反馈官方服务:
资源简介:
Many studies have linked genetic variation to behavior, but the links between that variation and the neural circuits that drive behavior remain elusive. We investigated the architecture of mate choice behavior in Heliconius butterflies, which use vision to identify preferred mates based on wing color patterns. We found that Heliconius cydno mate preference is associated with inter-photoreceptor inhibition of ultraviolet-sensitive photoreceptors (PRs) by long-wavelength sensitive PRs; identified a small number of genetic loci associated with preference variation; and began to link these multiple layers of behavior variation together through analyses of developmental gene networks. Our results support the idea that altered peripheral neural computations, driven by changes to underlying developmental genetic processes, can significantly and rapidly alter essential behaviors.
This repository contains raw data, scripts, notebooks, and intermediate files that were used to perform and analyze genome-wide association studies and to analyze RNA-seq data generated by and presented in the associated publication. Raw electrophysiology results are also provided.
Methods
Animals
We used butterflies from four different taxa. The H. c. alithea used in the preference and color GWA analyses were previously tested for courtship behavior in Ecuador in 2008 by Nicola Chamberlain and Durrell Kapan. These butterflies were tested for their preference, then the bodies stored in 100% ethanol at -80oC until genomic DNA extractions (see below) [1]. For all other experiments, butterflies were housed in greenhouse breeding colonies at the University of Chicago that were regularly supplemented with new individuals. Adults were fed Bird’s Choice artificial nectar ad libitum and supplied with blooming Lantana as an additional source of nectar and pollen. Heliconius cydno galanthus and H. melpomene pupae were obtained from El Bosque Nuevo in Costa Rica, and H. c. alithea from Heliconius Butterfly Works in Ecuador. Heliconius pachinus and F1 H. c. galanthus X H. pachinus hybrids were bred in Panama and adults were transported to Chicago for experiments. Collection, rearing, import and export were done under permits from Ecuador, Panama, Costa Rica, and USA.
Heliconius cydno alithea (yellow) genome assembly and annotation
We isolated DNA from thorax of a single adult yellow H. cydno alithea female using the QIAGEN Genomic-tip 20/G following the manufacturer’s instructions with the following modifications: tissue was incubated at 50oC shaking at 800 rpm overnight in lysis buffer. We used 4 ug of this high molecular weight DNA as input to Oxford Nanopore Technologies (ONT) ligation library preparation kit SQK-LSK 110. We prepared libraries following the manufacturer’s instructions with modifications based on the protocol found here: https://www.protocols.io/view/dna-extraction-and-nanopore-library-prep-from-15-3-bp2l6n3kzgqe/v1. This protocol differs from the manufacturer’s protocols in the following ways. End-repair was performed at 20oC for 1 hour, dA-tailing was performed for 30 minutes, ligation was performed for 1 hour at room temperature, and all bead elution steps were allowed to proceed for one hour at room temperature. We also used PacBio SRE XS kit to remove <10kb fragments in the final libraries.
Final libraries were sequenced on an ONT MinION with version 9.4.1 flow cell. We performed basecalling using Guppy and the super accurate basecalling model in dna_r9.4.1_450bps_sup.cfg supplied with the basecaller. For the genome assembly, we adopted a similar strategy as (Steward et al. 2021). We generated the initial draft assembly using Flye 2.9 [2] with estimated genome size of 282Mb (based on GenomeScope estimate https://github.com/schatzlab/genomescope) and Shasta 0.10.0 (https://github.com/chanzuckerberg/shasta) with default parameters. The Flye assembly and Shasta assembly were polished with two rounds of racon 1.5.0 (https://github.com/isovic/racon) and one round of medaka 1.8.1 with ONT reads, and then purged to remove duplicate scaffolds (typically uncollapsed allelic variation) using purge_dups (https://github.com/dfguan/purge_dups). Finally, the duplicate scaffolds were merged together with quickmerge (https://github.com/mahulchak/quickmerge) and purged using purge_dups.
To simplify comparisons across species, we scaffolded H. c. alithea contigs to the Heliconius melpomene v2.5 chromosome-level assembly using RagTag [3] and renamed H. c. alithea scaffolds to match. Finally, we identified and soft-masked repeat sequences using RepeatModeler and RepeatMasker [4,5]. The genome sequence and annotation used in this study can be found in this repository z8w9ghxjz in the `gwas/data/genome/` directory. The final genome assembly comprised 310 scaffolds spanning 294 Mb, with 287 Mb assigned to H. melpomene chromosomes. BUSCO v5 analysis showed the H. c. alithea genome contained 97.7% complete (97.3% single-copy, 0.4% duplicated), 0.4% fragmented, and 1.9% missing OrthoDB v10 Endopteryogota (2,124 single-copy orthologs) SCOs.
We annotated H. c. alithea scaffolds using EvidenceModeler 1.1.1 [6]. We first assembled the H. cydno transcriptome de novo using RNA-seq data generated by Walters et al. [7], Nallu et al. [8], and Rossi et al. [9] using Trinity v2.10.0 [10]. RNA-seq data was also mapped to using STAR 2.6.1d [11], and the resulting alignments used to generate genome-guided assemblies using Trinity and StringTie 1.3.1 [12]. We combined de novo and genome-guided assemblies using PASA [13]. Evidence for protein-coding regions came from mapping the UniProt/Swiss-Prot (2020_06) database and all Papilionoidea proteins available in NCBI’s GenBank nr protein database (downloaded 6/2020) using exonerate [14]. We identified high-quality multi-exon protein-coding PASA transcripts using TransDecoder (transdecoder.github.io), then used these models to train and run Genemark-ET 4 [15] and GlimmerHMM 3.0.4 [16]. We also predicted gene models using Augustus 3.3.2 [17], the supplied heliconius_melpomene1 parameter set, and hints derived from RNA-seq and protein mapping above. Augustus predictions with >90% of their length covered by hints were considered high-quality models. Transcript, protein, and ab initio data were integrated using EVM with the weights in table S8.
Raw EVM models were then updated twice using PASA to add UTRs and identify alternative transcripts. BUSCO v5 analysis of the final annotated protein set showed it contained 94.8% complete, 1.8% fragmented, and 3.4% missing OrthoDB v10 endopteryogta SCOs (n = 2124). Gene models derived from transposable element proteins were identified using BLASTp and removed from the annotation set. Functional annotations were applied to this annotation set using eggNOG mapper v5 [18]. The final annotation comprises 18,763 protein-coding genes and 30,325 transcripts. We identified 1:1 orthologs to Drosophila melanogaster proteins using reciprocal BLASTp, assigning orthologs only to those genes where the top hit was identical between the two directions (i.e. Hca → Dmel AND Dmel → Hca). Gene annotations, eggNOG results, and Drosophila orthologs are supplied in this repository z8w9ghxjz.
Heliconius cydno alithea genome re-sequencing and variant calling
Genomic DNA used for calling variants was isolated from thorax of 113 H. c. alithea males studied by Chamberlain et al. [1] using chloroform extractions. These individuals were assessed for their mate preference in 2008, then stored in 100% ethanol at -80oC until gDNA extractions in 2015 - 2019. We re-sequenced all individuals with multiple courts, plus a number of males with just a single court, that produced high-quality genomic DNA, yielding a subset of 113 males from the original 175 included in the original study. Illumina paired-end libraries were constructed using the KAPA Hyper Prep Kit (KAPA Biosystems) or Nextera Library Prep Kit and sequenced to ~15X using 2x100 bp on an Illumina HiSeq2500 or 4000 at the University of Chicago Functional Genomics Facility. Raw data can be found in NCBI BioProject PRJNA802836.
Low-quality regions and adapters were trimmed from raw reads using Trimmomatic before mapping to the H. c. alithea reference using bowtie2 v2.3.2 with default settings except --very-sensitive-local [19]. We then marked PCR duplicate reads with Picard and realigned around putative indels using the Genome Analysis Toolkit (GATK) 4.2 [20,21]. SNP and indel calling was performed using the HaplotypeCaller and GenotypeGVCFs module in GATK 4.3.0 with the heterozygosity priors set to 0.01 for both SNPs and indels. Scripts and variant calls in PLINK bed/bim/fam format can be found in this repository in `gwas/data/plink`.
RNA-sequencing data
We aimed to collect RNA-sequencing data from retina, optic lobe, and brain tissue at seven developmental stages in H. c. galanthus, white H. c. alithea, and yellow H. c. alithea males and females in triplicate. We used controlled crosses between H. c. alithea males and females that were homozygous for the top wing color variant, thus ensuring that larvae and pupae from each cross would (if they were allowed to emerge) develop a single wing color. We identified appropriate adults for crosses by clipping a single leg from each individual that emerged from each shipment, extracting DNA from that leg using DNA ExtractALL reagents (Thermo), then performing a custom TaqMan genotyping assay for the wing color variant using the leg DNA. Only males and females that were homozygous for the yellow or white allele were used to set up “yellow” or “white” crosses. All H. c. galanthus individuals were used in H. c. galanthus crosses. We set up crosses between multiple males and females in the UChicago greenhouse and provided ample host plants for egg lay. Caterpillars and pupae were maintained in separate small cages for each cross, and individuals were labeled upon pupation to track developmental timing. We collected tissues from one larval stage (final instar purple crawler, ~36h before pupation), five pupal stages (p0: 12 - 24 hours after pupation, p2: 48 - 60 hap, p4: 96 - 108 hap, p6: 144-156 hap, and p7: 168-180 hap), and one adult stage (ad: 24-48 hours after emergence). Pupal sex was determined using external pupal characteristics (https://www.ucl.ac.uk/taxome/jim/Mim2/heliconius_pupa_sex_difference.html) as well as the presence/absence of testis, which are very prominent in butterflies.
We collected head tissue from purple crawler and p0 pupae because the main neural tissues are small and difficult to separate. We collected retina, optic lobe, and central brain separately for all remaining stages. We dissected individuals in cold PBS and immediately placed dissected tissues into RNAlater (Ambion, USA). Tissues were stored in RNAlater at -80oC until RNA extraction using TRIzol (Ambion, USA). High quality (RIN > 7) RNA samples were treated with Turbo DNAse (Invitrogen, USA), then 1 ug was used as input for poly-A selection and RNA-seq library preparation using the NEBNext Poly(A) mRNA Magnetic Isolation Module and NEBNext UltraII Directional RNA Prep Kit following the manufacturer’s instructions with minor modifications. RNA fragmentation was performed for 10 min at 94oC. We used the NEBNext Multiplex Oligos for Illumina dual-index adapters to uniquely barcode each sample. Double-sided selection was performed after adapter ligation to enrich for ~300 bp - 500 bp fragments. Final libraries were PCR amplified for 11 cycles. RNA-seq libraries were pooled and sequenced 2x100 bp on a NovaSeq 6000 at the University of Chicago Functional Genomics Facility.
All raw RNA-seq data downloaded from NCBI BioProject PRJNA1019262.
Preliminary processing of RNA-seq data
We quantified gene expression in each sample using the raw reads, the yellow H. c. alithea transcriptome, and salmon v1.9.0 [31]. The whole genome sequence was included as the decoy, and sequence composition, GC, and positional bias corrections were used during quantification. Indexes and quantification used k-mer size 31. Quantifications, scripts for analysis, and other data objects can be found in this repository in the `rnaseq/` directory.
Electrophysiology methods
For in vivo recordings, butterflies at least 3 days old were restrained in a custom built collar with heated beeswax. A small hole was cut in the dorsal eye to allow for electrode penetration along the dorsal-ventral axis of the eye and covered with silicone grease to prevent desiccation. A second small hole was cut near the mouthparts and a silver-chloride reference electrode was placed into the anterior portion of the head. The butterfly was then placed on a stage with the eye at the center of a Cardan arm perimeter device to allow for equivalent light stimulation at any spatial location.
Photoreceptor responses were evoked using monochromatic stimuli ranging from 310 nm to 630 nm in 10 nm increments (fig S12). The light source was a dual Halogen-Deuterium lamp (DH-2000s, Ocean Optics), which was connected to a scanning monochromator (Monoscan-2000, Ocean Optics). Stimulus timing was controlled with an optical shutter (OZ Optics) and focused onto the butterfly eye using a collimator and lens (Edmund Optics). Every component was connected to each other using 1 mm fiber optic cables. Stimulus intensity was calibrated with a photodiode (Newport) and set to 1.5 X 1015 photons/cm2/s using a variable neutral density filter in a rotational motor (Newport). Recordings were amplified with a 0.1X headstage and high impedance amplifier (AxoClamp 900A, Molecular Devices) and digitized at 10 kHz (DigiData1550, Molecular Devices).
Photoreceptors were recorded intracellularly using sharp electrodes made from borosilicate glass on an electrode puller (P-97, Sutter Instruments). Electrodes were pulled to a resistance between 90 and 120 MΩ and filled with 3 M KCl. Recordings were made exclusively from cells in the ventral half of the eye. All photoreceptors responded to white light with depolarizations of at least 30 mV. Stimuli were presented in a random order with 4 repeats per stimulus. Typically, responses were recorded at multiple intensity levels using neutral density filters (Thorlabs). After recording spectral responses, we also presented the wavelength that evoked the maximum response at 9 intensity levels that varied over 4 log units of attenuation. These V-Log(I) curves were used to transform the isoquantal spectral responses of the photoreceptors to a spectral sensitivity curve using the Naka-Rushton equation. The wavelength of peak sensitivity was estimated for each cell by fitting its responses with a standard rhodopsin tuning template. To measure response latency, we first measured the mean and standard deviation of the resting potential for 500 ms before the light flash. Onset latency was defined as the time for the response to exceed five times the standard deviation of this baseline.
For experiments with the LED, we used green LEDs with peak tuning at 534 nm and a full width half maximum of 12 nm. Six LEDs were attached to the monochromatic source and had an intensity of 3.2 X 1015 photons/cm2/s. Spectral responses were recorded from each cell before, during, and after turning on the LEDs. This intensity did not bleach photoreceptor responses, as the full response magnitude was typically recovered within seconds of turning off the LED. Photoreceptors that did not recover at least 80% of the original response were discarded.
创建时间:
2024-12-16



