Single-cell multiomics of neuronal activation reveals context-dependent genetic control of brain disorders

NIAID Data Ecosystem2026-05-10 收录

下载链接：

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.0zpc8677w

下载链接

链接失效反馈

官方服务：

资源简介：

Despite hundreds of genetic risk loci identified for neuropsychiatric disorders (NPD), most causal variants/genes remain unknown. A major hurdle is that disease risk variants may act in specific biological contexts, e.g., during neuronal activation, which is difficult to study in vivo at the population level. Here, we modeled neuronal activation in human iPSC-induced excitatory and inhibitory neurons from 100 donors. Single-cell multiomics analyses of over a million neurons uncovered complex activity-dependent transcriptomic and epigenomic regulation and significantly expanded the repertoire of stimulation-specific causal variants/genes for NPD. We identified thousands of genetic variants associated with activity-dependent gene expression (i.e., eQTL) and chromatin accessibility (i.e., caQTL). These caQTL explained considerably larger proportions of NPD heritability than the eQTL. Integrating the multiomic data with GWAS revealed NPD risk variants/genes whose effects were only detected upon stimulation. Interestingly, multiple lines of evidence support a role of activity-dependent cholesterol metabolism in NPD. Our work highlights the power of cell stimulation to reveal context-dependent “hidden” genetic effects. Methods Human iPSC lines and cell culture We initially started with 107 iPSC lines (European ancestry) of which 100 lines were successfully differentiated into both excitatory and inhibitory neurons (Table S1). Of the 100 iPSC lines used for data production, 58 with their IDs starting with “CD” were reprogrammed at Rutgers University Cell and DNA Repository (RUCDR)-NIMH Stem Cell Center using the cryopreserved lymphocytes (CPLs) of donors of Molecular Genetics of Schizophrenia (MGS) cohort, and have been used in previous studies(65, 66, 78, 80, 81). The rest were purchased from the California Institute of Regenerative Medicine (CIRM). 28 iPSC lines are from SCZ cases and 72 are from healthy controls. The donors’ average age is 52 (+/-17) years old, and 56 of them are males. For TF-Knockout by CRISPR-based DNA base editing, two donor iPSC lines, CW20107 from CIRM and KOLF2.2J from Jackson Lab (JAX), were used as part of the SSPsyGene consortium(81). All iPSC lines were verified for their identify based on the matching of snRNA/ATAC-seq-inferred genotypes with their known genotypes. Cells were maintained in feeder-free mTeSR Plus (Stemcell# 100-0276) and passaged using ReLeSR (Stemcell# 100-0483) every 4-6 days following the vendor’s instructions. The Endeavor Health Institutional Review Board (IRB) approved this study. Lentivirus preparation Lentiviral particles were prepared using low-passage 293T cells maintained in DMEM media supplemented with 10% FBS. 24 hrs prior to transfection, 95% confluent 293T cells were dissociated using Accutase and replated at 1:3 ratio to achieve 70-80% confluence the next day. On the day of transfection, lentiviral plasmids including FUW-rtTA (Addgene# 20342), TetO-Ascl1-puro (Addgene# 97329), Dlx2-hygro (Addgene# 97330), pTet-O-Ngn2-puro (Addgene# 50247) were co-transfected with pMDLg/pRRE (Addgene #12251), pMD2.G (Addgene #12259) and pRSV-Rev (Addgene #12253) at 1:1:1:1 molar ratio using FuGENE HD (Promega). 18 hrs post transfection, DMEM with 10% FBS media was replaced with fresh mTeSR Plus media. 48 hrs post transfection, supernatant containing viral particles was collected and cell debris were removed by centrifugation at 500 ´ g for 5 min. Supernatant was aliquoted into low-binding tubes and stored at -80°C. Neuronal differentiation and co-culture We differentiated iPSC lines into both excitatory and inhibitory neurons and co-cultured with rat glial cells. For excitatory neuronal differentiation, we used the method for deriving Ngn2-induced glutamatergic neurons(30) with minor modifications. On DIV (days in vitro) 0, 60-80% confluent iPSCs were dissociated into single cells using Accutase and replated into Matrigel-coated 6-well plate at 5 ´ 105 cells per well in mTeSR Plus media with 5mM ROCK inhibitor, together with appropriate amount of rtTA virus and Ngn2-puro virus. On DIV1, media was refreshed with mTeSR Plus containing 5mM ROCK inhibitor and 2mg/ml doxycycline. On DIV2 to DIV4, cells were treated with neural culture media (Neuralbasal media with 1´ B27Plus and 1´ Glutamax) supplemented with 2mg/ml puromycin, 2mg/ml doxycycline to remove non-transduced cells. On DIV5 to DIV6, puromycin was withdrawn and cells were treated with neural culture media with 2mg/ml doxycycline. On DIV7, cells were ready for replating. For generating excitatory neurons, we used the method for deriving Ascl1/Dlx2-GABAergic neurons(31) with minor modifications. On DIV0, 60-80% confluent iPSCs were dissociated into single cells using Accutase and replated into Matrigel-coated 6-well plate at 7 ´ 105 cells per well in mTeSR Plus media with 5mM ROCK inhibitor together with rtTA virus, Ascl1-puro virus and Dlx2-hygro virus. On DIV1, media was refreshed with mTeSR Plus containing 5mM ROCK inhibitor and 2mg/ml doxycycline. From DIV2 to DIV4, cells were treated with neural culture media supplemented with 2mg/ml puromycin, 150mg/ml hygromycin and 2mg/ml doxycycline to remove non-transduced cells. On DIV5 to DIV6, cells were treated with neural culture media with 2mg/ml doxycycline and 2mM AraC. Cells were ready for replating on DIV7. To co-culture the excitatory and inhibitory neurons, on DIV7, separately cultured Ngn2-glutamatergic neurons, Ascl1/Dlx2-GABAergic neurons, and primary rat cortical astrocytes (Thermofisher; N7745100) were dissociated using Accutase at 37°C for 15-20 min. Ngn2-glutamatergic neurons, Ascl1/Dlx2-GABAergic neurons and astrocytes were replated at 5:5:1 ratio onto Matrigel pre-coated 12-well plate in neural culture media supplemented with 2 mg/ml doxycycline, 10 ng/ml BDNF, 10 ng/ml GDNF, 10 ng/ml NT-3, and 1% FBS. For most co-cultures, we pooled the neural cells individually differentiated from 3-5 donor lines with equal proportion. On DIV8, we refilled cells with more media. From DIV9 to DIV33, media was refreshed every three days with half volume changes. Doxycycline was withdrawn at DIV14 and 1 µM AraC was added in the media from DIV8 to DIV17 to ensure neuron purity. DIV33 neurons were used for KCI stimulation. KCI stimulation of neuronal cultures To model neuronal activation, we followed a previously used protocol to treat the cells with KCI.(28) Briefly, one day before stimulation, old media was aspirated and DIV33 neuron co-culture were silenced overnight in neural culture media supplemented with 1 mM TTX and 100mM DL-AP5. The next day, we added 0.45 volume (e.g., 0.45 ml for 1 ml original culture media) of warmed depolarization solution (10 mM HEPES, 170 mM KCl, 1 mM MgCl2, 2 mM CaCl2) to initiate KCI stimulation. We prepared three conditions with different treatment durations (0 hr, 1 hr, and 6 hrs). After stimulation, cells were dissociated for processing for 10´ Genomics sequencing library preparation. Single nuclei multiomics sequencing library preparation After stimulation, cells were briefly washed with 1´ PBS and dissociated in Accutase at 37°C for 40 min with gentle shaking. After Accutase incubation, we pipetted cells multiple times and filtered suspension twice with 40mm tip filter (Sigma# BAH136800040-50EA) to obtain single cells. The washed cells were subjected to fresh nuclei isolation following 10´ Genomics’ protocols (CG000365 and CG000338). For each library, around 15,000 freshly isolated nuclei were loaded for GEM capture targeting 10,000 nuclei recovery. The pre-amplified DNAs (for snATAC-seq) and cDNAs (for snRNA-seq) were shipped to the University of Minnesota Genomics Center (UMGC) for sn-multiomics sequencing library construction following 10´ Genomics’ standard protocol. Immunocytochemistry For Immunocytochemistry, iPSCs/neurons were fixed with 4% PFA in PBS at room temperature for 15 min. After three brief washes in PBS, cells were permeabilized with 0.5% Triton X-100 in PBS for 15 min at room temperature and further blocked with 3% BSA and 0.1% Triton X-100 in PBS at room temperature for 1 hr or 4ºC overnight. After blocking, samples were incubated with primary antibodies diluted with blocking buffer at room temperature for 1 hr or 4ºC overnight. After 3 PBS washes, samples were incubated with secondary antibodies diluted in blocking buffer at room temperature for 1 hr followed by 3 more PBS washes. Then samples were incubated in PBS containing 1 mg/ml DAPI (4', 6-diamidino-2-phenylindole) at room temperature for 10 min. After DAPI staining, samples were washed once with PBS and mounted on glass slides. The images were taken by a Nikon ECLIPSE C2 confocal microscope. BDNF OCR peak deletion by CRISPR/Cas9 editing For BDNF OCR peak deletion, two gRNAs flanking the targeted region were cloned into pSpCas9(BB)-2A-Puro (PX459) V2.0 (Addgene# 62988) respectively. Two iPSC lines were used for editing. For editing, 24 hrs prior to transfection, 60-80% confluent iPSCs were dissociated into single cells using Accutase and replated into Matrigel-coated 60 mm dish at 4.5 ´ 105 cells per well in mTeSR Plus media with 5mM ROCK inhibitor. On the day of transfection, 3 mg of plasmid DNA carrying gRNA1 and 3 mg of plasmid DNA carrying gRNA2 were introduced into iPSCs using LipofectamineSTEM (Thermofisher) following vendor’s instruction at 1:2 DNA:reagent ratio. 24-48 hrs post transfection, cells were selected with 0.5 mg/ml puromycin; 48-72 hrs post transfection, cells were selected with 0.25 mg/ml puromycin. Afterwards, antibiotics were withdrawn, and cells were maintained in regular mTeSR Plus until colonies reached appropriate size for picking. After colony picking, genomic DNAs (gDNAs) from collected cell pellets was isolated using QuickExtract DNA Extraction Solution for PCR amplification. Amplified DNAs were loaded on 1% agarose gel for electrophoresis to examine fragment size. gDNAs of the colonies with confirmed peak deletion were used for Sanger sequencing confirmation. The confirmed colonies (2-3) with the expected peak deletion were sub-cloned and expanded for cryopreservation. Please refer to Table S32 for gRNA and primer sequences. Reporter gene assay for validating ASoC SNP Reporter gene assay was performed as described(65). In brief, Plenti-PGK-V5-LUC-Neo plasmid (Addgene# 21471) was digested with XhoI and SalI to remove the PGK promoter 5’ upstream of Firefly Luciferase (LUC) gene. Afterwards we inserted a single strand oligo containing PmeI cutting site and 32-bp mini promoter through Gibson Assembly to create a new vector pLenti-PmeI-minP-LUC-PGK-Neo. 151-bp single strand oligo spanning a ASoC SNP site for each allele (75-bp 5’ upstream and 75-bp 3’ downstream) was introduced into linearized vector through Gibson assembly. Then the Lenti-virus particles were prepared as described above. To infect cells, we differentiated iPSC (lines CD19 and CD43) into Ngn2 Glut and Ascl1/Dlx2 GABA neurons separately as described above. At DIV7, GABA neurons were infected with lenti-virus containing the reporter gene and the cloned oligos for each allele. At DIV8, Ngn2 Glut, Ascl1/Dlx2 GABA neurons and primary rat astrocytes were mixed at 2:1:1 ratio and replated onto Matrigel-coated 12-well plate. At DIV30, we performed KCI stimulation for 6 hrs as described above. RNAs were isolated for qPCR quantification of the LUC reporter level of the two alleles, normalized by Neomycin expression (encoded by the Lentivirus). RNA isolation and qPCR Total RNA was isolated using RNeasy Plus Kits (QIAGEN) following vendor instructions. For qPCR, 300 ng-1 mg RNAs were reverse transcribed using High-Capacity cDNA Reverse Transcription Kit (Thermofisher) following vendor instruction. cDNAs were diluted with nuclease-free water at 1:10 ratio and qPCR reactions were prepared using Taqman Universal PCR Master Mix (Thermofisher). Reactions were loaded on Roche LightCycler 480 system in a 384-well white plate. The 2–∆∆Ct method was used for RNA expression quantification, with GAPDH as endogenous control. Please refer to Table S32 for qPCR assay information. TF knockout (KO) for functionally validating ASD gene network To functionally validate the predicted GRN of three ASD-associated TFs (TCF4, MEF2C and RORB), we used CRISPR-based cytosine base editing system to conduct TF knockout by introducing premature stop codons (iSTOP) for each TF in two donor iPSC lines, CW20107 and KOLF2.2J as part of the SSPsyGene consortium(81). After confirming the TF KO by western blot, we differentiated the engineered iPSC lines into NGN2-Glut and GABA neurons, co-cultured with astrocytes, and conducted KCI stimulation as described above. Cell cultures were then dissociated and subjected to 10´ Chromium Next GEM Single Cell 3ʹ scRNA-seq library preparation following 10´ Genomics’ standard protocol. Multiomics sequencing and data quality control (QC) 10´ Genomics Chromium single cell Multiome sequencing libraries were sequenced at UMGC on NovaSeq S4 platform targeting pair-end (2 ´ 150bp) 50K reads per nuclei for ATAC library and 25K reads per nuclei for gene expression library. Briefly, after raw data collection, Illumina’s BCL2FASTQ software was used to demultiplex and assemble the fastq files corresponding to reads (R1/R2) and indices (i5/i7) for Cell Ranger ARC. The fastq files were subsequently processed by 10´ Genomics Cell Ranger ARC (v2.0.2) and aligned twice to both the human GRCh38.p14 genome and a contingent of human GRCh38.p14 and mouse GRCm38 provided by 10´ Genomics for efficient identification and removal of rodent astrocytes. For snATAC-seq, the per-library Transcription Start Sites (TSS) enrichment score is usually 7-9, and the Fraction of high-quality Reads overlapping Peaks (FRiP) is usually 40-80%. We excluded cells with >15% of the reads mapped to mitochondrial genes, cells with > 8,000 or < 400 number of features, and cells with > 40,000 or <500 number of UMI counts (filtered by 400 < nCount < 8,000 using Seurat). Multiomic data analyses Barcode level identification of cell line identity For each library that contained cells derived from 3-4 iPSC lines, barcode-level identification was performed separately for snRNA-seq (GEX) and snATAC-seq data with genotyping information of all donors. Briefly, the BAM files generated from GEX and ATAC assays (gex_possorted_bam.bam and atac_possorted_bam.bam) were processed by demuxlet(82) with known genotype information provided as .vcf files. Barcodes (cells) identified as singlets with P1 likelihood (p1LLK) < 1 ´ 10-8 were collected, and only barcodes positively identified in both GEX and ATAC assays were retained for downstream analysis. snRNA-seq data analyses snRNA-seq (GEX) data were processed by extracting gene expression data from the .h5 files generated by Cell Ranger ARC. Only barcodes (cells) confidently assigned to individual iPSC lines were retained. With Seurat 5.1.0(83), To assign cell type identity, Leiden clustering was performed at the library level based on the first 30 PCs (or appropriate as determined by the elbow plot) and subsequently comparing cell clusters and their marker gene expressions (GAD1, GAD2, SLC17A6, SLC17A7, NEFM). The library-level gene expression matrices were subsequently collated in Seurat 5.1.0(83) as one large object with cell line-specific metadata assigned. Harmony(84) was used to integrate the library-level gene matrices and removed sequencing batch-derived effects. snATAC-seq data analyses For snATAC-seq data, the aggregation function of Cell Ranger ARC was used to merge and generate new fragment files for MACS2-based peak calling with the CallPeaks() function in Signac(85) with default settings. We made two merged fragment files, one containing all libraries from batch 024 (18 cell lines) and the other from batch 018, 024, 025, and 029 (76 cell lines). From each fragment file, we iterated through the combinations of cell types (NEFM+ glut, NEFM- glut, GABA) and stimulation stages (0 hr, 1 hr, 6 hrs), which generated nine peak sets (cell type ´ time). We further generated a union peak set from batch 024 by setting CallPeaks(combine.peaks = TRUE) for peak analysis across different cell types. Finally, peaks that fell within the ENCODE blacklisted regions were removed. We also removed peaks that fell within chromosomes X and Y and the mitochondrial genome. Approximation of developmental stages for iPSC-derived neurons We used the reference single-cell dataset of human embryonic and postmortem brains(34) to evaluate our iPSC-derived neurons and their comparable developmental stages. Briefly, the excitatory neuron (ExNeu) and interneuron (IN) populations were extracted using the identity assigned in the original publications. The excised data were then processed as documented in the original publication with Harmony-assisted data normalization, scaling, and dimension reduction using the first 30 PCs to generate the anchor gene set, UMAP dimensions, and unimodal UMAP coordinates for projection. The three main cell types identified (npglut, nmglut, GABA) from snRNA-seq were used as queries. Each cell type was firstly randomly subset to 10,000 cells to reduce computational demands. Subsequently, each cell subset was separately projected by the MapQuery() function to its corresponding cell map (npglut/nmglut to ExNeu and GABA to IN) to show their approximate developmental stages in the human brain. PCA analysis For PCA analysis, we generated pseudo-bulk count matrices from merged snRNA-seq data using the AggregatedExpression() function from Seurat, and each sample represented one cell line in one of three main cell types and its stimulation stage. To further eliminate the interference from sequencing batch-specific effects, we ran regression using the ComBat-seq() function from the R package sva(86) using sequencing batch information as the factor. The sva-adjusted pseudo-bulk count matrices were log-transformed and estimated for their observation-level weights using the voom() function from the R package limma(87). Finally, we calculated the principal components (PCs) with the prcomp() function and plotted the samples using their first two PCs using the fviz_pca_ind() function from the R package factoextra (https://cran.r-project.org/package=factoextra). Pseudo-bulk RNA-seq differential gene expression analysis Gene differential expression (DE) analysis was performed using the R package limma. Briefly, the ComBat-seq-adjusted pseudo-bulk count matrices generated from PCA analysis were used as the initial input. Low-expression genes whose Count Per Million (CPM) value was less than < 1 in half of lines in any of the 3 time points were removed. The matrices were log-transformed and observation-level weights were calculated by voom(). Subsequently, we made the design matrix of known covariates (time, batch, age, sex, SCZ-affection status, and the percentage of corresponding cell types). We fitted the linear model using the lmFit() function of limma. After checking the mean-variance trend (SA plot) for gene distributions, the DE values (log2FC, p-value, and FDR) were calculated using the topTable() function with corresponding coefficients (1 hr vs 0 hr, 6 hrs vs 0 hr, 6 hrs vs. 1 hr) for all three cell types. MAGMA gene set enrichment analysis We performed MAGMA analysis using MAGMA version 1.08b(88) to evaluate the enrichment for the GWAS risk of several psychiatric disorders (SCZ, Neuroticism, ASD, BP, and MDD)(4, 6, 7, 69, 89, 90), as well as the GWAS set of Crohn’s disease that served as a control set(91) (Table S33). The method was adapted from our previous publications with modifications(66). Specifically, we compiled the MAGMA-required gene annotation data files based on the more recent GRCh38/hg38 genome to get the gene-SNP annotation file. With the gene-SNP annotation file, we then performed gene-level analysis on SNP p-values using the reference SNP data of 1,000 Genomes European panel (g1000_eur_hg38, --bfile) and the pre-computed SNP p-values from each disorder’s GWAS data set. The sample size (ncol=) was derived from either GWAS data frames or specified according to the affiliated README data. Subsequently, the result files (--gene-results) from the gene-level analysis were read in for competitive gene-set analysis (--set-annot), where we used default setting (correct=all) to control for gene sizes in the number of SNPs and the gene density (a measure of within-gene L.). The gene-set analysis produced the output files (.gsa.out) with competitive gene-set analysis results that contained the effect size (BETA) and the statistical significance of the enrichment of each gene set (upregulated/downregulated) for each disorder’s GWAS data set. Gene ontology enrichment analysis GO term analysis was performed for the DEG sets specific to each cell type and expression pattern. Briefly, the list of DEGs was used as the SynGo(92) input and all the expressed genes in the corresponding cell type were used as the background gene list. A similar approach was used to calculate the enrichment of different gene expression patterns against a multitude of known NPD gene sets. For cTWAS gene sets, the enrichment analysis was conducted using the Enrichr package(93-95) with the default background gene set, and Benjamini-Hochberg method was used for multiple testing correction. Differential peak accessibility analysis DA peak accessibility was performed using aggregated pseudo-bulk counts based on the three cell-type-specific peak intervals (nmglut, npglut, GABA). Briefly, the AggregatedExpression() function was used to generate cell line ´ peaks matrices using the cell-type-specific peak intervals (approximately 300K peaks per cell type). The ComBat_seq() function from sva was subsequently applied to correct group bias for each count matrix. We removed all low-count peaks (CPM < 1 in at least half of the cell lines), and approximately 200K of the original 300K peaks survived filtration. The matrices were log-transformed and observation-level weights were calculated by voom(). Subsequently, we applied design matrices of known covariates (time, batch, age, sex, SCZ-affection status, and the percentage of corresponding cell types) as we did in DE gene analysis. We fitted the linear model using the lmFit() function of limma. The DA values (log2FC, p-value, and FDR) of peaks for each cell type and contrast were calculated using the topTable() function with corresponding coefficients (1 hr vs 0 hr, 6 hrs vs 0 hr). Stratified linkage disequilibrium score regression (sLDSC) for GWAS enrichment analysis sLDSC(96) analysis was performed by using the hg38 version of European genotype data (SNPs) from 1,000 Genomes Phase 3 and v2.2 baseline linkage disequilibrium/weights. Briefly, linkage disequilibrium score estimations were pre-calculated from the hg38 version of the 1,000 Genomes EUR file set (w_hm3_no_hla.snplist), window size 1 cM (ld-wind-cm 1). We used the summary statistics of major psychiatric disorders and non-psychiatric diseases for partition heritability, with several data sets lifted over from hg19 to hg38 when necessary. Disease-specific regressions were performed independently using hm3 SNP weights against each disease for cell-type-specific analysis. Integration of snATAC-Seq Profiles and Peak Calling for caQTL and gene regulatory network analyses To integrate the snATAC-seq data, we utilized ArchR. Quality control measures were applied by filtering cells with a Transcription Start Site (TSS) enrichment score of at least 4 and fragment counts of at least 1,000. After quality control and integration of expression data, we retained a total of 552,653 cells. To call peaks from the snATAC-Seq data, pseudo-bulk replicates were generated for each cell type and for each time point. To ensure equal representation of all cell types, we followed the default setting of ArchR and sampled a maximum of 500 cells and 25 million fragments from each pseudo-bulk replicate. With the pseudo-bulk replicates, 501-bp fixed-width peaks were called using MACS2(97) and integrated in ArchR. Peaks were required to be reproducible in at least two samples, and sex chromosomes were excluded in the analysis. To ensure high-quality peak detection, we limited the number of peaks to a maximum of 250,000 for each cell type and time point context. Consensus peaks were obtained through an iterative overlap peak merging approach in ArchR. This method allowed for robust and reproducible peak identification across different samples and conditions. Single-cell Embeddings To reduce computational cost, we down-sampled 18 cell lines (115,855 cells in total) from Batch 024 for all analyses related to gene regulation, including dimensionality reduction, chromVAR analysis, pseudotime inference, peak-gene mapping, and gene regulatory network construction. Using ArchR, we performed dimensionality reduction on snRNA-Seq and snATAC-Seq data separately, utilizing the iterative latent semantic indexing approach. To create a more robust two-dimensional representation using UMAP, we combined the embeddings of expression and peak accessibility, subsequently correcting for batch effects using Harmony. Inference of Pseudotime Pseudotime trajectories were estimated for the GABA, nmglut, and npglut cell types separately to model the transitions between neuron states from unstimulated neurons to neurons stimulated for 1 hr and 6 hrs. Based on the UMAP constructed in the previous section, cell-type-specific trajectories were inferred using the addTrajectory() function in ArchR. We fine-tuned the sparsity of the spline fit and quality filters for the trajectory of each cell type to avoid overfitting of smoothing spline. Cells were divided into 100 bins based on pseudotime, and the average activities were calculated by pooling cells with similar pseudotime values. Co-activation analysis to link OCRs to genes To link OCRs to potential target genes, we assessed the correlation of gene expression and normalized chromatin accessibility for all ORCs within 500 kb of genes. All differentially expressed genes were included in the analysis. For each cell type, peaks present in at least 5% of the cells were considered. Gene expression was regressed on peak accessibility using a negative binomial mixed regression implemented using lme4 R package (DOI: https://doi.org/10.18637/jss.v067.i01) with the following formula: where: - represents the raw counts of gene expression for gene , - denotes the normalized chromatin accessibility of peak , - is a categorical variable representing cell line identity, included as a random intercept to account for donor-specific effects, - refers to the single-cell level library size from the snRNA-Seq data, captured by the number of reads per cell, - captures the single-cell level percentage of mitochondrial reads from the snRNA-Seq experiment, serving as a measure control for cell quality. To control for multiple testing, the Benjamini–Hochberg procedure was applied to calculate the false discovery rate (FDR). OCR-gene pairs were considered significant and biologically relevant if they had an FDR ≤ 0.05 and a positive regression coefficient. Activity-by-Contact (ABC) analysis We employed the Activity-by-Contact (ABC) model(45, 98) to predict enhancer-gene regulatory relationships. For our analysis, we utilized snATAC-Seq data from all 18 samples in batch 024. Additionally, we incorporated time-specific bulk Micro-C data derived from one cell line (CD-11), averaged across all cell types, which allowed us to capture the dynamic nature of chromatin interactions during different neuronal stages. Our analysis aimed to predict enhancer-gene pairs across nine distinct contexts, defined by different combinations of time and cell type. To achieve this, we called peaks for each specific context and subsequently predicted ABC scores under these conditions. An enhancer-gene pair was classified as significant if its ABC score was greater than or equal to 0.021, a threshold recommended by the software to achieve 70% recall. Gene Module Analysis To cluster genes based on their expression trajectory in pseudotime, we first selected 5,221 differentially expressed and highly variable genes, using criteria of FDR ≤ 0.05 and abs (Log2FC) > 1 in at least one differential expression test. We then normalized expression for each gene separately, so that genes with similar trajectory patterns would be clustered together, irrespective of their absolute expression levels. We then performed K-means clustering to group the genes into 15 distinct modules. Briefly, similarity in expression patterns was defined based on the scaled and normalized expression profiles of genes across 100 pseudotime points for each of the three cell types, resulting in a total of 300 data points per gene. K-means clustering was applied to group genes with similar temporal expression trajectories. The optimal number of clusters (modules) was set to 15, which provided sufficient resolution to capture distinct transcriptional dynamics while minimizing redundancy among modules with overlapping expression patterns. Gene Ontology enrichment analysis was conducted using the Enrichr package(93-95), and GWAS enrichment analysis was performed in MAGMA. We inferred chromatin accessibility trajectories for the 15 gene modules by mapping all peaks to the genes within each module, as described in the previous section. Defining candidate TFs regulating early and late responses We used chromVAR to estimate single-cell level enrichment of TF activity. With motif annotations curated from Cis-BP, we estimated chromVAR motif deviations implemented in the ArchR toolkit. We also had single-cell expression from scRNA-seq for all TFs. For each TF, we assessed its role in early and late responses by testing differential motif activity and expression between conditions, one cell type at a time. Specifically, a TF was considered a candidate TF for early response if it showed both higher motif activity and increased expression at 1 hr compared to 0 hr. Similarly, a TF was considered a candidate TF for late response if it showed higher motif activity and elevated expression at 6 hrs compared to both 1 hr and 0 hr. Differential expression analysis was performed using the limma-voom analysis, and differential motif analysis was performed using one-tail Wilcoxon signed-rank test using single-cell level motif deviations estimated by chromVAR. Gene Regulatory Network (GRN) Analysis To construct the gene regulatory network, we followed these steps: (1) Identify Potential TF Regulators: We identified a set of potential TF regulators by selecting TFs that were differentially expressed in at least one condition. We also ensured that a TF’s motif activity (estimated by chromVAR) and its expression activity were positively correlated. Specifically, we filtered TFs based on the correlation between their expression trajectories and motif activity trajectories, retaining only those with a Spearman’s correlation > 0.3. (2) Define candidate TFs of a gene: we linked Open Chromatin Regions (OCRs) to a gene using co-activation analysis, with FDR < 0.1; and then examined the presence of the TF motif (P-value cutoff of 5 ´ 10-5) in these OCRs linked to the gene, using motifmatchr in ArchR. (3) Correlate TF Motif Activity with Target Gene Expression: For motifs present in at least one peak associated with a gene, we correlated the motif activity of TFs with the expression of the target genes. We retained the TF-gene pairs that exhibited a Spearman’s correlation > 0.5 between the motif activity of the TF and the expression of the target gene. To infer TFs that may play roles in ASD (Fig. 2I), we used Fisher’s exact test to evaluate the enrichment of ASD risk genes in each TF’s predicted targets. The background genes of this test are all genes included in GRN reconstruction, i.e., those expressed in at least 5% of cells. FDR was used to correct for multiple testing. eQTL mapping To associate the genotypic variation and gene expression variation within each context (combination of cell types and time points), we used the scRNA read count matrix of 36,601 features and 548,800 cells from 100 cell lines and three cell types (i.e., GABA, nmglut, npglut). We first generated pseudo-bulk matrices by summing up the read counts by each feature for cells within a certain time point, a certain cell type, and a certain cell line. We removed samples with less than 1 million total reads. We also removed genes that were not protein-coding, in mitochondria, with less than 2,000 reads across all cells, or without HGNC symbol. After filtering, we retained the pseudo-bulk read count matrix of 14,818 genes by 824 samples (out of 900). Then, we performed trimmed mean of M values (TMM) normalization and inverse Normal Transformation. The resulting gene expression matrix served as input for MatrixeQTL, along with covariates of cell type composition, sex, age, SCZ affection status, 5 genotype PCs, and 15 gene expression PCs. Genotypes for iPSC donors were downloaded from dbGaP (phs000021.v3.p1 and phs000021.v3.p2 for MGS lines and phs002032.v1.p1 for CIRM lines). For each cell line source, genotype data were processed on the TOPMed Imputation Server using a standardized pipeline that included liftover to GRCh38, quality control, phasing, and genotype imputation. Genotypes were phased with Eagle v2.4 and subsequently imputed with Minimac4 against the TOPMed all-population reference panel (GRCh38). Post-imputation variants were filtered on any missing values, imputation quality R2 < 0.3 (default setting), and MAF < 5%. Then we used the intersection of imputed SNPs from both sources (n = 5,966,820 SNPs). Additional 3,377 SNPs were then removed due to HWE filter (P < 1 ´ 10-6). We used cis-variants 250 kb up- and downstream of the gene body. This cis window size and the number of gene expression PCs were chosen to optimize the number of eGenes (i.e., genes with at least one significant eQTL) discovered. We adopted the default procedure to define significant eQTLs in MatrixeQTL(99). In brief, we computed the t statistic of the genetic effect and derived the nominal P value for each gene-SNP pair. FDR was used to correct for multiple testing across all gene-SNP pairs. To assess if this procedure adequately controls for multiple testing, we shuffled the genotype across donors and applied MatrixeQTL procedure for each context in the shuffled data. We found less than two eGenes from the analysis, confirming that the procedure effectively controlled for false discovery rates. In addition to the eQTL mapping in each context, we also performed eQTL mapping by aggregating all contexts. We removed the 5 cell lines from batch 22 and summed up the read counts per cell line per contexts. TPM normalization was applied to the summed reads and then we further summed up all contexts within the same cell line. We proceeded to eQTL mapping with the resulting 95 samples (cell lines) by applying TMM, INT, and MatrixeQTL. This is the same procedure as described above, except that the number of expression PCs was 21 instead of 15. eQTL results from this approach are called “pseudo-bulk eQTL” in subsequent sections. Dynamic eQTL testing We tested the genetic effect heterogeneity across 3 times points, one cell type at a time. We used the following procedure to choose the candidate set of gene-SNP pairs to be tested for each cell type. We considered only eGenes found in at least one time point, at a relaxed threshold FDR < 0.2. For each of these eGenes, we started with its eQTL with the smallest nominal P values per condition. Then we took a union of these top eQTL along with the pseudo-bulk eQTL. There are at most four eQTL for each eGene in a cell type. We then performed LD pruning with r2 < 0.1 on these eQTL within the same eGene. The pseudo-bulk gene expression used for testing dynamic eQTL were jointly normalized within each cell type. Specifically, we applied inverse Normal transformation across all the pseudo-bulk samples within the same cell type. PCA was applied to the normalized gene expression of each cell type. The first 15 gene expression PCs were used as covariates in the test below. Our interaction effect model compares the genetic effect in one stimulation context with the resting context in the same cell type. In total we have six different sets of comparison (GABA 0 hr vs. GABA 1 hr, GABA 0 hr vs. GABA 6 hrs, NMglut 0 hr vs. NMglut 1 hr, NMglut 0 hr vs. NMglut 6 hrs, NPglut 0 hr vs. NPglut 1 hr, NPglut 0 hr vs.NPglut 6 hrs). For each comparison, we selected the gene-SNP pairs with FDR < 0.2 from eQTL mapping, and run the interaction effect model as following: Gene expression ~ Age + Sex + Disease + cell type proportion + Genotype + Time Point + Genotype x Time Point + 5 Genotype PCs + 15 Gene expression PCs P values were derived from the Student’s t-test statistic of the interaction term of “Genotype x time point”. We used ACAT(100) to aggregate the P values of SNPs from the same gene. We then adjusted multiple testing using Benjamini-Hochberg method. To compare dynamic eQTL vs. static eQTL in terms of cell type sharing and comparison with GTEx (Fig. 3EF), we classified an eGene by whether it is also a dynamic eGene from the interaction testing. All other eGenes were considered static eGenes. To test if dynamic eQTL tend to be associated with certain epigenomic features, compared to other (static) eQTL (Fig. 3G), we directly used the summary statistics of the interaction tests. Specifically, we tested if the dynamic eQTL are enriched in up-regulated OCRs using the tool TORUS(65, 101). Roughly speaking, TORUS assesses if variants with strong associations (i.e. dynamic eQTL) tend to be located in upregulated peaks in stimulation in the matching cell type, compared to variants in the background (those with low associations). Importantly, since only eQTL are used in this test, the variants in the background are actually all eQTL, with the majority of them being static eQTL. Comparison of neuronal eQTLs with published eQTL datasets We estimated the proportion of shared eQTLs between two datasets using Pi1 analysis. For effect size correlation analysis, we used the Rb estimator(102). This estimator is distinct from the Spearman or Pearson correlation approach because the latter does not account for estimation errors in the eQTL effects and thus underestimates the correlation of true eQTL effects(102). TORUS enrichment analyses of eQTL and ASoC variants We applied a Bayesian hierarchical model (TORUS) to perform SNP-based enrichment analysis, testing the enrichment of certain features with eQTL and ASoC variants(65, 101). Briefly, TORUS associates the prior probability that a variant is causal to gene expression with the annotations of the variant with a logistic regression model. TORUS then uses this prior in a simple fine-mapping model (assuming a single causal variant per genetic locus) of eQTL data. TORUS uses Maximum Likelihood to estimate the parameters of the logistic regression. To assess the enrichment of ASoC in eQTL, we applied the ASoC in matched condition as the neuronal activity eQTL. ASoC SNPs were treated as binary annotations: 1 being within the significant ASoC and 0 otherwise. We used SNPs that did not pass the ASoC testing (FDR>5%) for comparison. For testing disease GWAS enrichment, TORUS assumes that every variant is a risk variant or not, represented by a binary indicator variable (1 or 0). The prior probability of the indicator of a SNP being 1 depends on its annotations. Here, we tested ASoC SNPs categorised by their cell type ´ time against a selection of diseases/traits GWAS databases. All the annotations are encoded as binary (1 if an SNP is found in the corresponding GWAS database, 0 otherwise). We performed univariate analysis in TORUS to assess the enrichment of each ASoC cell type ´ time combinations. The GWAS datasets used for enrichment/TORUS analysis were from multiple sources, including NPDs, neuro-related traits, and control disorders/traits, as listed in Table S33. Analysis of genetic effect sharing between neuronal activity eQTL and GTEx and ASoC. We used Storey’s analysis to analyze the sharing between our eQTL and other summary statistics. This is a common approach to investigate sharing of eQTL under different cell types. For each context in our eQTL and each tissue in GTEx brain eQTL, we obtained all eQTL (FDR<0.05) in a context and estimated the proportion of non-null tests ( , Pi1) based on the binomial p values of these gene-SNP pairs in a GTEx brain tissue. A similar approach was taken when we computed for our dynamic eQTL in GTEx brain tissues. We also used Storey’s analysis to investigate the sharing between our eQTL and ASoCs. For each ASoC SNP with FDR<0.05, we located its nearest gene and obtained the nominal P values of the corresponding eQTL in the matched context if possible. Then, we estimated non-null proportions of these P values. caQTL Mapping We calculated pseudo-bulk counts from ATAC-seq for 94 GABAergic neuron cell lines and 95 Glutamatergic cell lines under each time point. The peak insertion counts were initially normalized for library size using the trimmed mean of M values (TMM) method. Subsequently, we applied a rank-based inverse normal transformation (INT) to standardize the accessibility of each peak to a standard normal distribution, utilizing the RNOmni package(103). These counts served as the input for caQTL analysis. We employed TensorQTL(104) for caQTL mapping. To adjust for batch effects and confounding factors, we included three genotype principal components and between four to nine chromatin accessibility PCs as covariates in our analysis: where denotes the normalized chromatin accessibility of peak , genotype of a cis-SNP , are genotype principle components, and are the chromatin accessibility principle components, For each peak, we conducted 10,000 permutations and computed beta-approximated P-values of the top QTL. To account for multiple testing, we calculated Q-values using the qvalue package. Peaks with a Q-value ≤ 0.05 were considered significant and designated as cPeaks (caQTL peaks). This threshold controlled the FDR and ensured the reliability of our findings. To investigate time-dependent QTL effects for each cell type, we included the union of cPeaks from all time points. This inclusion ensured that all cPeaks from different time points were considered. For each cPeak, we identified and tested the SNP from the strongest QTL across all time points. This SNP served as the representative variant for assessing QTL effects. We first log-normalized the counts using the NormalizeData() function in Seurat, then applied rank-based INT to each peak. To test time-dependent effects, we modified the caQTL test above, including Time, and Genotype-Time interaction term. We included 3 genotype PCs, 5 chromatin accessibility PCs, and their interaction effects with time as covariates and ran linear regression as: where is a categorical variable of the duration of KCl stimulation. We employed an ANOVA test to compare the full model (including the time-genotype interaction effects) with a reduced model (excluding the time-dependent effect). This allowed us to assess the significance of the time-dependent effect on QTL. FDR correction was applied to adjust for multiple comparisons, ensuring the robustness of our results. To validate the test statistics and avoid inflation, we performed 1,000 permutations. This permutation approach verified that the observed test statistics were not artificially inflated, thereby enhancing the reliability of our time-dependent QTL effect analysis. ASoC mapping We applied an improved, two-step ASoC mapping with WASP to calibrate for alternate-allele bias effect and to gain more accurate results. BAM files generated from Cell Ranger ARC (atac_possorted_bam.bam) were firstly split and re-merged by cell line using barcodes from initial demultiplexing. Each line-specific BAM file included aligned reads of all cell types and three stimulation stages. GATK (4.2.6.0) HaplotypeCaller was applied to each BAM file (cell line), which identified and extracted all heterozygotic SNP sites and homozygotic SNP sites carrying alternative alleles. The extracted SNP coordination was subsequently used to generate the positional reference of WASP. Next, the original BAM files from Cell Ranger ARC were re-split at the cell line ´ cell type ´ stimulation stage (time) levels to generate raw BAM files for WASP calibration. For a set of BAM files from the same line, WASP calibration was performed at the individual BAM files using the line-specific SNP heterozygosity information to re-align reads carrying alternative alleles with bwa and remove any bad read pairs. A second round of GATK HaplotypeCaller was used to identify heterozygous SNP sites and their counts in each calibrated BAM file to generate corresponding VCF files as recommended by the GATK Best Practice (software.broadinstitute.org/Gatk/best-practices/) with VariantRecalibrator (-an DP -an Q.D. -an F.S. -an SOR -an M.Q. -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.5 -tranche 95.0 -tranche 90.0) and reference databases including HapMap v3.3 (priority = 15), 1000G_omni v2.5 (priority = 12), Broad Institute 1000G high confidence SNP list phase 1 (priority = 10), Mills 1000G golden standard INDEL list (priority = 12), and dbSNP v154 (priority = 2). Heterozygous sites with tranche level >95.0% were extracted. And only SNPs with corresponding rs# records found in dbSNP v154 were retained. The individual VCF files were subsequently merged at the cell type ´ stimulation stage level to increase the statistical power (each merged output included 96-100 lines as cell type varied). After merging, Biallelic SNP sites (GT: 0/1) with minimum read depth count (DP) ≥ 30 and minimum reference or alternative allele count ≥ 2 were retained. The SNPs were further filtered by total read depth of 10 per sample called as heterozygous. The binomial p-values (non-hyperbolic) were calculated using the binom.test(x, n, P = 0.5, alternative = “two.sided”, conf.level = 0.95) from the R package, and Benjamini & Hochberg correction was applied to all qualified SNPs as the correcting factor of R function p.adjust(x, method = "fdr"). We set the threshold of ASoC SNP at FDR value = 0.05. Homer enrichment analysis of TF-binding motifs for ASoC SNPs We used HOMER(105) to assess the enrichment of TF binding motifs in sequences flanking ASoC SNPs (±25 bp) adapted from our previous investigations(65). We used the JASPAR database (2022 release) with all 522 human TF motifs. For each cell type ´ time, we used the ASoC SNPs (FDR < 0.05) as input intervals. The parameters used were findMotifsGenome.pl <input.bed> hg38 -cpg -mknown jasper2018.known . HOMER provided the genetic background for the enrichment test, and the enrichment significance was derived from HOMER. Brain eQTL enrichment analysis for ASoC SNPs We performed eQTL enrichment analysis to investigate the differences in the association between ASoC and non-ASoC SNPs in GTEx cortex-eQTL(106) and PsychENCODE eQTL(107) databases with adaptation(65). For ASoC/non-ASoC SNPs in each set of cell type ´ time combination, the SNPs were assigned to genes (eQTL targets) if the distance of the SNP and its corresponding gene TSS was less than 500 kb, and the SNP was associated with the expression of the corresponding gene at FDR < 0.05. Micro-C analysis of ASoC SNPs and their targets The Micro-C experiment was conducted in co-cultured Ngn2-Glut and GABA neurons of two donor lines (CD11, CD12) at 0 hr, 1 hr, and 6 hrs post-stimulation. Dovetail™ Micro-C (Pan-promoter Capture) kit was used process the samples. To minimize background interference from cell-line discrepancies, we used the samples derived from a single line (CD11) from comparing different time points. The Micro-C data was performed by an external vendor and analysed using ChiCAGO(108) at 500 bp resolution. To assign the micro-C targets of each SNP, we firstly associated each SNP (0 hr, 1 hr, 6 hrs) with its nearest interacting micro-C fragment (from 0 hr, 1 hr, 6 hrs samples) if their distance was less than 2.5 kb (upstream/downstream). If such an association could be established, we extracted the other end of the interacting micro-C fragment and checked if it overlapped with the promoter region (-2 kb / +1 kb of TSS). If the other fragment fell into the promoter region, we added the original SNP to the pool of ASoC SNPs with their corresponding identity. Promoter and enhancer enrichment analysis for ASoC SNPs We defined the interactions of ASoC SNPs and human enhancers/promoters according to our previously developed algorithms with adaptation(65). The analysis was similar to that implemented in GREAT and was described previously(68, 109, 110) for testing the folds of peak enrichment within the annotated genomic regions and epigenetically annotated regions. To adapt the original algorithm for SNP-based testing, we developed an updated version of the algorithm, and the enrichments were calculated using the formula below: For ASoC SNPs that were cell-type-specific or shared by three cell types, we analyzed enrichment considering SNP intervals as the unit similar to OCR peaks. The enrichment p-values were estimated based on the binomial test as implemented in GREAT and as previously described(109, 110). from which we calculated the p-value using the binom.test(x, n, p, alternative=“greater”) function in R, where x = numbers of SNPs that fell within the designated epigenetically annotated features; n = total length of the SNP intervals (in bp); p = total length of the designated epigenetically annotated features (in bp) / genome size (3.2 ´ 109 bp) The Gene-based annotation of the genome was derived from GENCODE v35 as part of the built-in database of the updated HOMER package(105). The list of human forebrain/non-forebrain enhancers was from the VISTA enhancer browser(111). Human-gained enhancers were acquired from GSE63648 using data from either the frontal or occipital cortex at 12 PCW and marked by H3K27ac or H3K4Me2(112). The definitions of chromatin state were assembled using an imputed 25-state model derived from individual #E081 of fetal brain tissue by the Roadmap Epigenomics Project(110, 113). For estimating the enrichment of ASoC SNPs (Fig. 2A), the categories of epigenomic features used for promoter and enhancer annotations were promoter = TssA, PromU, PromD1 and PromD2; enhancers = TxReg, TxEnh5, TxEnh3, TxEnhW, EnhA1, EnhA2, EnhAF, EnhW1, EnhW2, EnhAc, and DNase. The abbreviation of different chromatin states was promulgated below, as used in(110): TssA : Active TSS PromU : Promoter Upstream TSS PromD1 : Promoter Downstream TSS with DNase PromD2 : Promoter Downstream TSS Tx5’ : Transcription 5’ Tx : Transcription Tx3’ : Transcription 3’ TxWk : Weak transcription TxReg : Transcription Regulatory TxEnh5’ : Transcription 5’ Enhancer TxEnh3’ : Transcription 3’ Enhancer TxEnhW : Transcription Weak Enhancer EnhA1 : Active Enhancer 1 EnhA2 : Active Enhancer 2 EnhAF : Active Enhancer Flank EnhW1 : Weak Enhancer 1 EnhW2 : Weak Enhancer 2 EnhAc : Enhancer Acetylation Only Dnase : DNase only ZNF/Rpts : ZNF genes & repeats Het : Heterochromatin PromP : Poised Promoter PromBiv : Bivalent Promoter ReprPC : Repressed PolyComb Quies : Quiescent Forebrain_enh : Forebrain enhancers Non_forebrain_enh : Non-forebrain enhancers H3K27acF : H3K27 acetylated regions found in the frontal lobe H3K27acO : H3K27 acetylated regions found in the occipital lobe H3K4me2F : H3K4me2 regions found in the frontal lobe H3K4me2O : H3K4me2 regions found in the occipital lobe Integrative analysis of NPD GWAS with neuronal activity eQTL or caQTL Causal TWAS (cTWAS)(59) was used to analyze the heritability of NPD mediated by the molecular traits from our QTLs and ASoCs, and to discover putatively causal genes (eQTL-based cTWAS) and cPeaks (caQTL-based cTWAS) for NPD phenotypes. cTWAS is a generalization of the standard TWAS. In standard TWAS, one first trains the prediction models of gene expression traits from eQTL data, and then correlates the predicted expression traits, one at a time, with the phenotype in a GWAS cohort. However, due to confounding by nearby variants and other expression traits, TWAS has high false positive rates(59). In cTWAS, we control the confounding by fitting a joint regression model of the phenotype against all gene expression traits and all variants in the same region. Let be the genotype of the m-th variant, and be the expression of the j-th gene. Assuming the prediction models of gene expression are given, we denote as the genetically predicted expression of . Our model is: where βj and θm are the effect sizes of gene expression j and the variant m, respectively. cTWAS assumes a spike-and-slab prior for the effects of gene expression and variants, respectively. The parameters of these distributions are estimated through an expectation-maximization (EM) algorithm, using all the data from the genome. Once these parameters are estimated, cTWAS solves the regression model above with SuSiE(114), a fine-mapping method. The results would be Posterior Inclusion Probabilities (PIPs) for all the expression traits and variants. While the model is formulated using individual level data, the method can be used with summary statistics. In that case, a reference LD matrix matching the GWAS samples is used. As a Bayesian method, cTWAS does not need to correct for multiple testing as traditionally done by frequentist methods. Intuitively, the PIPs have already included the prior information: the low prior probability of a molecular trait or variant having a causal effect effectively controls for multiple testing. The cTWAS model above has only gene expression in a single context. But it is easy to extend the model to allow multiple groups of QTL in the same model, with exactly the same algorithm. For example, we could have eQTL from multiple tissues or cellular contexts; or one eQTL group and one caQTL group, in the model. We note that different groups generally have different prior parameters, most importantly, the proportion of molecular traits having effects on the phenotype in each group. These parameters are estimated by cTWAS. This allows us to favor certain groups of molecular traits (those with higher prior) in the fine-mapping analysis. We used this version of cTWAS in our study. This integration can combine evidence across multiple groups to improve power. For example, if we have eQTL across several contexts in the model, then we can obtain the PIP of a gene in each of the contexts. These PIPs can be combined to obtain a single summary of how likely a gene is causal, denoted as “gene PIP”. In our study, the PIP of a gene is simply the sum of PIPs of all expression traits of this gene across all contexts. This gene PIP approximates the probability that the gene is causal in at least one context. Meanwhile, we can also quantify the importance of each context, defined as the ratio of PIP of that context over the total PIP across all contexts. This allows us to assign causal context for a putative causal gene. When we used cTWAS to integrate eQTL with NPD traits, we first extract top eQTL per gene per context and use these eQTL as the genetic prediction models (denoted as “weights”) of the gene expression traits. cTWAS then includes eQTL data across all 9 contexts in analysis. We note that even though we analyzed all 9 contexts together, a particular gene may occur only in a subset of 9 contexts, as it may not have eQTL in all contexts. However, one problem is that our eQTL study has incomplete power, thus for a particular gene, we may miss some relevant contexts. To address this issue, we used a relaxed threshold. Suppose a gene has eQTL in one context (say C1), but no eQTL in the second context (C2). If the top eQTL in C1 has p value < 0.1 in C2, then this eQTL would be included as the weight in C2. This allows us to include as many contexts as possible for a single gene. To run cTWAS, we used the default setting with the prior variance shared by types of QTL (i.e. QTL across all 9 contexts would have the same variance) and 10% down-sampled SNPs used for estimating parameters. We allowed at most 5 causal signals per locus. We used the LD reference computed from 10,000 White British samples from UK Biobank(59). In characterizing the high confidence genes of cTWAS, we identified the contexts that support the findings. If the total PIP from the stimulated states (1 hr and 6 hrs) was greater than the PIP from 0 hr by at least 0.5, we denoted the gene as “dynamic”; and “static” otherwise. We used the same procedure when applying cTWAS to caQTL and ASoC variants. Here the units for analysis are not gene expression traits, but chromatin accessibility of OCRs (i.e., cPeaks). We included all the OCRs with either caQTL or ASoC SNPs. For each OCR, we used either the top caQTL or the associated ASoC SNP as the weights. In the cases where a peak is associated with both caQTL and ASoC, we chose the variant with smaller nominal P values. In addition, we used P < 0.05 to determine what contexts to include in cTWAS; for example, for a SNP that is a caQTL or ASoC in one context (under the FDR cutoff), if the SNP has P < 0.05 for the same peak in another context, then the second context would also be included in cTWAS. Single-cell DEG analysis in SCZ and TF-KO neurons We performed DEG analysis between neurons from SCZ donors and those from controls to identify genes that exhibited SCZ case-control differential neuronal activation. Briefly, we first subset the snRNA-seq data of the 28 SCZ lines and the sex/age-matched 28 controls (17 males and 11 females in SCZ cases with an average age of 48; 16 males and 12 females in controls with an average age of 50). The subset Seurat object was first split by individual ´ cell type and processed by standard Seurat SCTransform for data transformation. Subsequently, all transformed libraries were re-integrated by Seurat IntegrateLayers using “HarmonyIntegration” method to construct the master Seurat object for MAST analysis using the zlm hurdle model. We used covariates including cell co-culture batch, sex, age, and cell type fraction in MAST test. The Seurat wrapper FindMarkers was used to find DE genes between SCZ case and control groups. Only genes expressed in at least 5% of cells in either comparison group were used. We recalculated FDR for the result of each context (cell type/ timepoint). FDR < 0.05 was used as cut-off for DEGs used for the gene set enrichment analyses. For scRNA-seq of TF-KO neurons, we followed similar data processing and QC procedure as descried above. For DEG analysis of each TF-KO, we subset 630 cells (the minimum number) for each cell type/time/cell line and used MAST in Seurat as described above with cell line as a covariate to control for random effect. Statistical analyses For genetic analyses (eQTL and caQTL mapping and GWAS enrichment analyses), different statistical methods were used and specified in the method section above and in different figure legends. For other assays, unless otherwise specified, Student’s t-test (between two groups) or the Kruskal Wallis test with Dunn’s multiple comparisons and P-value adjustment (more than two groups) was used to determining significance between groups. Samples were assumed to be unpaired and have non-parametric distribution unless otherwise specified. Data were analyzed using R 4.3.2 and GraphPad Prism 10. Results were considered as significant if P < 0.05 (*: P < 0.05; **: P < 0.01; ***: P < 0.001; ****: P < 0.0001). All data are reported as mean ± SEM. Supplemental Text Sn-Multiomics data QC and preliminary analysis The sequencing reads of 1,053,422 nuclei (Tables S1-2) were aligned to a hybrid human/rat genome using 10´ Genomics Cell Ranger ARC (v2.0.2) and analyzed in Seurat 5.1.0(83). We performed data quality control (QC) by sequencing batch and after merging all batches (Fig. S1A). For each batch, we demultiplexed(82) different donor lines (n=2-4) and confirmed the clear separation of clusters of iGlut (SLC17A6+) and iGABA (GAD1+/GAD2+) on Uniform Manifold Approximation and Projection (UMAP) (Fig. S1B-G). For snATAC-seq data, we transferred the cell identity labels from RNA-seq data on UMAP of gene-activity-score (gact) (Fig. S1H-I) and verified the robust transcription start site (TSS) read enrichment (>5) (Fig. S2A-D, Table S2). After merging all the post-batch-QC data, we obtained 651,012 neurons with an average of 7,510 unique molecular identifiers (UMI) per nucleus that are comparable across the 100 lines (Table S3, Fig. S4A-D). We defined three major subtypes of neurons: GABA (n=251,501), NEFM+ Glut (npglut; with stronger NEFM expression) (n=159,735), and NEFM- Glut (nmglut; with weaker NEFM expression) (n=137,564) (Fig. 1E-H, Fig. S4E-G, Table S1), with reproducible cell type clustering patterns across samples (Fig. S3A-E). Compared to single-cell transcriptomic profiles of brain excitatory and inhibitory neurons(34), our iGlut and iGABA are mostly similar to neurons of early brain developmental stages (from 2nd trimester to 2 years old; predominantly from 2nd trimester) (Fig. S5A-F). Transcriptomic and epigenomic landscape of cell-type-specific neuronal activation and its relevance to NPD With sn-Multiomics data of co-cultured human Glut/GABA neurons and mouse glia, we first identified differentially expressed genes (DEGs) upon stimulation by KCI (1 hr vs. 0 hr, and 6 hrs vs. 0 hr) in each neuron subtype. To minimize possible batch effects, we analyzed data from 18 donor lines of the same sequencing batch (batch 24; Table S1) and corrected for co-culture batch effect (Fig. S7A-B). We found that 36-65% genes were either up- or downregulated (log2FC > or < 0.25, FDR < 0.05) at 1 hr or 6 hrs in any cell type. The number of DEGs and their log2 fold changes (log2FC) from analysis of the 18 donor lines were very similar to those from 76 donor lines (Fig. S7C-D; Table S4). The upregulated genes showed much larger magnitude of expression changes than the downregulated genes, with the well-known ERGs (e.g., FOS, FOSB) exhibiting the largest FC (Fig. S7D, Table S4). Comparing to a similar study using iPSC-derived Glut and GABA neurons separately cultured and assayed by bulk RNA-seq(29) (Fig. S8, we found that for the upregulated genes at 45 min of KCI stimulation(29), 79% of them were also upregulated with log2FC > 0.5 at 1 hr in our npglut (60.2-fold enrichment, Fisher’s exact test P = 2.4 ´ 10-52) and GABA (24.5-fold enrichment, Fisher’s exact test P = 2.3 ´ 10-67), while about 81-83% were upregulated (log2FC > 0.25) in our npglut (24.5-fold enrichment, Fisher’s exact test P = 1.2 ´ 10-33) and GABA (19.7-fold enrichment, Fisher’s exact test P = 1.8 ´ 10-46). For LRGs, about 52-62% of those upregulated at 4 hrs in Sanchez-Priego et al. study(29) were also upregulated (log2FC>0.5) at 6 hrs in our npglut (7.2-fold enrichment, Fisher’s exact test P = 2.3 ´ 10-175) and GABA (9.9-fold enrichment, Fisher’s exact test P = 4.9 ´ 10-248), while about 69-71% were upregulated with log2FC>0.25 in our npglut (6.1-fold enrichment, Fisher’s exact test P = 6.6 ´ 10-162) and GABA (8.1-fold enrichment, Fisher’s exact test P = 3.0 ´ 10-212). For genes downregulated at 4 hrs (neurons at 1 hr had too few downregulated genes) of KCI stimulation(29), 4574% of them were also downregulated (with log2~FC<-0.5 or -0.25) at 6 hrs in our npglut or GABA (4.3 to 4.9-fold enrichments; Fisher’s exact test P = 1.7 ´ 10-106 to 3.6 ´ 10-133). The larger number of DEGs in our study may be attributed to the larger sample size and single-cell resolution. To ascertain the in vivo relevance of our findings, we compared our results to that from studying mouse models of neuron activation(21, 37). In mouse dentate granule neurons upon in vivo electroconvulsive stimulation(21), 3,006 genes were upregulated and 2,115 were downregulated at 1 hr after stimulation, while 3,087 were upregulated and 2,845 were downregulated at 4 hrs after stimulation (log2FC > or < 0.25, q-value <0.05) (21). These numbers of DEGs are comparable to that in our stimulated Glut neurons. More than 70% of the DEGs in both datasets had same directional expression changes (mostly upregulated) and showed strong correlation (Pearson’s R2 =0.272) of log2FC (Fig. S8B). For the 2,471 shared LRGs, over 64% of them had same directional expression changes, albeit a weaker but significant Pearson’s correlation of log2FC (R2=0.0597) (Fig. S8C). In another mouse study of sensory-experience-activated neuronal expression(37), of the 8,313 DEGs in at least one cell type, 419 upregulated and 192 downregulated genes (> 2-fold) were defined as ERGs (n=362) and LRGs (n=249). 59.2% of the Ex ERG (29/49; 15.6-fold enrichment, Fisher’s exact test P =6.1 ´ 10-20) and 29.7% of the Ex LRG (11/37; 4.4-fold enrichment, Fisher’s exact test P = 8.7 ´ 10-6) were shared with our npglut or nmglut DEGs (> 2-fold) at 1 hr or 6 hrs (Fig. S8D,E). At a log2FC cutoff of 0.25 (a cut-off that allows inclusion of more NPD risk genes), our npglut or nmglut DEGs also showed similar enrichments (3.5 to 6.6-fold) for mouse sensory-experience-induced ERGs and LRGs, with a notable increase of overlapping mouse sensory-experience-induced Ex ERGs (55.1%) and LRGs (54.1%) (Fig. S8D,E). For our iGABA neurons, DEGs with >2-fold or log2FC>0.25 expression increase at 1 hr or 6 hrs also showed significant overlap with mouse sensory-experience-induced Inh ERGs (34.8% or 16/46; 7.2-fold enrichment, Fisher’s exact test P = 4.0 ´ 10-8; FC > 2-fold) or LRGs (27.5% or 11/40; 5.1-fold enrichment, Fisher’s exact test P = 6.4 ´ 10-5; FC > 2-fold) (Fig. S8D,E). At an individual gene level, we found that some sensory-experience-induced canonical immediate-early genes (e.g., Nr4a1, Nr4a2, Nr4a3, Fos, Fosl2, and Egr1) known to regulate the late phases of gene expression(37) were also shared in our ERGs. We also confirmed that LRGs tend to be more cell type-specific in both datasets, e.g., Crh, an Inh neuronal LRG in mouse(37) that encodes the stress hormone, corticotropin-releasing hormone(37), were increased (> 2-fold) by KCI stimulation only in GABA at 6 hrs. To examine the biological relevance of these activity-dependent DEGs, we performed gene set enrichment analyses. Analysis of synaptic gene ontologies (SynGO)(115) showed that only the upregulated genes showed significant enrichment for synaptic genes (Fig. S9A). Our MAGMA(116) analysis found that common GWAS risk of NPD or traits had higher enrichment in upregulated genes than downregulated or unchanged genes, with strongest enrichment for SCZ (Fig. S10A). We also found a lack of enrichment for autism spectrum disorders (ASD), which may be due to the relatively small number of ASD GWAS risk loci. We thus performed an enrichment analysis using a set of 102 ASD risk genes from rare variant analysis(36), together with a set of SCZ rare variant-based risk genes(35), and other sets of GWAS risk genes for SCZ(7), bipolar disorder (BP)(117), major depressive disorder (MDD)(4), and post-traumatic stress disorder (PTSD)(118) (Table S5). We found strong enrichment for ASD risk genes among the upregulated genes across cell types, and to a less extent for SCZ rare variant-based risk genes (Fig. S9B). Interestingly, the majority of SCZ rare risk genes were upregulated by stimulation, rather than downregulated (Fig. 2B, Fig. S9C). We also confirmed the pseudo-bulk-based expression changes of some selected SCZ and ASD genes in single neurons (Fig. S9D) and validated the downregulation of SNAP91 (a SCZ GWAS risk gene) and IMMP2L (a GWAS risk gene for SCZ and MDD) by qPCR in independent cell cultures (Fig. S9E). We next analyzed the snATAC-seq (Fig. 1F, Fig. S1H) to characterize the landscape of activity-dependent OCR peaks upon stimulation. For the same 18 donor lines used for DEG analysis, we identified 150K to 300K peaks across cell types/timepoints (76 donor lines gave a similar number of peaks) (Fig. S10B). There were more peaks at 1 hr and 6 hrs (Fig. S10B), suggesting a robust effect of neuronal stimulation on global chromatin accessibility. About 19-22% of peaks are stimulation-specific across cell types (Fig. S10C). To identify OCR peaks that showed differential accessibility (i.e., DA peaks) upon neuronal stimulation in each cell type, we used limma(87) to analyze a set of merged peaks (170K for GABA, 196K for nmglut, and 207K for npglut). We found that 26-34% and 40-51% of the peaks showed DA at 1 hr and 6 hrs, respectively (Fig. S11A-C; Table S6), with more upregulated peaks than downregulated ones. We found that the proportion of DA peaks with FC > 2-fold (~11.8% of OCRs) in KCI-stimulated npglut at 6 hrs were comparable to that in mouse dentate granule neurons (11.5% of total OCRs, FC>2) upon in vivo electroconvulsive stimulation(21). We also confirmed that chromatin accessibility of well-established ERGs (e.g., ARC, FOS, and NPAS4) and LRGs (e.g., BDNF) were similarly increased in both datasets(21). Similar to DEGs (Fig. S7D), the upregulated peaks often showed larger magnitude of changes (Fig. S11C). As expected, FOS showed the largest increase of peak accessibility at 1 hr, highlighting its driver role as an ERG in inducing activity-dependent gene expression (Fig. S11C). The OCR peak of FOS with the strongest accessibility increase at 1 hr is ~4.2 kb downstream of its TSS with a conserved CEBP binding motif in the middle of the peak that showed peak-to-gene link (Fig. S11C-D), indicating this peak and its accessibility to CEBP may mediate the immediate expression of FOS upon stimulation. We next assessed the DA peak enrichment for SNP heritability of NPD (Fig. S11F, Fig. S11E). We found that only the DA peaks at 1 hr or 6 hrs of stimulation but not those static peaks (i.e., chromatin accessibility unaltered) showed GWAS enrichments (Fig. S11E). Taken together, our results highlight the widespread effects of neuronal stimulation on cell-type-specific transcriptomes and chromatin accessibility as well as their relevance to NPD. Activity-dependent expression of BDNF is regulated by cell-type-specific OCRs BDNF is a SCZ risk gene(7) that encodes a neurotrophin important for neuronal differentiation and synaptic plasticity(119, 120). Despite** being a well-established LRG, its cell-type-specific regulation is unknown. We found that BDNF exhibited late response in both Glut and GABA cells (Fig. 1I-J, Fig. S7D, and Fig. S10C), but the DA peaks upon stimulation were different between cell types (Fig. S10D, Fig. S12 A, and Table S6). The DA peak showing the strongest increase of accessibility in npglut and nmglut was about 49 kb upstream (i.e., putative enhancer) of the TSS of BDNF (Fig. S10D, Fig. S12A), but with much weaker accessibility in nmglut. However, the DA peak showing the largest increase of peak accessibility in GABA was near the 3’-UTR of BDNF. These DA peaks showed peak-gene promoter linkage at 1 hr and/or 6 hrs of stimulation and correlated with BDNF expression in their respective cell types (Fig. S10D, Fig. S12A), suggesting their possible enhancer function. We next validated whether the DA peak 49 kb upstream of the TSS of BDNF indeed regulated the activity-dependent BDNF expression in iGlut. We noted that this DA peak encompassed the binding site of early response AP-1 TFs (FOSB, JUNB, JUND, FOSL1, FOSL2) (Fig. S10E), suggesting an enhancer role of the OCR peak in regulating BDNF expression. We thus CRISPR/Cas9-engineered two iPSC lines (CD07 and CD15) by deleting the OCR peak (~600bp) (Fig. S10E, Fig. S12B-C). We then differentiated the two isogenic pairs of CRISPR/Cas9-edited iPSC lines into iGlut and carried out KCI stimulation. We found that while BDNF expression was similar between the unedited and deleted lines at 1 hr, its expression was significantly lower at 6 hrs of stimulation in the OCR-deleted lines (Fig. S10F). This supports the role of this OCR in regulating the activity-dependent expression of BDNF in iGlut. These results highlight that similar transcriptional responses across cell types may have distinct epigenomic regulatory mechanisms. Integrative multiomic analysis identifies regulatory program of neuronal activation While some TFs, e.g., FOS and JUNB, are well known regulators of early response, the regulatory programs controlling cell-type-specific late response are far less clear. We thus leveraged our multiomic data to identify putative TF regulators. We limited the analysis to TFs that showed differential expression and motif enrichment in at least one condition. We used chromVAR(121) to assess a single-cell level motif enrichment score in each of the 9 contexts (3 cell types, 3 time points). Combining motif scores with gene expression, we classified the role of each TF in early or late response. For a given cell type, we considered a TF to be a candidate regulator of early or late response when it showed higher motif enrichment and gene expression at 1 hr (vs. 0 hr), or at 6 hrs (vs. 1 hr), respectively (Table S13). We identified 145 candidate TF regulators of early response, nearly half of which are shared by all cell types (Fig. S14C). Some shared TFs with the largest motif changes from 0 hr to 1 hr are well-known early response TFs, e.g., FOS, JUNB, EGR1, and NPAS4, while some others have less established roles in early response, e.g., BACH2, JDP2, MAFK, and SMARCC1 (Fig. S14D). We noted that the expression of the shared TFs tends to be elevated transiently at 1 hr, but their motifs remain enriched at 6 hrs (e.g., FOS, JUNB, and NPAS4) (Fig. S14D), suggesting that the epigenomic changes established by these early response TFs were maintained at a later stage. Our analysis of candidate late response regulators revealed a very different pattern. We found 64 candidate TFs in at least one cell type. Compared to early response TF regulators, these TFs are much more cell-type-specific (Fig. S14E). Only 6 TFs were shared across cell types, including some important for neurodevelopment, such as MEF2C and ESR1 (Fig. S14D, F). This highlighted the distinct regulatory programs controlling late responses in Glut and GABA neurons. To understand the cell-type-specific responses, we focused on TFs whose roles are limited to only Glut or GABA cells. For Glut cells, we observed a relatively large number of candidate TFs regulating late response. The TFs showing strongest motif enrichment at 6 hrs are regulators of early response in all cell types, such as the AP1 family TFs (FOSL, JUND) and a member of SWI/SNF family (SMARCC1) (Fig. S14D). While the motifs of these TFs are also enriched in GABA cells, their motif enrichment is much stronger at 6 hrs in Glut cells, and their expression are also upregulated in Glut cells (Fig. S14D). These results thus suggested that a subset of early response TF regulators continue to drive late response, primarily in Glut cells. For GABA cells, we found 21 TF candidates for late response (Table S13). Most of the TFs showed modest changes of motif activities between 1 hr and 6 hrs, with a few exceptions (NR2F2, NR2C1, ESRRA) (Fig. S14D). We hypothesized that some TFs without large motif changes from 1 hr to 6 hrs may still play a role in neuronal response. We thus assessed motif enrichment of TFs at 6 hrs in GABA cells. The TFs showing highest motif enrichment include several GABA-specific TFs, ID3, EVX1, DLX5, and TCF4, a risk gene and a master regulator in SCZ(49). These TFs showed high expression and strong motif enrichment even before stimulation (Fig. S14D,F). We also found other TFs likely regulating early response specifically in GABA cells, such as EMX2 and NR4A2 (Fig. S14D,F). In contrast, there were much fewer Glut-specific early response TFs (Fig. S14C). These results thus suggest that distinct TF activities before and during the early response to the stimulation primed the epigenome of GABA cells, leading to varied late responses in GABA cells. Altogether, our results highlighted a shared regulatory program controlling early responses of all cell types, and distinct programs controlling late responses in Glut and in GABA cells. Some of the key TFs in these programs are NPD risk genes (e.g., MEF2C, TCF4), highlighting the importance of these regulatory programs to the development of NPD. Chromatin priming during neuron activation We noted frequent chromatin “priming” during neuron stimulation by KCI. Multiple ERGs (e.g., EGR1) and LRGs (e.g., DUSP4, ATP1B1, CREM, BDNF, PCSK1, and SCG2) (Fig. S6B,C) showed high gene activity scores (i.e., chromatin accessibility) at 0 hr or 1 hr, but gene expression was activated later. It has been suggested that some cell-type-specific enhancers (i.e., H3K27ac region) remained primed for activation by Fos/Jun-binding in mature neurons in response to neuronal activity(19). Such epigenetic priming has also been demonstrated during neuron differentiation in which many TFBSs exhibit a gain of H3K27ac or H3K4me1 at early NPC stages prior to increased expression of their associated genes(46). To mechanistically characterize the observed chromatin “Priming”, i.e., chromatin was partially active (“primed”) before stimulation, we used a published histone mark dataset from iPSC-derived neurons in both stimulated and unstimulated states(29). This dataset, however, includes only H3K27ac, but not H3K4me1. We thus used H3K27ac to annotate a set of OCRs that respond to neuron activation and examined the chromatin states of these OCRs before and after stimulation in GABA cells. Specifically, we defined a set of “response CREs” as the OCRs showing differential accessibility upon stimulation (FDR < 0.1 and log2FC > 1) and also having H3K27ac mark in the stimulated state. To better detect priming, we further limited our analysis to CREs whose nearby genes had low expression at 0 hr (using the expression of FOS, an early response gene, as the threshold). In total, we had 2,776 CREs. We then assessed the chromatin states (open chromatin and H3K27ac) of these response CREs at 0 hr (before stimulation). We found heterogeneous patterns of CRE activation (Fig. S14A): (1) Some CREs (33%) were in the closed chromatin state before stimulation, and were activated “de novo” by the neuron activation signal; (2) Some CREs (28%) were in the “primed” state: chromatin was partially accessible but had no H3K27ac mark, and became fully activated by neuron activation (see the example below in Fig. R19B,C); (3) Some CREs (39%) already had enhancer activities before stimulation, marked by H3K27ac. Next, we examined the chromatin states of a few loci before and after stimulation. As an example from GABA cells (Fig. S14B), we examined the regulatory sequences near an ERG, cysteine and serine rich nuclear protein 1 (CSRNP1), a gene essential for cephalic NPC proliferation and survival in zebrafish(122). We confirmed that CSRNP1 showed very low expression at 0 hr (below the level of FOS) and much higher expression at 1 hr and 6 hrs of stimulation. The chromatin of the OCR sequence (Chr3: 39152481-39153281) near the TSS was partially open (note the high read count, called as a peak by ArchR) at 0 hr but became fully active at 1 hr. To corroborate the observed chromatin priming at the CSRNP1 locus, we used Seq2PRINT to map TFBS(123) in this region. To ensure comparable coverage and library size across cell types and stimulation time points, cells were down-sampled so that each context had similar coverage. We first trained the seq2PRINT model using pseudo-bulk aggregations of all contexts and all cells. Chromosomes were partitioned into training, validation, and test sets, and the model was fit using a five-fold scheme to ensure evaluation across different chromosomes. Next, the seq2PRINT model was fine-tuned for each context using LoRA. Finally, we calculated TF-binding scores for all significant candidate regulatory elements identified by the co‑activation based OCR–gene linkage analysis, using the average scores across all folds from the LoRA‑fine‑tuned models. We found weak binding sites at 0 hr, however, two strong binding sites of the AP-1 family motif emerged at 1 hr. This analysis suggests that the partially open (i.e., primed) chromatin at 0 hr likely enabled the access by AP-1 family (FOS/JUNB), which then fully activated the CRE at 1 hr.

创建时间：

2026-01-28

5,000+

优质数据集

54 个

任务类型

进入经典数据集