Data for: Distinct impact modes of polygenic disposition to dyslexia in the adult brain

Indexed in NIAID Data Ecosystem: 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.80gb5mkz6

Description:

Dyslexia is a common and partially heritable condition that affects reading ability. In a study of up to 35,231 adults, we explored the structural brain correlates of genetic disposition to dyslexia. Individual dyslexia-disposing genetic variants showed distinct patterns of association with brain structure. Independent component analysis revealed various brain networks, each with its own genomic profile related to dyslexia susceptibility. Circuits involved in motor coordination, vision, and language were implicated. Polygenic scores for eight traits genetically correlated with dyslexia, including cognitive, behavioural, and reading-related psychometric measures, showed partial similarities to dyslexia in their brain-wide associations. Notably, the microstructure of the internal capsule was consistently implicated across all of these genetic dispositions, whereas lower motor cortex volume was more specifically associated with the genetic disposition to dyslexia alone. These findings reveal genetic and neurobiological features that may contribute to dyslexia and its associations with other traits at the population level.

Methods

Experimental Design

UK Biobank data were accessed following approval of application number 16066 (P.I. Clyde Francks). UK Biobank is an in-depth study of more than 500,000 volunteers in the UK who are assessed for health, lifestyle, genomics, and many other variables (1). Multimodal brain MRI data (2, 3) had been released for approximately 10% of the participants when the present study was initiated in 2022 (4). The UK Biobank received ethical approval from the National Research Ethics Service Committee North West-Haydock (reference 11/NW/0382), and all of its procedures were performed in accordance with the World Medical Association guidelines. Written informed consent was provided by all enrolled participants.
Genotyping was performed using either the BiLEVE Axiom or the Axiom array from Affymetrix, which target highly overlapping sets of ~800,000 genomic variants with more than 95% similarity (5). The UK Biobank has also released common genome-wide variants imputed to the Haplotype Reference Consortium and UK10K haplotypes (5). In this study, we focused on participants who also underwent brain MRI at one of the four imaging sites and for whom at least one usable T1-weighted and/or diffusion MRI (dMRI) scan was available (see the next section). The genetic analyses were restricted to the largest ancestry group within this cohort, recorded as 'white British' on the basis of self-report and genomic principal component analysis (this group constitutes ~85% of the overall dataset; data field #22006). Pairs of genetically related participants with kinship coefficients above 0.044 were identified in the target sample (70). Individuals related to the largest number of others were removed recursively until no two remaining individuals were related at or above this kinship threshold, leaving 35,231 individuals (18,363 females). The resulting sample comprised individuals aged 45 to 82 years (mean 64.2 years, standard deviation 7.7 years). We then included bi-allelic genetic variants with minor allele frequency ≥ 0.01, imputation quality score > 0.7, and Hardy-Weinberg equilibrium P-value > 10⁻⁷, yielding 8,366,177 autosomal single nucleotide variants (SNVs) and 1,092,696 short insertion-deletions (indels). We accessed minimally processed, brain-extracted T1-weighted brain MRI volumes of 42,798 individuals (2, 3) for tensor-based morphometry using symmetric image normalization (SyN) registration (6, 7). For the present study, we generated a study-specific average brain template from a randomly chosen subset of 1,000 individuals.
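The recursive exclusion of related participants described above can be sketched as a greedy loop over related pairs. This minimal Python sketch assumes the pairwise kinship coefficients are already available as (id_a, id_b, kinship) tuples; the function name `prune_related` and the tie-breaking rule (smallest ID first among equally connected individuals) are hypothetical choices, not the study's code.

```python
from collections import defaultdict


def prune_related(pairs, threshold=0.044):
    """Greedily remove individuals until no remaining pair has kinship >= threshold.

    pairs: iterable of (id_a, id_b, kinship) tuples; IDs must be mutually comparable.
    Returns the set of removed IDs (hypothetical helper for illustration).
    """
    # Adjacency sets holding only relationships at or above the threshold
    neighbours = defaultdict(set)
    for a, b, kinship in pairs:
        if kinship >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    removed = set()
    while neighbours:
        # Individual related to the most others; ties broken by smallest ID
        worst = min(neighbours, key=lambda i: (-len(neighbours[i]), i))
        removed.add(worst)
        # Delete the individual and its edges; drop anyone left without relatives
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
            if not neighbours[other]:
                del neighbours[other]
    return removed
```

Greedy removal is a heuristic: it tends to keep many individuals, but it is not guaranteed to find the largest possible unrelated subset.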
The template was generated through 11 consecutive Advanced Normalization Tools (ANTs v2.3.5) registrations that iteratively refined the template shape using rigid, affine, and diffeomorphic SyN transformations at increasing resolutions up to the native resolution (i.e. 1 mm isotropic).

Statistical analysis

Structural MRI: tensor-based morphometry

Thereafter, all individuals' original T1-weighted brain volumes were histogram-matched, winsorized at the 1st and 99th percentiles, and non-linearly registered to our study-specific template using SyN. Registration parameters included a total-field variance of three, an update-field variance of zero, a multi-resolution downsampling scheme of 6×, 4×, 2×, and 1× (i.e. full resolution), and Gaussian smoothing with standard deviations of 4, 2, 1, and 0 voxels. A cross-correlation metric with a radius of four voxels was used. The affine registration matrix was composed with the SyN deformation field, and the final warps were subsequently converted to Jacobian determinant maps, which encode the amount of regional tissue 'shrinkage' or 'expansion' in each individual's brain relative to our study-specific average T1 template. ANTs affine registrations failed in 2,098 individuals; instead of excluding them, we used a comparable linear registration method, FSL FLIRT (8), to initialize SyN, and controlled for a potential batch effect in subsequent analyses with a binary covariate.

Diffusion MRI: data preprocessing and fixel-based analysis

We retrieved minimally preprocessed dMRI volumes of 37,930 participants from the UK Biobank (2, 3). These data were collected at 2 mm isotropic resolution with 100 diffusion-encoding directions evenly distributed across two spherical shells at b-values of 1000 and 2000 s/mm², as well as eight blip-reversed b≈0 volumes. The diffusion images had been corrected for off-resonance distortions, gradient non-linearity, eddy currents, and head motion by the UK Biobank team (2, 3).
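The conversion of the SyN warps into Jacobian determinant maps, described in the structural MRI subsection above, can be illustrated with NumPy. This sketch assumes a dense displacement field `u` of shape (3, X, Y, Z) in voxel units that maps template coordinates to subject coordinates, φ(x) = x + u(x), so the local Jacobian matrix is I + ∇u. It uses finite differences and is not the ANTs implementation; the function name is hypothetical.

```python
import numpy as np


def jacobian_determinant(u, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise det(I + grad u) for a displacement field u of shape (3, X, Y, Z).

    Assuming u maps template to subject space, values > 1 indicate local
    expansion relative to the template and values < 1 local shrinkage.
    """
    # grads[i][j] holds d u_i / d x_j, each of shape (X, Y, Z)
    grads = [np.gradient(u[i], *spacing) for i in range(3)]
    J = np.empty(u.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            J[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    # np.linalg.det broadcasts over the leading spatial axes
    return np.linalg.det(J)
```

The log-transformed values used in the post hoc allometry check would then be `np.log(jacobian_determinant(u))`, which is defined only where the warp is locally invertible (determinant > 0).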
For the present study, we reran these corrections on the raw data for a first batch of 8,247 individuals whose corrected b-vector tables were not available, and accounted for a potential batch effect in the subsequent regression model fits with a binary covariate. After preprocessing, we constructed a study-specific fiber orientation density (FOD) template using MRtrix3 v3.0.3 (9) from a random subset of 890 individuals (out of 1,000) who passed registration quality control by visual inspection. This procedure started with N4 bias field correction and intensity normalisation of the preprocessed diffusion volumes, followed by estimation of the average tract response function (10). Thereafter, spherical deconvolution was performed using the estimated response function to generate subject-wise FOD volumes. These volumes were subsequently registered non-linearly to a common space, and an average FOD template was generated iteratively. The FOD template was then 'fixelated' to identify the principal directions of white-matter tracts in each voxel. The same procedures were repeated in all 37,930 individuals to generate FOD volumes, which were then registered to the study-specific FOD template (9). FOD registrations passed quality control in 37,884 individuals following visual inspection of each individual's template-transformed zeroth-order spherical harmonic map, which represents the average isotropic diffusion in each voxel. FOD volumes were segmented to obtain fixel-wise readouts, which were then transformed, reoriented, and brought into correspondence with the template's fixels (9). We used apparent fiber density (AFD) as the measure of white-matter microstructure in subsequent analyses (11). After combining with the genetic data, 31,695 adults (16,198 female) were available for analysis.
Optimizing polygenic scoring

We first concatenated the voxel-wise Jacobian and fixel-wise AFD maps across all individuals and then applied MELODIC independent component analysis (12, 13) to extract imaging-derived phenotypes (IDPs). MELODIC was run separately for each imaging modality and at several dimensionalities to extract IDPs at increasing levels of spatial detail, following a geometric series of dimensions 11, 18, 29, 47, 76, 124, 200, and 324. Because of the large size of this data matrix (6.2×10¹⁰ voxels in structural MRI), we used 8,000 internal eigenmaps for the independent source decomposition (14). In addition, principal component analysis was performed on the same data, and the first 324 principal components were extracted as additional IDPs. Altogether, 1,153 IDPs were extracted from the voxel-wise Jacobian maps and an equal number from the fixel-wise AFD data. These IDPs were derived only to optimize polygenic scoring; they were not used in our voxel- or fixel-based imaging genetic analyses, nor in our impact mode analysis, which forms the bulk of the findings in this study.

We used summary statistics from the largest genome-wide association study (GWAS) of dyslexia performed to date, carried out by 23andMe, Inc. (15). This GWAS was based on 51,800 individuals of European ancestry who answered 'Yes' to the question 'Have you been diagnosed with dyslexia?' and 1,087,070 control individuals who answered 'No'. The SNP-wise effect sizes from this GWAS were applied to the genotype data of UK Biobank individuals to estimate each individual's polygenic disposition to dyslexia from the combined effects of their autosome-wide genetic variants. Our primary approach to computing polygenic scores (PGS) was based on the Lassosum2 model (16). Lassosum2 PGS were strongly correlated with PGS derived from two automated methods, SBayesR (17) and PRS-CSauto (18).
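Applying per-variant weights to genotype dosages amounts to a weighted sum per individual. The sketch below assumes the weights have already been estimated (for example by Lassosum2, which shrinks the GWAS effect sizes) and refer to the same counted allele as the dosages; it does not implement Lassosum2 itself, z-scoring the result is an illustrative choice, and the function name is hypothetical.

```python
import numpy as np


def polygenic_score(dosages, weights, standardize=True):
    """PGS_i = sum_j w_j * g_ij for each individual i.

    dosages: (n_individuals, n_variants) expected allele counts in [0, 2]
    weights: (n_variants,) per-variant weights for the same counted allele
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    score = dosages @ weights
    if standardize:
        # Illustrative choice: z-score across individuals (per-SD effects)
        score = (score - score.mean()) / score.std()
    return score
```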
Lassosum2 generally explained the highest proportion of variance in brain IDPs (Supplementary Fig. S1) and was therefore used for the main analysis. This method fits a sparse elastic-net regression and optimizes two shrinkage penalties: the L1-norm penalty (λ) and the L2-norm penalty (δ). A grid search over 30 λ and 10 δ values was used to select the combination that maximized the top association with an IDP. The associations of dyslexia PGS with all 1,153 IDPs in each imaging modality were quantified using linear regression. The following confound covariates were controlled for: age at the imaging visit (data field #21003, instance 2), age², sex (data field #31), age×sex, age²×sex, the first ten principal components of genomic ancestry (data field #22009), genotyping array (data field #22000; either BiLEVE or Axiom), three dummy covariates encoding the four UK Biobank neuroimaging sites (data field #54, instance 2), and the number of days elapsed since MRI scanning began at the site (as a measure of slow drifts in MRI hardware performance; data field #53, instance 2). For structural MRI data, the type of affine registration (ANTs or FLIRT) was additionally included as a covariate. Structural MRI analyses were performed without (main analysis) and with (secondary analysis) head size (data field #25000) as a confounding covariate. For diffusion MRI data, the preprocessing batch (performed either by our team or by the UK Biobank) was added to the covariates, and analyses were performed without (main analysis) and with (secondary analysis) each individual's global mean apparent fiber density as an extra covariate. We found that high δ values in the range 10² to 10⁴ slightly increased the accuracy of Lassosum2 over the automated models PRS-CSauto and SBayesR, and that λ values in the range 10⁻⁵ to 10⁻² resulted in the highest accuracy of trait prediction (Supplementary Fig. S1).
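The confound set listed above can be assembled into a single nuisance design matrix. This NumPy sketch assumes the UK Biobank fields have already been extracted into arrays (age, a 0/1 sex code, ten ancestry PCs, a 0/1 array indicator, a site label, and days since scanning began at the site); the function name and argument layout are hypothetical. Sites use drop-first dummy coding, which yields the three dummy columns for four sites mentioned in the text.

```python
import numpy as np


def build_confounds(age, sex, pcs, array, site, days_since_start, extra=None):
    """Assemble the nuisance design (no intercept column) as an (n, k) array.

    age, days_since_start: (n,) floats; sex, array: (n,) 0/1 codes;
    pcs: (n, 10) ancestry PCs; site: (n,) site labels;
    extra: optional (n, m) block, e.g. a registration or preprocessing batch flag.
    """
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)
    site = np.asarray(site)
    # Drop-first dummy coding: k distinct sites -> k - 1 indicator columns
    levels = np.unique(site)
    site_dummies = (site[:, None] == levels[None, 1:]).astype(float)
    cols = [age, age ** 2, sex, age * sex, age ** 2 * sex,
            np.asarray(pcs, dtype=float),
            np.asarray(array, dtype=float),
            site_dummies,
            np.asarray(days_since_start, dtype=float)]
    if extra is not None:
        cols.append(np.asarray(extra, dtype=float))
    # column_stack turns 1-D arrays into columns and appends 2-D blocks as-is
    return np.column_stack(cols)
```

Head size, registration type, preprocessing batch or global mean AFD would be passed through `extra` in the analyses that include them.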
These shrinkage parameters were therefore used in subsequent analyses.

Voxel- and fixel-wise brain associations with dyslexia polygenic scores

We tested the brain-wide associations of dyslexia PGS with the voxel-wise and fixel-wise data in the UK Biobank. Both parametric (fsl_glm 6.0.3 (19)) and non-parametric (randomise v2.9 (20)) linear regression models were fitted to the data: the former to yield t-value maps for visualization and impact mode analysis, and the latter to generate brain-wide P-value maps corrected for multiple comparisons. To reduce computational cost, voxel-wise permutations were performed at half resolution (2 mm isotropic), with a wall-time of 9 days for 5,000 permutations per statistical contrast. The randomise C++ code was modified to prevent short-integer overflows caused by the study's sample size. No cluster enhancement was applied. The same covariates as in the previous section were used. In all cases, we observed that a parametric t-value > 4.5 corresponded to a non-parametric brain-wide corrected P-value < 0.05. To check the validity of our findings obtained with Lassosum2, we applied other methods for deriving PGS: SBayesR, PRS-CS, and PRS-CSauto. PRS-CS applies continuous shrinkage to variant-wise weights using Bayesian priors and is optimized through a single global shrinkage hyperparameter (ϕ). We explored four ϕ values for optimizing PRS-CS: 10⁻⁶, 10⁻⁴, 0.01, and 1 (Supplementary Fig. S1). PRS-CSauto and SBayesR are automated polygenic scoring methods and therefore did not require hyperparameter optimization on an independent dataset. Dyslexia Lassosum2 PGS were strongly correlated with dyslexia PGS derived from PRS-CSauto (Pearson's r = 0.87 and 0.93 following optimization on structural or diffusion-derived measures, respectively) and SBayesR (r = 0.74 and 0.84, in the same order).
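The brain-wide correction provided by randomise rests on the permutation distribution of the maximum statistic across voxels. The sketch below shows that idea in simplified form: it regresses the confounds out of both the imaging data and the PGS and then permutes the residualized PGS. This is an approximation chosen for brevity, not a re-implementation of randomise, which has its own dedicated permutation schemes for nuisance regressors; the function name and defaults are hypothetical.

```python
import numpy as np


def maxt_permutation(data, pgs, covariates, n_perm=1000, alpha=0.05, seed=0):
    """Max-statistic permutation test for PGS associations across voxels.

    data: (n, v) voxel- or fixel-wise values; pgs: (n,); covariates: (n, k).
    Returns observed t-values (v,), the FWE |t| threshold, and FWE p-values (v,).
    """
    n = len(pgs)
    C = np.column_stack([np.ones(n), covariates])

    def residualize(a):
        # Remove the intercept and confounds by least squares
        return a - C @ np.linalg.lstsq(C, a, rcond=None)[0]

    ry = residualize(np.asarray(data, dtype=float))
    rx = residualize(np.asarray(pgs, dtype=float))
    dof = n - C.shape[1] - 1
    ry_norm = np.linalg.norm(ry, axis=0)

    def tstats(x):
        # Correlation of residuals converted to a t-value for each voxel
        r = (x @ ry) / (np.linalg.norm(x) * ry_norm)
        return r * np.sqrt(dof / (1.0 - r ** 2))

    rng = np.random.default_rng(seed)
    observed = tstats(rx)
    max_null = np.array([np.abs(tstats(rng.permutation(rx))).max()
                         for _ in range(n_perm)])
    threshold = np.quantile(max_null, 1.0 - alpha)
    # FWE p-value: share of permutations whose maximum |t| reaches the observed |t|
    exceed = (max_null[None, :] >= np.abs(observed)[:, None]).sum(axis=1)
    p_fwe = (1 + exceed) / (n_perm + 1)
    return observed, threshold, p_fwe
```

Setting `n_perm=5000` would match the number of permutations per contrast reported for the study.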
Compared with Lassosum2, these additional PGS exhibited highly similar brain-wide associations (Supplementary Fig. S2). To describe the white-matter tracts running through regions where fixels showed significant associations of AFD with dyslexia PGS, we ran probabilistic fiber tractography in template space using the second-order Integration over Fiber Orientation Distributions (iFOD2) algorithm (21).

Dyslexia locus-based neuroimaging association

Forty-two individual genomic loci were significantly associated with dyslexia after genome-wide multiple testing correction in the 23andMe, Inc. GWAS of dyslexia (15). Thirty-five of these variants passed our genetic quality control in the UK Biobank data (see the Materials and Methods section 'Genetics', above). At each of these 35 loci, the dosage of the dyslexia-disposing allele was calculated and used in separate linear regression models to map brain-wide associations with regional volume and white-matter microstructure (i.e. voxel-wise Jacobian values and fixel-wise AFD values, respectively), using the same approach and covariates as for the voxel-wise and fixel-wise PGS associations. These covariates included age, age², sex, age×sex, age²×sex, ten principal components of genomic ancestry, genotyping array, UK Biobank imaging site, the number of days elapsed since MRI scanning began at the site, the type of affine registration (for structural MRI), and the preprocessing batch, performed either by our team or by the UK Biobank team (for diffusion MRI). For each variant, we also performed secondary analyses in which head size (T1-weighted data) or subject-average AFD across all fixels (diffusion data) was additionally included as a confound covariate.

Impact mode decomposition

A PGS summarizes polygenic influences as a single scalar value per individual. It is a weighted sum of disposing-allele counts and is therefore agnostic to variability in the brain-wide associations of individual genetic variants.
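Both the locus-wise maps and the input to the impact-mode analysis consist of one t-value map per variant, which can be computed for many variants at once. This NumPy sketch assumes dosages are expected counts of a counted allele and that a boolean `flip` marks variants where the dyslexia-disposing allele is the other allele (recoding the dosage as 2 − dosage, which simply negates that variant's t-map). All names are hypothetical; the t-values equal those of separate OLS fits with the same covariates, by the Frisch-Waugh-Lovell theorem.

```python
import numpy as np


def variant_tmaps(dosages, data, covariates, flip=None):
    """One t-value map per variant: rows are variants, columns voxels/fixels.

    dosages: (n, p) allele dosages; data: (n, v) imaging values;
    covariates: (n, k) nuisance regressors without intercept;
    flip: optional (p,) bool, True where the counted allele is not the
    dyslexia-disposing allele (dosage is then recoded as 2 - dosage).
    """
    G = np.asarray(dosages, dtype=float)
    if flip is not None:
        G = np.where(np.asarray(flip)[None, :], 2.0 - G, G)
    n = G.shape[0]
    C = np.column_stack([np.ones(n), covariates])
    # Frisch-Waugh-Lovell: residualize genotypes and imaging data on the confounds
    Gr = G - C @ np.linalg.lstsq(C, G, rcond=None)[0]
    Yr = data - C @ np.linalg.lstsq(C, data, rcond=None)[0]
    # Partial correlation for every variant/voxel pair, then conversion to t
    r = (Gr.T @ Yr) / np.outer(np.linalg.norm(Gr, axis=0), np.linalg.norm(Yr, axis=0))
    dof = n - C.shape[1] - 1
    return r * np.sqrt(dof / (1.0 - r ** 2))
```

For the impact-mode analysis, the rows of the returned matrix (one per clumped variant) are what would be stacked and decomposed.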
We aimed to model the heterogeneity and hidden covariance patterns in brain-wide genomic associations. To achieve this, we first created a brain-wide univariate association map (i.e. a voxel-wise or fixel-wise t-value map generated by parametric regression, using the same set of covariates as in all sections above) for each of the 13,766 top independent dyslexia GWAS loci, obtained by clumping with a GWAS P-value threshold < 0.01, a linkage disequilibrium r² threshold < 0.1, and a genomic window of 500 kb. These voxel- or fixel-wise t-value maps were then concatenated across all 13,766 variants and decomposed by MELODIC into ten independent components, separately for each imaging modality. To make the ICA rotations more sensitive to local effects than to genetic associations with global measures, voxel-wise Jacobian determinant values were normalised to total brain volume before ICA. The default MELODIC data transformations, including variance normalization and mean signal removal, were not applied, because these moments carry meaningful signal in t-value maps (22). We refer to the extracted independent components as genomic impact modes, which capture combinations of distinct genomic variants and spatial profiles through a limited number of features.

Polygenic scores of additional traits related to dyslexia

We first used LD score regression (23, 24) to confirm that we could detect previously reported genetic correlations between dyslexia and each of eight other behavioural, cognitive, or education-related traits, based on summary statistics from the 23andMe dyslexia GWAS (15) and other large-scale GWAS: attention-deficit/hyperactivity disorder (ADHD) (25), verbal numerical reasoning (a.k.a. fluid intelligence) (Pan-UKB team, https://pan.ukbb.broadinstitute.org, 2020), the first principal component of school grades in mathematics and language (26), General Certificate of Secondary Education (GCSE) education (Pan-UKB team, https://pan.ukbb.broadinstitute.org, 2020), word reading, non-word reading, spelling, and phonemic awareness (27). All of these traits showed significant genetic correlations with dyslexia (|rg| > 0.4, all P < 10⁻²³; Supplementary Fig. S4). To compare and contrast them with dyslexia PGS, we then used Lassosum2 to generate PGS in the UK Biobank data for each of these eight additional traits and mapped their brain-wide associations with the voxel-wise and fixel-wise data, using the same approach as for dyslexia PGS.

Further post hoc regression analyses

We performed further regression analyses of the association between dyslexia PGS and voxel-wise volumes, this time using log-transformed rather than raw Jacobian determinant values to account for allometry (28), to assess whether this made a difference. In another post hoc analysis, two extra covariates were added to assess voxel-wise and fixel-wise associations with dyslexia PGS independently of fluid intelligence and educational attainment: 'fluid intelligence' (data field #20016) and the number of years of education, estimated from the data fields 'qualifications' (#6138) and 'age completed full-time education' (#845) following a previously published approach (29). Apart from these two covariates, the linear regression models were the same as in the primary analyses described above in the Materials and Methods section 'Voxel- and fixel-wise brain associations with dyslexia polygenic scores'.
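The impact-mode decomposition, in which t-value maps stacked across variants are split into ten independent components without mean removal or variance normalization, can be illustrated with a compact spatial ICA. This is a swapped-in technique, not MELODIC: a textbook symmetric FastICA with a tanh nonlinearity, preceded by dimensionality reduction through an SVD of the uncentered matrix. The function names and defaults are hypothetical.

```python
import numpy as np


def _sym_decorrelate(W):
    # W <- (W W^T)^(-1/2) W keeps the unmixing rows orthonormal
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W


def impact_modes(tmaps, n_modes=10, max_iter=500, tol=1e-6, seed=0):
    """Spatial ICA of a (n_variants, n_voxels) matrix of t-value maps.

    Returns (modes, loadings): modes is (n_modes, n_voxels), loadings is
    (n_variants, n_modes), with tmaps approximately equal to loadings @ modes.
    """
    tmaps = np.asarray(tmaps, dtype=float)
    n_vox = tmaps.shape[1]
    # No centering or variance normalization, as described for the t-value maps
    _, _, Vt = np.linalg.svd(tmaps, full_matrices=False)
    # Leading right singular vectors scaled to unit second moment per row
    Z = Vt[:n_modes] * np.sqrt(n_vox)
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((Z.shape[0], Z.shape[0])))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # FastICA fixed-point update: E[z g(w'z)] - E[g'(w'z)] w
        W_new = (G @ Z.T) / n_vox - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = _sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    modes = W @ Z
    # Variant loadings by least-squares projection onto the spatial modes
    loadings = tmaps @ np.linalg.pinv(modes)
    return modes, loadings
```

In the setting described in the text, `tmaps` would have 13,766 rows and `n_modes=10`. ICA components are defined only up to sign and order, so modes need to be matched before comparing runs.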

Created:

2024-12-06