Germinal centre-driven maturation of B cell response to SARS-CoV-2 mRNA vaccination

NIAID Data Ecosystem, indexed 2026-03-13

Download link:

https://zenodo.org/record/5895180

Resource description:

These are the processed BCR repertoire and transcriptomics data described in Kim & Zhou et al., Nature, 2022. The raw sequencing data new to this study are available on SRA under BioProject PRJNA777934. This study also used BCR repertoire data from Turner & O'Halloran et al., Nature, 2021 (PRJNA731610) and Schmitz, Turner & Liu et al., Immunity, 2021 (PRJNA741267).

Code

Code, along with Docker containers, for reproducing the NGS data-based figures and analyses in the published paper can be found on GitHub.

Metadata

File: WU368_kim_et_al_nature_2022_meta.tsv

Notes:

- Sample breakdown by `sequence_type` (132 total):
  - 73 bulk BCR sequencing samples (`bulk`):
    - 57 new
    - 5 from Turner & O'Halloran et al., Nature, 2021
    - 11 from Schmitz, Turner & Liu et al., Immunity, 2021
  - 56 10x Genomics single-cell VDJ + 5' gene expression samples (`tgx`)
  - 3 samples from Turner & O'Halloran et al., Nature, 2021 (`mab`), corresponding to a total of 37 S-binding mAbs previously reported. These are not the same as the 2099 recombinant mAbs generated in this study (see below).
- Sample collection time was originally recorded in days in the `timepoint` column. Timepoints were referenced in weeks in the manuscript, as shown in the `timepoint_ms` column.
- `bio_rep` and `tech_rep`: biological replicate and technical replicate respectively.
- Abbreviations:
  - LN = lymph node
  - BM = bone marrow
  - PB = plasmablast
  - GC = germinal centre
  - LLPC = long-lived plasma cell
  - NS = no sorting
  - mAb = monoclonal antibody

Information on the 2099 recombinant mAbs generated in this study

File: WU368_kim_et_al_nature_2022_mabs.tsv

Notes on columns:

- `h_sequence_id` and `l_sequence_id`: Sequence IDs of the heavy and light chains respectively.
- `elisa`: ELISA results for binding to SARS-CoV-2 S (`TRUE` = positive).

Processed BCR data - heavy chains

File: WU368_kim_et_al_nature_2022_bcr_heavy.tsv

Analysis was based on heavy chain-based clonal inference.

Notes on columns:

The columns largely follow the AIRR-C Rearrangement format.
The main deviation is that CDR3s were used, as opposed to IMGT-defined "junctions". Nonetheless, junction-related columns are included here because some repositories, such as iReceptor, use them. Non-standard columns are noted below.

- `cell_id`: Only sequences from single-cell samples and the 37 mAbs from Turner & O'Halloran et al., Nature, 2021 have cell IDs following the format `[donor]_[sample]@[id]`. `NA` for bulk sequences.
- `sequence_id`: Sequence IDs follow the format `[donor]_[sample]@[id]`.
- `v_call_genotyped`: V gene annotation reassigned after individualized genotyping by TIgGER.
- `germline_[vdj]_call`: Clonal consensus germline calls after the corresponding clonal consensus sequences were reconstructed via `CreateGermlines.py --cloned` from Change-O.
- `isotype`: IGH[ADEGM].
- `cdr3`: CDR3 nucleotide sequence.
- `cdr3_length`: CDR3 nucleotide sequence length.
- `cdr3_aa`: CDR3 amino acid sequence.
- `collapse_count`: Number of duplicate IMGT-aligned V(D)J sequences that were collapsed by `alakazam::collapseDuplicates`.
- `donor`, `timepoint`, `tissue`, `sorting`, `seq_type`: Propagated as is from the metadata file. In `seq_type`, `tgx` corresponds to 10x Genomics data; `mab` corresponds specifically to the 37 S-binding mAbs from Turner & O'Halloran et al., Nature, 2021.
- `timepoint_2`: Same as `timepoint`, except that `d28+d35` and `d201+d208` were treated as `d28` (week 4) and `d201` (week 29) respectively, as described in Materials & Methods.
- `gex_anno`: Cell type identity annotation based on transcriptomic profiles. Mapped from `anno_leiden_0.18` from WU368_kim_et_al_nature_2022_gex_b_cells.h5ad.
- `compartment`: B cell compartment. ABC = activated B cell. LNPC = lymph node plasma cell. RMB = resting memory B cell.
  Minor differences in terminology:
  - The manuscript refers to the memory compartment as MBCs, whereas the data use the term RMB.
  - As described in Materials & Methods, analysis involving the memory compartment used specifically d201 bulk-sequenced memory sorts from blood. To get these sequences, subset `s_pos_clone`, `seq_type`, `compartment`, and `timepoint_2` to, respectively, `TRUE`, `bulk`, `RMB`, and `d201`.
  - The manuscript uses the term BMPC (bone marrow plasma cell), whereas the data use the term LLPC.
- `clone_id`: B cell clonal lineage IDs follow the format `[donor]@[id]`.
- `s_pos_clone`: `TRUE` if a sequence belonged to a B cell clone that was designated as S-binding by virtue of containing one of the recombinant mAbs that tested positive via ELISA or one of the S-binding mAbs from Turner & O'Halloran et al., Nature, 2021.
- `expressed_id`: mAb IDs for the 2099 recombinant mAbs generated in this study (mapped from `mab_id` from WU368_kim_et_al_nature_2022_mabs.tsv) and the 37 mAbs from Turner & O'Halloran et al., Nature, 2021. `NA` for everything else.
- `elisa`: ELISA results for binding of recombinant mAbs to SARS-CoV-2 S. `TRUE` if positive. `NA` if not tested.
- `nuc_RS_19_312`: Number of replacement and silent mutations between IMGT-numbered nucleotide positions 19-312 along IGHV sequences, calculated by `shazam::calcObservedMutations`.
- `nuc_denom_19_312`: Number of informative nucleotide positions for counting mutations, excluding non-A/T/G/C positions (such as "N", "-", ".").
- `nuc_RS_freq_19_312`: Nucleotide-level mutation frequency (= nuc_RS_19_312 / nuc_denom_19_312).

Processed BCR data - light chains

File: WU368_kim_et_al_nature_2022_bcr_light.tsv

Light chains were not used for heavy chain-based clonal inference or analysis.

Processed transcriptomics data

Files:

- WU368_kim_et_al_nature_2022_gex_all_cells.h5ad (clustering all cells)
- WU368_kim_et_al_nature_2022_gex_b_cells.h5ad (re-clustering only the B cells)

Notes:

- The `h5ad` files can be imported into Scanpy as an AnnData object.
- Each `AnnData` object has 3 `.layers`, each representing a version of the count matrix:
  - `raw_counts`: Imported from `cellranger aggr` output by `scanpy.read_10x_mtx`.
  - `log_norm`: Log-normalized expression values produced by `scanpy.pp.normalize_total` followed by `scanpy.pp.log1p`.
  - `scaled`: The `log_norm` layer scaled to unit variance and zero mean by `scanpy.pp.scale`.
- The `gene_name` and `biotype` columns in `.var` were extracted from GENCODE v32 GTF.
- Columns in `.obs` (each row corresponds to a cell):
  - `n_feature`: The `n_genes_by_counts` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The number of genes expressed. This is before subsetting the genes.
  - `n_umi`: The `total_counts` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The total UMI counts in a cell.
  - `pct_mt`: The `pct_counts_mt` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The percentage of counts in mitochondrial genes.
  - `n_hkg`: The number of housekeeping genes for which expression was detected.
  - `n_gene_expressed`: The total number of genes for which expression was detected. This is after subsetting the genes.
  - `pre_qc_bcr`: `TRUE` if a cell also had paired BCR data available. Produced by cross-referencing the cellular barcodes in `cell_barcodes.json` output by `cellranger vdj`. At this point the BCR data had not gone through the QC process in the BCR processing pipeline (hence `pre_qc`).
  - `leiden_[resolution]`: Cluster assignment by `scanpy.tl.leiden`.
  - `anno_leiden_[resolution]`: Cell type identity annotations based on transcriptomic profiles. This was mapped onto the `gex_anno` column in the processed heavy chain BCR data.
- UMAP coordinates can be found in `.obsm["X_umap"]`.
- `.X` has been set to `None` in order to reduce file size.

In addition, the preprocessed count matrix output by `cellranger aggr` is available from GEO under BioProject PRJNA777934.
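The memory-compartment subset described in the heavy chain column notes (`s_pos_clone`, `seq_type`, `compartment`, `timepoint_2` equal to `TRUE`, `bulk`, `RMB`, `d201`) and the `nuc_RS_freq_19_312` ratio can be sketched with the Python standard library. This is a minimal illustration, not the authors' pipeline. It assumes the TSV stores booleans as the literal string `TRUE`, and it runs on a tiny invented table rather than the real file; the helper names `select_memory_s_binding` and `mutation_frequency` are hypothetical.

```python
import csv
import io

# Tiny invented stand-in for WU368_kim_et_al_nature_2022_bcr_heavy.tsv.
# Only the columns used below are present; every value is made up.
EXAMPLE_TSV = (
    "sequence_id\ts_pos_clone\tseq_type\tcompartment\ttimepoint_2"
    "\tnuc_RS_19_312\tnuc_denom_19_312\n"
    "d1_s1@1\tTRUE\tbulk\tRMB\td201\t12\t294\n"
    "d1_s1@2\tTRUE\tbulk\tRMB\td28\t3\t294\n"
    "d1_s2@1\tTRUE\ttgx\tRMB\td201\t7\t290\n"
    "d1_s3@1\tFALSE\tbulk\tRMB\td201\t0\t294\n"
    "d1_s3@2\tTRUE\tbulk\tRMB\td201\t21\t280\n"
)

# Column/value pairs taken from the description of the memory subset.
# Assumption: booleans appear as the string "TRUE" in the file.
MEMORY_FILTER = {
    "s_pos_clone": "TRUE",
    "seq_type": "bulk",
    "compartment": "RMB",
    "timepoint_2": "d201",
}


def select_memory_s_binding(handle):
    """Return rows that match every column/value pair in MEMORY_FILTER."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if all(row[col] == value for col, value in MEMORY_FILTER.items())
    ]


def mutation_frequency(row):
    """Compute nuc_RS_19_312 / nuc_denom_19_312; None if the denominator is 0."""
    denom = int(row["nuc_denom_19_312"])
    if denom == 0:
        return None
    return int(row["nuc_RS_19_312"]) / denom


memory_rows = select_memory_s_binding(io.StringIO(EXAMPLE_TSV))
memory_freqs = {row["sequence_id"]: mutation_frequency(row) for row in memory_rows}
```

To apply the same filter to the real file, the `io.StringIO(...)` handle could be swapped for `open("WU368_kim_et_al_nature_2022_bcr_heavy.tsv", newline="")`, provided the column values are encoded as assumed here.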
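The ID formats given in the heavy chain column notes (`[donor]_[sample]@[id]` for `sequence_id` and `cell_id`, `[donor]@[id]` for `clone_id`) can be split into their parts as sketched below. The parsing rules are assumptions: the description does not say whether donor or sample names may contain underscores, so this sketch splits at the first underscore and the last `@`. The function and class names are hypothetical.

```python
from typing import NamedTuple, Tuple


class SequenceId(NamedTuple):
    donor: str
    sample: str
    local_id: str


def parse_sequence_id(value: str) -> SequenceId:
    """Split a `[donor]_[sample]@[id]` string into its three parts.

    Assumption: the donor name contains no underscore, so the first
    underscore separates donor from sample.
    """
    prefix, at, local_id = value.rpartition("@")
    if not at or not prefix or not local_id:
        raise ValueError(f"not in [donor]_[sample]@[id] format: {value!r}")
    donor, underscore, sample = prefix.partition("_")
    if not underscore or not donor or not sample:
        raise ValueError(f"not in [donor]_[sample]@[id] format: {value!r}")
    return SequenceId(donor, sample, local_id)


def parse_clone_id(value: str) -> Tuple[str, str]:
    """Split a `[donor]@[id]` string into (donor, id)."""
    donor, at, local_id = value.rpartition("@")
    if not at or not donor or not local_id:
        raise ValueError(f"not in [donor]@[id] format: {value!r}")
    return donor, local_id


# Invented example IDs, for illustration only.
example_sequence = parse_sequence_id("donorA_sample1@42")
example_clone = parse_clone_id("donorA@7")
```

Bulk sequences have `NA` in `cell_id`, so such values should be filtered out before calling `parse_sequence_id`, which raises `ValueError` on anything that does not match the expected shape.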

Created:

2022-02-22