Germinal centre-driven maturation of B cell response to SARS-CoV-2 mRNA vaccination
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5895180
Description:
These are the processed BCR repertoire and transcriptomics data described in Kim & Zhou et al., Nature, 2022. The raw sequencing data new to this study are available on SRA under BioProject PRJNA777934. This study also used BCR repertoire data from Turner & O'Halloran et al., Nature, 2021 (PRJNA731610) and Schmitz, Turner & Liu et al., Immunity, 2021 (PRJNA741267).
Code
Code along with Docker containers for reproducing the NGS data-based figures and analyses in the published paper can be found on GitHub.
Metadata
File: WU368_kim_et_al_nature_2022_meta.tsv
Notes:
Sample breakdown by `sequence_type` (132 total)
73 bulk BCR sequencing samples (`bulk`)
57 new
5 from Turner & O'Halloran et al., Nature, 2021
11 from Schmitz, Turner & Liu et al., Immunity, 2021.
56 10x Genomics single-cell VDJ + 5' gene expression samples (`tgx`)
3 samples from Turner & O'Halloran et al., Nature, 2021 (`mab`) corresponding to a total of 37 S-binding mAbs previously reported. These are not the same as the 2099 recombinant mAbs generated in this study (see below).
Sample collection time was originally recorded in days in the `timepoint` column. Timepoints were referenced in weeks in the manuscript, as shown in the `timepoint_ms` column.
`bio_rep` and `tech_rep` = biological replicate and technical replicate respectively.
Abbreviations:
LN = lymph node
BM = bone marrow
PB = plasmablast
GC = germinal centre
LLPC = long-lived plasma cell
NS = no sorting
mAb = monoclonal antibody
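The sample breakdown described in the notes above can be checked with a short pandas sketch. It assumes the metadata file is tab-separated and has a `sequence_type` column with the values `bulk`, `tgx` and `mab`, as listed above. The toy table and its `sample` column are made up so that the snippet runs without downloading the file.

```python
import io

import pandas as pd


def count_by_sequence_type(meta: pd.DataFrame) -> pd.Series:
    """Count samples per value of the `sequence_type` column."""
    return meta["sequence_type"].value_counts()


# To run on the real file (assumed to sit in the working directory):
# meta = pd.read_csv("WU368_kim_et_al_nature_2022_meta.tsv", sep="\t")

# Made-up stand-in table; the `sample` column name is hypothetical.
toy_tsv = "sample\tsequence_type\ns1\tbulk\ns2\tbulk\ns3\ttgx\ns4\tmab\n"
toy_meta = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

counts = count_by_sequence_type(toy_meta)
print(counts.to_dict())
```

On the real metadata file the counts should add up to the 132 samples stated above.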
Information on the 2099 recombinant mAbs generated in this study
File: WU368_kim_et_al_nature_2022_mabs.tsv
Notes on columns:
`h_sequence_id` and `l_sequence_id`: Sequence IDs of the heavy and light chains respectively.
`elisa`: ELISA results for binding to SARS-CoV-2 S (`TRUE` = positive).
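As a minimal sketch of working with the `elisa` column, the snippet below keeps only ELISA-positive mAbs. It reads `elisa` as text so that the literal `TRUE` is compared directly. The toy rows are made up, and the `FALSE` encoding for negatives is an assumption (the notes above only state that `TRUE` means positive).

```python
import io

import pandas as pd


def elisa_positive(mabs: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose `elisa` value is the literal string TRUE."""
    return mabs.loc[mabs["elisa"] == "TRUE"]


# To run on the real file (assumed to sit in the working directory):
# mabs = pd.read_csv("WU368_kim_et_al_nature_2022_mabs.tsv", sep="\t",
#                    dtype={"elisa": str})

# Made-up stand-in rows; `FALSE` for negatives is an assumption.
toy_tsv = (
    "mab_id\th_sequence_id\tl_sequence_id\telisa\n"
    "m1\th1\tl1\tTRUE\n"
    "m2\th2\tl2\tFALSE\n"
    "m3\th3\tl3\tTRUE\n"
)
toy_mabs = pd.read_csv(io.StringIO(toy_tsv), sep="\t", dtype={"elisa": str})

positive = elisa_positive(toy_mabs)
print(f"{len(positive)} of {len(toy_mabs)} toy mAbs are ELISA-positive")
```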
Processed BCR data - heavy chains
File: WU368_kim_et_al_nature_2022_bcr_heavy.tsv
Analysis was based on heavy chain-based clonal inference.
Notes on columns:
The columns largely follow the AIRR-C Rearrangement format. The main deviation is that CDR3s were used, as opposed to IMGT-defined "junctions". Nonetheless, junction-related columns are included here as some repositories such as iReceptor use these. Non-standard columns are noted below.
`cell_id`: Only sequences from single-cell samples and the 37 mAbs from Turner & O'Halloran et al., Nature, 2021 have cell IDs following the format `[donor]_[sample]@[id]`. `NA` for bulk sequences.
`sequence_id`: Sequence IDs follow the format `[donor]_[sample]@[id]`.
`v_call_genotyped`: V gene annotation reassigned after individualized genotyping by TIgGER.
`germline_[vdj]_call`: Clonal consensus germline calls, assigned after the corresponding clonal consensus sequences were reconstructed via `CreateGermlines.py --cloned` from Change-O.
`isotype`: IGH[ADEGM].
`cdr3`: CDR3 nucleotide sequence.
`cdr3_length`: CDR3 nucleotide sequence length.
`cdr3_aa`: CDR3 amino acid sequence.
`collapse_count`: Number of duplicate IMGT-aligned V(D)J sequences that were collapsed by `alakazam::collapseDuplicates`.
`donor`, `timepoint`, `tissue`, `sorting`, `seq_type`: Propagated as is from the metadata file.
In `seq_type`, `tgx` corresponds to 10x Genomics data; `mab` corresponds specifically to the 37 S-binding mAbs from Turner & O'Halloran et al., Nature, 2021.
`timepoint_2`: Same as `timepoint`, except that `d28+d35` and `d201+d208` were treated as `d28` (week 4) and `d201` (week 29) respectively as described in Materials & Methods.
`gex_anno`: Cell type identity annotation based on transcriptomic profiles. Mapped from `anno_leiden_0.18` from WU368_kim_et_al_nature_2022_gex_b_cells.h5ad.
`compartment`: B cell compartment.
ABC = activated B cell. LNPC = lymph node plasma cell. RMB = resting memory B cell.
Minor differences in terminology
The manuscript refers to the memory compartment as MBCs, whereas the data uses the term RMB. As described in Materials & Methods, analysis involving the memory compartment specifically used d201 bulk-sequenced memory sorts from blood. To get these sequences, keep the rows where `s_pos_clone` is `TRUE`, `seq_type` is `bulk`, `compartment` is `RMB`, and `timepoint_2` is `d201`.
The manuscript uses the term BMPC (bone marrow plasma cell), whereas the data uses the term LLPC.
`clone_id`: B cell clonal lineage IDs follow the format `[donor]@[id]`.
`s_pos_clone`: `TRUE` if a sequence belonged to a B cell clone that was designated as S-binding by virtue of containing one of the recombinant mAbs that tested positive via ELISA or one of the S-binding mAbs from Turner & O'Halloran et al., Nature, 2021.
`expressed_id`: mAb IDs for the 2099 recombinant mAbs generated in this study (mapped from `mab_id` from WU368_kim_et_al_nature_2022_mabs.tsv) and the 37 mAbs from Turner & O'Halloran et al., Nature, 2021. `NA` for everything else.
`elisa`: ELISA results for binding of recombinant mAbs to SARS-CoV-2 S. `TRUE` if positive. `NA` if not tested.
`nuc_RS_19_312`: number of replacement and silent mutations between IMGT-numbered nucleotide positions 19-312 along IGHV sequences, calculated by `shazam::calcObservedMutations`.
`nuc_denom_19_312`: number of informative nucleotide positions for counting mutations, excluding non-A/T/G/C positions (such as "N", "-", ".").
`nuc_RS_freq_19_312`: nucleotide-level mutation frequency (= `nuc_RS_19_312` / `nuc_denom_19_312`).
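Two of the column notes above translate directly into code: the filter for S-binding memory sequences (d201, bulk, RMB) and the mutation frequency ratio. The following is a sketch under the assumption that the column names and values are exactly as documented above. The toy rows are made up, and the flag handling accepts both booleans and the text `TRUE`, since the parsed type depends on how the file is read.

```python
import pandas as pd


def select_s_binding_memory(heavy: pd.DataFrame) -> pd.DataFrame:
    """Keep S-binding, bulk-sequenced RMB sequences from timepoint d201."""
    # Works whether `s_pos_clone` was parsed as bool or kept as text.
    is_s_pos = heavy["s_pos_clone"].astype(str).str.upper() == "TRUE"
    mask = (
        is_s_pos
        & (heavy["seq_type"] == "bulk")
        & (heavy["compartment"] == "RMB")
        & (heavy["timepoint_2"] == "d201")
    )
    return heavy.loc[mask]


def mutation_frequency(heavy: pd.DataFrame) -> pd.Series:
    """Recompute nuc_RS_19_312 / nuc_denom_19_312 for each sequence."""
    return heavy["nuc_RS_19_312"] / heavy["nuc_denom_19_312"]


# To run on the real file (assumed to sit in the working directory):
# heavy = pd.read_csv("WU368_kim_et_al_nature_2022_bcr_heavy.tsv", sep="\t")

# Made-up stand-in rows; only the first one passes every filter.
toy_heavy = pd.DataFrame(
    {
        "sequence_id": ["a", "b", "c", "d"],
        "s_pos_clone": [True, True, False, True],
        "seq_type": ["bulk", "tgx", "bulk", "bulk"],
        "compartment": ["RMB", "RMB", "RMB", "ABC"],
        "timepoint_2": ["d201", "d201", "d201", "d201"],
        "nuc_RS_19_312": [6, 3, 0, 12],
        "nuc_denom_19_312": [294, 294, 294, 290],
    }
)

memory = select_s_binding_memory(toy_heavy)
freq = mutation_frequency(memory)
print(memory["sequence_id"].tolist(), freq.tolist())
```

On the real file, the recomputed ratio can be compared against the provided `nuc_RS_freq_19_312` column as a sanity check.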
Processed BCR data - light chains
File: WU368_kim_et_al_nature_2022_bcr_light.tsv
Light chains were not used for heavy chain-based clonal inference or analysis.
Processed transcriptomics data
Files:
WU368_kim_et_al_nature_2022_gex_all_cells.h5ad (clustering all cells)
WU368_kim_et_al_nature_2022_gex_b_cells.h5ad (re-clustering only the B cells)
Notes:
The `h5ad` files can be imported into Scanpy as an AnnData object.
Each `AnnData` object has 3 `.layers`, each representing a version of the count matrix.
`raw_counts`: Imported from `cellranger aggr` output by `scanpy.read_10x_mtx`.
`log_norm`: Log-normalized expression values output by `scanpy.pp.normalize_total` followed by `scanpy.pp.log1p`.
`scaled`: The `log_norm` layer scaled to unit variance and zero mean by `scanpy.pp.scale`.
The `gene_name` and `biotype` columns in `.var` were extracted from GENCODE v32 GTF.
Columns in `.obs` (each row corresponds to a cell)
`n_feature`: The `n_genes_by_counts` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The number of genes expressed. This is before subsetting the genes.
`n_umi`: The `total_counts` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The total UMI counts in a cell.
`pct_mt`: The `pct_counts_mt` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The percentage of counts in mitochondrial genes.
`n_hkg`: The number of housekeeping genes for which expression was detected.
`n_gene_expressed`: The total number of genes for which expression was detected. This is after subsetting the genes.
`pre_qc_bcr`: `TRUE` if a cell also had paired BCR data available. Produced by cross-referencing the cellular barcodes in `cell_barcodes.json` outputted by `cellranger vdj`. At this point the BCR data had not gone through the QC process in the BCR processing pipeline (hence `pre_qc`).
`leiden_[resolution]`: Cluster assignment by `scanpy.tl.leiden`.
`anno_leiden_[resolution]`: Cell type identity annotations based on transcriptomic profiles. This was mapped onto the `gex_anno` column in the processed heavy chain BCR data.
UMAP coordinates can be found in `.obsm["X_umap"]`.
`.X` has been set to `None` in order to reduce file size.
In addition, the preprocessed count matrix outputted by `cellranger aggr` is available from GEO under BioProject PRJNA777934.
Created:
2022-02-22



