Data_Sheet_1_sumrep: A Summary Statistic Framework for Immune Receptor Repertoire Comparison and Model Validation.pdf

NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_sumrep_A_Summary_Statistic_Framework_for_Immune_Receptor_Repertoire_Comparison_and_Model_Validation_pdf/10116386
Resource description:
The adaptive immune system generates an incredible diversity of antigen receptors for B and T cells to keep dangerous pathogens at bay. The DNA sequences coding for these receptors arise by a complex recombination process followed by a series of productivity-based filters, as well as affinity maturation for B cells, giving considerable diversity to the circulating pool of receptor sequences. Although adaptive immune receptor repertoire sequencing (AIRR-seq) datasets hold considerable promise for medical and public health applications, their complex structure makes analysis difficult. In this paper, we introduce sumrep, an R package that efficiently performs a wide variety of repertoire summaries and comparisons, and show how sumrep can be used to perform model validation. We find that summaries vary in their ability to differentiate between datasets, although many are able to distinguish between covariates such as donor, timepoint, and cell type for BCR and TCR repertoires. We show that deletion and insertion lengths resulting from V(D)J recombination tend to be more discriminative characterizations of a repertoire than summaries that describe the amino acid composition of the CDR3 region. We also find that state-of-the-art generative models excel at recapitulating gene usage and recombination statistics in a given experimental repertoire, but struggle to capture many physicochemical properties of real repertoires.
Created:
2019-11-01