A multi-omics systems vaccinology resource to develop and test computational models of immunity: 1st challenge dataset and submissions

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/7702789

Resource description:

The goal of the CMI-PB prediction contest is to foster a collaborative research community that can collectively tackle challenges and accelerate scientific progress beyond the capabilities of individual researchers or groups. The CMI-PB consortium has curated multi-source data from multiple individuals, encompassing antibody (Ab) titers (around four antibodies/features), cell frequency (approximately 20 cell types/features), gene expression (roughly 50,000 RNA transcripts/features), and plasma proteomics (around 50 proteins/features). The challenge requires integrating these diverse data sources to predict different immune responses, referred to as tasks. Specifically, participants use multi-source data collected from several individuals on day 0 (baseline) to predict specific immune responses at later time points (1, 3, 7, and 14 days post-booster vaccination). The first CMI-PB challenge, an internal challenge, was conducted using datasets from 2020 (train) and 2021 (test). The following sections provide detailed information on the datasets, challenge tasks, and submission format, along with descriptions of and access to the data files participants need to develop their models and make predictions.

A) Multiomics CMI-PB dataset: We propose a study design that enables a systems-level understanding of immune responses through computational modeling. Our cohort comprises subjects primed in infancy with either aP or wP vaccines and later boosted with Tdap. We recruit individuals born before 1995 (wP) and after 1996 (aP), and collect plasma and blood samples at baseline and again at 1, 3, 7, and 14 days post-booster vaccination. From the processed samples, we generated four types of omics data: bulk PBMC transcriptomics; plasma proteomics using Olink, which provides a quantitative readout of cytokines, chemokines, and other immune factors; cell frequencies in PBMCs using flow cytometry; and Tdap-specific antibody levels.

B) List of tasks: The tasks are listed in the file “List of tasks for challenge 1.docx”. Submissions must follow the format provided in “submission template challenge 1.tsv”.

C) Datasets for model building and making predictions: Data files are divided into two categories: 1) raw dataset and 2) computable matrices.
Raw dataset: This least-processed dataset is divided into training and test datasets.
Computable matrices: There are three types of computable matrices.
a) Full: These files are generated by splitting the raw files into sub-files for each planned time point relative to vaccination.
b) Harmonized: These files are generated by preserving only the features that overlap between the train and test datasets.
c) Imputed: MICE (multivariate imputation by chained equations) is performed to impute missing values in the dataset.

D) Submission evaluation: This folder contains all submitted models, together with ranking files and the code used to evaluate these models.

To learn more about the CMI-PB prediction challenge, visit our website at www.cmi-pb.org.
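The harmonization step described under C), keeping only the features shared by the train and test datasets, can be illustrated with a minimal sketch. This is not the consortium's code: the function name `harmonize`, the feature names (`IgG_PT`, `Monocytes`, `GENE_A`, `GENE_B`), and the toy values are all hypothetical, and the actual files may be organized differently.

```python
from typing import Dict, List, Tuple

# One subject's measurements, keyed by feature name (hypothetical layout).
Row = Dict[str, float]


def harmonize(train: List[Row], test: List[Row]) -> Tuple[List[Row], List[Row], List[str]]:
    """Keep only the features present in every row of both datasets."""
    if not train or not test:
        raise ValueError("both datasets must contain at least one row")
    # Features measured for every subject within each dataset.
    train_features = set.intersection(*(set(row) for row in train))
    test_features = set.intersection(*(set(row) for row in test))
    # Overlap between the two datasets, sorted for a stable column order.
    shared = sorted(train_features & test_features)

    def project(rows: List[Row]) -> List[Row]:
        return [{name: row[name] for name in shared} for row in rows]

    return project(train), project(test), shared


# Hypothetical toy data standing in for the 2020 (train) and 2021 (test) sets.
train_2020 = [
    {"IgG_PT": 1.2, "Monocytes": 0.15, "GENE_A": 8.1},
    {"IgG_PT": 0.9, "Monocytes": 0.22, "GENE_A": 7.4},
]
test_2021 = [
    {"IgG_PT": 1.5, "Monocytes": 0.18, "GENE_B": 3.3},
]

train_h, test_h, shared_features = harmonize(train_2020, test_2021)
print(shared_features)  # ['IgG_PT', 'Monocytes']
```

In this toy case `GENE_A` and `GENE_B` are dropped because each appears in only one dataset, so a model trained on the harmonized train matrix sees the same columns it will receive at prediction time.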

Created:

2024-03-22