Pre-processed IgH receptor repertoire data from MS patients after aHSCT from BioProject PRJNA763367

Indexed in NIAID Data Ecosystem on 2026-03-12

Download link:

https://zenodo.org/record/5513966

Description:

Data Processing
Samples were demultiplexed by their Illumina indices and processed with the Immcantation toolkit (1, 2). Raw FASTQ files were filtered using a quality score threshold of 20. Paired reads were joined if they had a minimum overlap of 10 nt, a maximum error rate of 0.3, and a significance threshold of 0.0001. Reads with identical UMIs were collapsed into a consensus sequence. Reads with identical full-length sequences and identical constant-region primers but different UMIs were then collapsed further. Sequences were submitted to IgBLAST (3) for V(D)J assignment and sequence annotation. Constant-region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated with the shazam R package (2).

software_versions: pRESTO 0.5.3, Change-O 0.3.4, IgBLAST 1.6.1, Stampy 1.0.21, shazam 0.1.8
quality_thresholds: FilterSeq.py (pRESTO), Q > 20
paired_reads_assembly: AssemblePairs.py (pRESTO), minlen 10, maxerror 0.3, alpha 0.0001
primer_match_cutoffs: MaskPrimers.py (pRESTO), C primer and V primer, maxerror 0.2
consensus_building: BuildConsensus.py (pRESTO), maxerror 0.1, maxgap 0.5
collapsing_method: CollapseSeq.py (pRESTO)
germline_database: IMGT

Format
Processed sequences are provided as a tab-delimited file with the following annotations:
ISOTYPE_SUBCLASS: Isotype subclass
SEQUENCE_ID: Sequence identifier
JUNCTION_LENGTH: Junction length
CONSCOUNT: Raw read count from which the UMI consensus sequences were generated, summed over all UMIs for the given unique sequence
DUPCOUNT: UMI count for the given unique sequence
ISOTYPE: Constant-region primer (isotype)
MUT_TOTAL: Total number of mutations in the V gene
SAMPLE: Sample identifier, linking back to the raw data
JUNCTION: Junction nucleotide sequence
Protein_seq: Amino acid sequence
CDR3_AA_GRAVY: CDR3 hydrophobicity index
CDR3_AA_BULK: CDR3 bulkiness
CDR3_AA_ALIPHATIC: CDR3 aliphatic index
CDR3_AA_POLARITY: CDR3 polarity
CDR3_AA_CHARGE: CDR3 normalized net charge
CDR3_AA_BASIC: CDR3 basic side-chain residue content
CDR3_AA_ACIDIC: CDR3 acidic side-chain residue content
CDR3_AA_AROMATIC: CDR3 aromatic side-chain residue content
Subset: Defined B cell subset
Repertoire: Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)
R_SCDR: R/S ratio in the CDR regions
R_SFWR: R/S ratio in the FWR regions
V_GENE: V segment gene
D_GENE: D segment gene
J_GENE: J segment gene
V_FAM: V gene family
Clust_REPRES: Cluster representative
Clust_SIZE: Cluster size
Sex: Sex of the subject
UNIQUE_ID: Sample identifier
Bcellno: Input B cell number
Days_posttx: Sampling time point relative to transplantation, in days
Age_at_tx: Age of the subject at aHSCT
Disease: MS subtype
Last_therapy: Last therapy prior to aHSCT
Disease_duration: Disease duration
CMV_reactivation: Cytomegalovirus reactivation
Month_label: Post-aHSCT month interval bin
Patient_label: Subject identifier

References
1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O'Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.
2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.
3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41.
4. Lunter, G., and M. Goodson. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939.
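The two collapsing steps described under Data Processing determine how the CONSCOUNT and DUPCOUNT columns relate: CONSCOUNT sums raw reads over every UMI behind a unique sequence, while DUPCOUNT counts those UMIs. The Python sketch below illustrates that bookkeeping under simplifying assumptions. It is not the pRESTO implementation: reads sharing a UMI are assumed to have equal length, the consensus is a plain per-position majority vote (BuildConsensus.py's maxerror/maxgap cutoffs and quality handling are omitted), and the read tuples, primer labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input: (umi, c_primer, read_sequence) tuples for one sample.
# Assumption: all reads sharing a UMI have the same length (no gaps).


def umi_consensus(reads):
    """Collapse reads with an identical UMI into one consensus sequence.

    Returns {umi: (c_primer, consensus_sequence, raw_read_count)}.
    The consensus is a simple per-position majority vote (illustrative only).
    """
    groups = defaultdict(list)
    for umi, primer, seq in reads:
        groups[umi].append((primer, seq))
    consensus = {}
    for umi, members in groups.items():
        # Most common primer call within the UMI group
        primer = Counter(p for p, _ in members).most_common(1)[0][0]
        seqs = [s for _, s in members]
        cons = "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
        consensus[umi] = (primer, cons, len(seqs))
    return consensus


def collapse_unique(consensus):
    """Merge UMI consensus sequences that share both sequence and primer.

    Returns {(c_primer, sequence): {"CONSCOUNT": ..., "DUPCOUNT": ...}}:
    CONSCOUNT sums raw reads over all merged UMIs, DUPCOUNT counts the UMIs.
    """
    unique = defaultdict(lambda: {"CONSCOUNT": 0, "DUPCOUNT": 0})
    for primer, seq, n_reads in consensus.values():
        entry = unique[(primer, seq)]
        entry["CONSCOUNT"] += n_reads
        entry["DUPCOUNT"] += 1
    return dict(unique)


if __name__ == "__main__":
    reads = [
        ("AAAA", "IGHG", "ACGT"),
        ("AAAA", "IGHG", "ACGT"),
        ("AAAA", "IGHG", "ACCT"),  # sequencing error, outvoted by the consensus
        ("CCCC", "IGHG", "ACGT"),
        ("GGGG", "IGHM", "TTGA"),
    ]
    result = collapse_unique(umi_consensus(reads))
    for (primer, seq), counts in sorted(result.items()):
        print(primer, seq, counts)
```

In this toy input, UMIs AAAA and CCCC yield the same consensus with the same primer, so they merge into one unique sequence with CONSCOUNT 4 (3 + 1 reads) and DUPCOUNT 2.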
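Because the processed sequences ship as a tab-delimited table, they can be summarized with the Python standard library alone. The sketch below is a minimal example, assuming the header row uses the column names listed under Format and that DUPCOUNT and MUT_TOTAL contain no missing values; the example rows are made up, and the helper names are hypothetical. It counts unique sequences, sums UMIs (DUPCOUNT), and averages MUT_TOTAL per patient and repertoire.

```python
import csv
import io
from collections import defaultdict


def load_table(handle):
    """Parse the tab-delimited processed-sequence file into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def summarize(rows, group_by=("Patient_label", "Repertoire")):
    """Per group: unique sequences, summed UMIs (DUPCOUNT), and the mean
    V gene mutation count (MUT_TOTAL) over unique sequences (unweighted)."""
    stats = defaultdict(lambda: {"n_seq": 0, "umis": 0, "mut_sum": 0.0})
    for row in rows:
        s = stats[tuple(row[col] for col in group_by)]
        s["n_seq"] += 1
        s["umis"] += int(row["DUPCOUNT"])
        s["mut_sum"] += float(row["MUT_TOTAL"])
    return {
        key: {"n_seq": s["n_seq"], "umis": s["umis"], "mean_mut": s["mut_sum"] / s["n_seq"]}
        for key, s in stats.items()
    }


# Made-up rows using a subset of the documented columns (illustrative values only).
EXAMPLE = (
    "SEQUENCE_ID\tPatient_label\tRepertoire\tDUPCOUNT\tMUT_TOTAL\n"
    "s1\tP01\tNaive\t3\t0\n"
    "s2\tP01\tNaive\t1\t2\n"
    "s3\tP01\tIgG\t5\t14\n"
)

if __name__ == "__main__":
    rows = load_table(io.StringIO(EXAMPLE))
    for key, s in sorted(summarize(rows).items()):
        print(key, s)
```

For a downloaded file, open it with `open(path, newline="")` and pass the handle to `load_table`. The mean mutation count here treats each unique sequence equally; weighting by DUPCOUNT instead would emphasize expanded sequences, and which is appropriate depends on the analysis.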

Created: 2021-10-01
