Download link:

https://zenodo.org/record/5155564

Resource description:

Data Processing

Samples were demultiplexed via their Illumina indices and processed using the Immcantation toolkit (1, 2). Raw fastq files were filtered based on a quality score threshold of 20. Paired reads were joined if they had a minimum length of 10 nt, a maximum error rate of 0.3 and a significance threshold of 0.0001. Reads with identical UMI were collapsed to a consensus sequence. Reads with identical full-length sequence and identical constant primer but differing UMI were further collapsed. Sequences were then submitted to IgBlast (3) for VDJ assignment and sequence annotation. Constant region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated using the shazam R package (2).

software_versions: pRESTO 0.5.3, Change-O 0.3.4, IgBlast 1.6.1, Stampy 1.0.21, shazam 0.1.8
quality_thresholds: FilterSeq.py (pRESTO), Q>20
paired_reads_assembly: AssemblePairs.py (pRESTO), minlen 10, maxerror 0.3, alpha 0.0001
primer_match_cutoffs: MaskPrimers.py (pRESTO), C primer & V primer maxerror 0.2
consensus_building: BuildConsensus.py (pRESTO), maxerror 0.1, maxgap 0.5
collapsing_method: CollapseSeq.py (pRESTO)
germline_database: IMGT

Format

Processed sequences are provided in a tab-delimited file format, including the following annotations:

C_CALL: Isotype subclass
SEQUENCE_ID: Sequence identifier
V_CALL: V segment gene and allele
D_CALL: D segment gene and allele
J_CALL: J segment gene and allele
JUNCTION_LENGTH: Junction length
CONSCOUNT: Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence
DUPCOUNT: UMI count for the given unique sequence
ISOTYPE: Constant region primer (isotype)
MU_COUNT_CDR_R: Number of replacement mutations in CDR region
MU_COUNT_CDR_S: Number of silent mutations in CDR region
MU_COUNT_FWR_R: Number of replacement mutations in FWR region
MU_COUNT_FWR_S: Number of silent mutations in FWR region
MUT_TOTAL: Total number of mutations in V gene
NP_LENGTH: Total number of N and P additions
SEQUENCE_INPUT: Full-length sequence
SEQUENCE_IMGT: Gapped IMGT sequence
V_GERM_START_VDJ: Position of the first nucleotide in ungapped V germline sequence alignment
JUNCTION: Junction nucleotide sequence
GERMLINE_IMGT_D_MASK: IMGT-gapped germline nucleotide sequence with Ns masking the NP1-D-NP2 regions
CDR3_AA_GRAVY: CDR3 hydrophobicity
CDR3_AA_BULK: CDR3 bulkiness
CDR3_AA_ALIPHATIC: Normalized aliphatic index
CDR3_AA_POLARITY: CDR3 polarity
CDR3_AA_CHARGE: Normalized net charge
CDR3_AA_BASIC: Basic side chain residue content
CDR3_AA_ACIDIC: Acidic side chain residue content
CDR3_AA_AROMATIC: Aromatic side chain content
Subset: Defined B cell subset
Repertoire: Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)
R_SCDR: R/S ratio in CDR region
R_SFWR: R/S ratio in FWR region
V_GENE: V segment gene
D_GENE: D segment gene
J_GENE: J segment gene
V_FAM: V family gene
Run: ID of sequencing run
Sex: Sex of the subject
Age: Age of the subject
UNIQUE_ID: Subject identifier
SAMPLE: Sample identifier, linking back to raw data
Bcellno: Number of input B cells
Cells: Cell type

References

1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O’Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.
2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.
3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41.
4. Lunter, G., and M. Goodson. 2011. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939.
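
The tab-delimited format described above can be read with standard tools. The following is a minimal Python sketch, not the authors' code. It uses the column names ISOTYPE, DUPCOUNT, MU_COUNT_CDR_R and MU_COUNT_CDR_S as listed in the description. The sample rows, the isotype labels ("IgM", "IgG"), the helper function names, and the assumption that the count columns hold integers are all hypothetical. The R/S calculation here is a plain replacement-to-silent ratio and may differ from how the dataset's own R_SCDR column was derived.

```python
import csv
import io
from collections import defaultdict


def read_repertoire(handle):
    """Read a tab-delimited repertoire table into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def umi_counts_by_isotype(rows):
    """Sum DUPCOUNT (UMI count per unique sequence) for each ISOTYPE value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["ISOTYPE"]] += int(row["DUPCOUNT"])
    return dict(totals)


def cdr_rs_ratio(row):
    """Replacement/silent mutation ratio in the CDR; None when there are no silent mutations."""
    replacement = int(row["MU_COUNT_CDR_R"])
    silent = int(row["MU_COUNT_CDR_S"])
    return replacement / silent if silent else None


# Hypothetical in-memory sample standing in for a real file from the record.
SAMPLE = (
    "SEQUENCE_ID\tISOTYPE\tDUPCOUNT\tMU_COUNT_CDR_R\tMU_COUNT_CDR_S\n"
    "seq1\tIgM\t3\t0\t0\n"
    "seq2\tIgG\t5\t6\t2\n"
    "seq3\tIgG\t1\t4\t1\n"
)

rows = read_repertoire(io.StringIO(SAMPLE))
counts = umi_counts_by_isotype(rows)
ratios = [cdr_rs_ratio(row) for row in rows]
print(counts)
print(ratios)
```

To apply this to a downloaded file, the io.StringIO sample would be replaced with a file handle opened with newline="" as the csv module recommends. The actual file names in the Zenodo record are not given here, so none is assumed.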

Use case:

Pre-processed IgH repertoire sequencing data from BioProject PRJNA748239