
Pre-processed IgH repertoire sequencing data from BioProject PRJNA748239

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link: https://zenodo.org/record/5155564
Description:
Data Processing

Samples were demultiplexed via their Illumina indices and processed using the Immcantation toolkit (1, 2). Raw FASTQ files were filtered based on a quality score threshold of 20. Paired reads were joined if they had a minimum length of 10 nt, a maximum error rate of 0.3 and a significance threshold of 0.0001. Reads with identical UMI were collapsed to a consensus sequence. Reads with identical full-length sequence and identical constant primer but differing UMI were further collapsed. Sequences were then submitted to IgBlast (3) for VDJ assignment and sequence annotation. Constant region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated using the shazam R package (2).

software_versions        pRESTO 0.5.3, Change-O 0.3.4, IgBlast 1.6.1, Stampy 1.0.21, shazam 0.1.8
quality_thresholds       FilterSeq.py (pRESTO): Q>20
paired_reads_assembly    AssemblePairs.py (pRESTO): minlen 10, maxerror 0.3, alpha 0.0001
primer_match_cutoffs     MaskPrimers.py (pRESTO): C primer & V primer, maxerror 0.2
consensus_building       BuildConsensus.py (pRESTO): maxerror 0.1, maxgap 0.5
collapsing_method        CollapseSeq.py (pRESTO)
germline_database        IMGT

Format

Processed sequences are provided in a tab-delimited file format, including the following annotations:

C_CALL                   Isotype subclass
SEQUENCE_ID              Sequence identifier
V_CALL                   V segment gene and allele
D_CALL                   D segment gene and allele
J_CALL                   J segment gene and allele
JUNCTION_LENGTH          Junction length
CONSCOUNT                Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence
DUPCOUNT                 UMI count for the given unique sequence
ISOTYPE                  Constant region primer (isotype)
MU_COUNT_CDR_R           Number of replacement mutations in CDR region
MU_COUNT_CDR_S           Number of silent mutations in CDR region
MU_COUNT_FWR_R           Number of replacement mutations in FWR region
MU_COUNT_FWR_S           Number of silent mutations in FWR region
MUT_TOTAL                Total number of mutations in V gene
NP_LENGTH                Total number of N and P additions
SEQUENCE_INPUT           Full-length sequence
SEQUENCE_IMGT            Gapped IMGT sequence
V_GERM_START_VDJ         Position of the first nucleotide in ungapped V germline sequence alignment
JUNCTION                 Junction nucleotide sequence
GERMLINE_IMGT_D_MASK     IMGT-gapped germline nucleotide sequence with Ns masking the NP1-D-NP2 regions
CDR3_AA_GRAVY            CDR3 hydrophobicity
CDR3_AA_BULK             CDR3 bulkiness
CDR3_AA_ALIPHATIC        Normalized aliphatic index
CDR3_AA_POLARITY         CDR3 polarity
CDR3_AA_CHARGE           Normalized net charge
CDR3_AA_BASIC            Basic side chain residue content
CDR3_AA_ACIDIC           Acidic side chain residue content
CDR3_AA_AROMATIC         Aromatic side chain content
Subset                   Defined B cell subset
Repertoire               Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)
R_SCDR                   R/S ratio in CDR region
R_SFWR                   R/S ratio in FWR region
V_GENE                   V segment gene
D_GENE                   D segment gene
J_GENE                   J segment gene
V_FAM                    V family gene
Run                      ID of sequencing run
Sex                      Sex of the subject
Age                      Age of the subject
UNIQUE_ID                Subject identifier
SAMPLE                   Sample identifier, linking back to raw data
Bcellno                  Number of input B cells
Cells                    Cell type

References

1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O'Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.
2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.
3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41.
4. Lunter, G., and M. Goodson. 2011. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939.
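
The relationship between CONSCOUNT and DUPCOUNT in the description above can be illustrated with a small Python sketch of the two collapsing steps. This is not the pRESTO implementation: the per-UMI consensus is simplified here to a majority vote over whole reads (BuildConsensus.py works per position with quality scores and the maxerror/maxgap cutoffs), and the function name `collapse_reads`, the toy reads, and the primer labels are all hypothetical.

```python
from collections import Counter, defaultdict


def collapse_reads(reads):
    """Collapse (umi, primer, sequence) reads in two steps.

    Illustrative simplification only, not pRESTO's algorithm.
    Returns {(sequence, primer): {"CONSCOUNT": int, "DUPCOUNT": int}}.
    """
    # Step 1: group raw reads by UMI and pick one consensus per UMI.
    # Assumption: consensus = most frequent (primer, sequence) pair in the group.
    by_umi = defaultdict(list)
    for umi, primer, seq in reads:
        by_umi[umi].append((primer, seq))

    consensus = []  # one (primer, sequence, n_raw_reads) entry per UMI
    for members in by_umi.values():
        (primer, seq), _ = Counter(members).most_common(1)[0]
        consensus.append((primer, seq, len(members)))

    # Step 2: collapse consensus sequences that share the same full-length
    # sequence and constant primer but came from different UMIs.
    collapsed = defaultdict(lambda: {"CONSCOUNT": 0, "DUPCOUNT": 0})
    for primer, seq, n_reads in consensus:
        entry = collapsed[(seq, primer)]
        entry["CONSCOUNT"] += n_reads  # raw reads, summed over all UMIs
        entry["DUPCOUNT"] += 1         # number of distinct UMIs
    return dict(collapsed)


# Hypothetical toy input: three UMIs, two of which carry the same molecule type.
toy_reads = [
    ("AAAA", "IGHM", "ACGT"),
    ("AAAA", "IGHM", "ACGT"),
    ("AAAA", "IGHM", "ACGA"),  # read with an error, outvoted within its UMI
    ("CCCC", "IGHM", "ACGT"),
    ("CCCC", "IGHM", "ACGT"),
    ("GGGG", "IGHG1", "TTGA"),
]
result = collapse_reads(toy_reads)
```

In this toy input, the sequence seen under two UMIs ends up with a DUPCOUNT of 2 and a CONSCOUNT equal to the total raw reads behind both UMIs, which is how the two columns are described in the format table.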
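
As a rough guide to working with the tab-delimited format, the sketch below reads a few rows with Python's standard `csv` module and recomputes values analogous to V_GENE, V_FAM, R_SCDR and R_SFWR from V_CALL and the MU_COUNT_* columns. The sample rows are invented for illustration, and several choices are assumptions rather than the dataset's documented method: taking the first call when V_CALL lists several alleles, deriving the family by splitting the gene name at the first hyphen, and returning `None` when the silent count is zero.

```python
import csv
import io

# Invented example rows using column names from the format description.
SAMPLE_TSV = (
    "SEQUENCE_ID\tV_CALL\tMU_COUNT_CDR_R\tMU_COUNT_CDR_S\t"
    "MU_COUNT_FWR_R\tMU_COUNT_FWR_S\n"
    "seq1\tIGHV3-23*01\t6\t2\t4\t4\n"
    "seq2\tIGHV1-69*06,IGHV1-69*01\t0\t0\t1\t0\n"
)


def gene_from_call(call):
    """Strip the allele suffix; assumption: use the first call if several."""
    first = call.split(",")[0].strip()
    return first.split("*")[0]


def family_from_gene(gene):
    """Assumption: the family is the part of the gene name before the first '-'."""
    return gene.split("-")[0]


def rs_ratio(replacement, silent):
    """Replacement/silent ratio; None when there are no silent mutations
    (how the dataset itself encodes this case is not stated)."""
    return replacement / silent if silent else None


def annotate(handle):
    """Read tab-delimited rows and attach recomputed gene, family and R/S values."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = gene_from_call(row["V_CALL"])
        rows.append({
            "SEQUENCE_ID": row["SEQUENCE_ID"],
            "V_GENE": gene,
            "V_FAM": family_from_gene(gene),
            "R_SCDR": rs_ratio(int(row["MU_COUNT_CDR_R"]), int(row["MU_COUNT_CDR_S"])),
            "R_SFWR": rs_ratio(int(row["MU_COUNT_FWR_R"]), int(row["MU_COUNT_FWR_S"])),
        })
    return rows


annotated = annotate(io.StringIO(SAMPLE_TSV))
```

For a real file from the Zenodo record, the `io.StringIO` object would be replaced by an opened file handle; comparing the recomputed values with the provided V_GENE, V_FAM, R_SCDR and R_SFWR columns is a simple way to check whether these assumptions match the published data.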
Created: 2021-08-03