
Targeted sequencing of 252 genes based on their relevance in lymphoid malignancies

DataCite Commons | Updated: 2025-01-15 | Indexed: 2025-04-16
Download link:
https://figshare.scilifelab.se/articles/dataset/Targeted_sequencing_of_252_genes_based_on_their_relevance_in_lymphoid_malignancies/19721998
Description:
<strong>Dataset description</strong> The data consist of CRAM files from capture-based gene panel sequencing (Twist Bioscience) of 252 genes selected for their relevance in lymphoid malignancies. The panel also included genome-wide backbone probes for copy-number analysis. The prepared libraries were sequenced in paired-end mode (2x150 bp) on the Illumina NovaSeq 6000 (Illumina Inc.). BALSAMIC was used to analyze the FASTQ files and align them to the reference genome. Trimmed reads were mapped to the reference genome hg19 using BWA MEM v0.7.15. The resulting SAM files were converted to BAM files and sorted using samtools v1.6. Duplicate reads were marked using Picard tools MarkDuplicates v2.17.0. Finally, the BAM files were converted to CRAM files using samtools v1.6. <br> Note: CRAM is a sequencing read file format that achieves high space efficiency through reference-based compression of sequence data and offers both lossless and lossy compression modes: https://www.ebi.ac.uk/ena/cram/ <br> <strong>Data Access Statement</strong> The data are under restricted access and can be obtained on request through the email address below. The targeted sequencing datasets may be used only for research aimed at advancing the understanding of genetic factors in chronic lymphocytic leukemia. Applications aimed at method development, including bioinformatics, will not be considered acceptable uses of this dataset.
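
The processing steps described above (alignment, sorting, duplicate marking, CRAM conversion) can be sketched as a sequence of command lines. This is a minimal illustration and not BALSAMIC's actual implementation: the function name, the file-naming scheme, the thread count, and the presence of a `picard` wrapper executable on the PATH are all assumptions, and the read-trimming step is omitted because the description does not name the tool used. The commands are only built here, not run.

```python
import shlex
from typing import List, Optional, Tuple

# A step is (argv, stdout_path); stdout_path is None when the tool
# writes its own output file.
Step = Tuple[List[str], Optional[str]]


def build_pipeline_commands(
    sample: str,      # hypothetical sample prefix used for output names
    ref_fasta: str,   # hg19 reference FASTA, assumed already indexed for bwa
    fq1: str,         # trimmed forward reads
    fq2: str,         # trimmed reverse reads
    threads: int = 4, # assumed value, not stated in the description
) -> List[Step]:
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    cram = f"{sample}.cram"
    return [
        # 1. Map paired-end reads; bwa mem writes SAM to stdout.
        (["bwa", "mem", "-t", str(threads), ref_fasta, fq1, fq2], sam),
        # 2. Convert SAM to coordinate-sorted BAM.
        (["samtools", "sort", "-o", sorted_bam, sam], None),
        # 3. Mark duplicate reads (legacy KEY=VALUE Picard syntax).
        (["picard", "MarkDuplicates",
          f"I={sorted_bam}", f"O={dedup_bam}", f"M={metrics}"], None),
        # 4. Convert BAM to CRAM; -T supplies the reference for compression.
        (["samtools", "view", "-C", "-T", ref_fasta,
          "-o", cram, dedup_bam], None),
    ]


if __name__ == "__main__":
    for argv, stdout_path in build_pipeline_commands(
        "sample1", "hg19.fa", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz"
    ):
        line = shlex.join(argv)
        if stdout_path is not None:
            line += f" > {shlex.quote(stdout_path)}"
        print(line)
```

Because CRAM uses reference-based compression, the same hg19 FASTA passed to `-T` in step 4 is generally needed again when the CRAM files are decoded.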

Provider:
Karolinska Institutet
Created:
2022-05-06