
CWL run of Alignment Workflow (CWLProv 0.6.0 Research Object)

NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://zenodo.org/records/2632836
Description:
This dataset folder is a CWLProv Research Object that captures the provenance of a Common Workflow Language (CWL) execution. See CWLProv 0.6.0 for the format, or use the cwlprov Python tool to explore it.

The CWL alignment workflow included in this case study was designed by Data Biosphere. It adapts the alignment pipeline originally developed at the Abecasis Lab, University of Michigan. The workflow is part of the NIH Data Commons initiative and comprises four stages. The first step, Pre-align, accepts a Compressed Alignment Map (CRAM) file (a compressed format for BAM files developed by the European Bioinformatics Institute (EBI)) and the human genome reference sequence as input. Using the SAMtools utilities view, sort and fixmate, it returns a list of fastq files that serve as input for the next step. The next step, Align, also accepts the human reference genome as input, along with the output files from Pre-align, and uses BWA-mem to generate aligned reads as BAM files. SAMBLASTER is used to mark duplicate reads, and SAMtools view converts the read files from SAM to BAM format. The BAM files generated by Align are sorted with SAMtools sort. Finally, in the Post-align step, these sorted alignment files are merged with SAMtools merge to produce a single sorted BAM file.

Steps to reproduce

This analysis was run on a 16-core Linux cloud instance with 64GB RAM and Docker pre-installed.
Install gsutil:

    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
    echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | \
        sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
        sudo apt-key add -
    sudo apt-get update && sudo apt-get install google-cloud-sdk

Get the data and make the analysis environment ready:

    git clone https://github.com/FarahZKhan/topmed-workflows.git
    cd topmed-workflows
    git checkout cwlprov_testing
    cd aligner/sbg-alignment-cwl
    # this is a custom script that downloads the Google bucket files listed in a JSON file and creates a local JSON
    # it needs gsutil to be installed, though
    git clone https://github.com/DailyDreaming/fetch_gs_frm_json.git
    # Wait... this should download ~18Gb.
    python2.7 fetch_gs_frm_json/dl_gsfiles_frm_json.py topmed-alignment.sample.json

Run the following commands to create the CWLProv Research Object:

    time cwltool --no-match-user --provenance alignmnentwf0.6.0 \
        --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp \
        --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp \
        topmed-alignment.cwl topmed-alignment.sample.json.new
    zip -r alignment_0.6.0_linux.zip alignment_0.6.0_linux
    sha256sum alignment_0.6.0_linux.zip > alignment_0.6.0_linux.zip.sha25
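The last command above records a SHA-256 checksum for the zipped Research Object. A downloaded copy could be checked against that checksum before it is unpacked, roughly as sketched below. This is a minimal illustration and not part of the original record. The helper names `sha256_of_file` and `verify_checksum_file` are hypothetical, and the sketch assumes the checksum file uses the usual `sha256sum` layout of one `<hex digest>  <file name>` pair per line.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(checksum_path):
    """Check each '<hex digest>  <file name>' line of a sha256sum-style file.

    File names are resolved relative to the checksum file's directory.
    Returns a dict mapping each file name to True (match) or False.
    """
    checksum_path = Path(checksum_path)
    results = {}
    for line in checksum_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes '*' in binary mode
        actual = sha256_of_file(checksum_path.parent / name)
        results[name] = (actual == expected.lower())
    return results
```

Calling `verify_checksum_file` with the path of the checksum file written by the `sha256sum` command returns a mapping such as `{"alignment_0.6.0_linux.zip": True}` when the archive is intact. Running `sha256sum -c` on the same file performs the equivalent check from the shell.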
Created:
2020-01-24