
pileup-hi analysis and inputs: part 2

Zenodo · Updated 2026-04-16 · Indexed 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.19613933
Description:
Part 2 of the pileup-hi analysis and inputs dataset: see below for instructions and file descriptions.

## Prelude

- pileup-hi version 0.9.2 was used for all analysis. You can download it using Cargo:

  ```bash
  cargo install pileup-hi --version 0.9.2
  ```

  You can also grab the pre-compiled binaries from the [original repo's release page](https://github.com/epiliper/pileup-hi/releases/tag/0.9.2).
- All testing was performed on macOS Sequoia 15.7.4.
- This analysis requires 1.5TB+ of disk space.

## Other software used

- samtools 1.2.3
- htslib 1.2.3
- perbase 1.2.0
- b3sum 1.8.3
- xsra 0.2.27
- minimap2 v2.30-r1290-dirty
- python 3.14.2
- R and RStudio (along with packages specified in .Rmd files)

**NOTE:** for the instructions below, it is assumed that you have all the software listed somewhere in `$PATH`. For information on how to move software to `$PATH`, see [this thread](https://unix.stackexchange.com/questions/183295/adding-programs-to-path).

## Overall description - generating data

Analysis consisted of running different pileup programs on five datasets. This was done with three Python scripts that can be adjusted to run a selection of tools on a selection of datasets. By default, they are configured to generate data for the entire paper. These scripts are described in detail below:

### bench.py: run time and peak memory usage

This script launches tools on specified input files and records performance information to a spreadsheet in `./reports/`.
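This README does not describe how `bench.py` takes its measurements. Purely as an illustration of what "run time and peak memory usage" can mean for a child process on a Unix-like system, the sketch below times a command and reads its peak resident set size from the operating system. The helper name `measure` and the stand-in command are hypothetical and are not taken from `bench.py`.

```python
import os
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return (wall_seconds, peak_rss_bytes, exit_code).

    Hypothetical helper for illustration; not part of bench.py.
    Relies on os.wait4, so it only works on Unix-like systems.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    # os.wait4 reaps the child and returns that child's resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # Record the exit code on the Popen object, since we reaped the child ourselves.
    proc.returncode = os.waitstatus_to_exitcode(status)
    # ru_maxrss is reported in bytes on macOS and in KiB on Linux.
    scale = 1 if sys.platform == "darwin" else 1024
    return elapsed, usage.ru_maxrss * scale, proc.returncode


if __name__ == "__main__":
    # Stand-in command; replace it with a real pileup invocation.
    secs, peak, code = measure([sys.executable, "-c", "print('hello')"])
    print(f"{secs:.3f} s, {peak / 2**20:.1f} MiB peak, exit {code}")
```

Note that this only accounts for the single process that was launched. A wrapper that spawns several workers, such as a parallel shell script, would need a different accounting approach.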
`bench.py` is configured by default to run all conditions on all files in triplicate, but you can modify this by changing the following variables:

Change iterations:

```python
NUM_ITERATIONS = 3
```

Change software / output mode / thread count:

```python
## tuple of command, output mode, thread count (where applicable)
METHODS = [
    ## Pileup mode
    (run_mpileup, "plp", 1),
    (run_pileuphi, "plp", 1),
    (run_perbase, "plp", 1),
    (run_parampileup, "plp", 1),
    (run_pileuphi, "plp", 4),
    (run_perbase, "plp", 4),
    (run_parampileup, "plp", 4),
    (run_pileuphi, "plp", 8),
    (run_perbase, "plp", 8),
    (run_parampileup, "plp", 8),
    (run_pileuphi, "plp", 12),
    (run_perbase, "plp", 12),
    (run_parampileup, "plp", 12),
    ## Nucleotide frequency mode
    (run_pileuphi, "histo", 1),
    (run_pileuphi, "histo", 4),
    (run_pileuphi, "histo", 8),
    (run_pileuphi, "histo", 12),
]
```

Change files to run on:

```python
FILES = [
    "DRR793869_hg38.bam",
    "SRR19895870.bam",
    "SRR36374445_hg38.bam",
    "SRR30646149_hg38.bam",
    "ERR2756169_merged.bam",
]
```

Once you've adjusted this to your liking, run the following to gather benchmarking data:

```bash
python3 bench.py
```

### compare_output.py: output file hash calculation

This script is adjustable similarly to `bench.py` (see above), except `METHODS` differs slightly in structure:

```python
# tuple of label, run func, output mode, and threads
METHODS = [
    ("mpileup", run_mpileup, "plp", 1, ""),
    ("pileup-hi", run_pileuphi, "plp", 1),
    ("parallel mpileup", run_parampileup, "plp", 4),
    ("pileup-hi", run_pileuphi, "plp", 4),
    ("parallel mpileup", run_parampileup, "plp", 8),
    ("pileup-hi", run_pileuphi, "plp", 8),
    ("parallel mpileup", run_parampileup, "plp", 12),
    ("pileup-hi", run_pileuphi, "plp", 12),
]
```

To run this script:

```bash
python3 compare_output.py
```

### compare_size.py: compare output size differences between `pileup-hi`'s 'histo' and 'plp' output modes

See the previous two sections for what parameters to adjust. This script will output to a spreadsheet prefixed by `./size_comp*`. To run:

```bash
python3 compare_size.py
```

## Other files

- `get_metrics.sh`: calculate alignment metrics such as depth, coverage, etc.
- `hashes_2026Feb27.csv`: output file hashes comparing pileup-hi and samtools mpileup
- `size_comp_2026Mar31.csv`: output file sizes comparing pileup-hi 'plp' and 'histo' output modes
- `bench_report_2026Mar30.csv`: benchmark data
- `para_mpileup.sh`: the parallel shell wrapper around samtools mpileup used in benchmarking. See the beginning of the script for instructions on usage.
- `bench.Rmd`: figure generation script
- `aln.sh`: script to generate BAMs from downloaded FASTQ data
- `dl.sh`: script to download FASTQs from the SRA

## Contact

Reach out to either:

- epil02 #(at)# uw #(dot)# edu
- agrening #(at)# uw #(dot)# edu
Provider:
Zenodo
Created:
2026-04-16