pileup-hi analysis and inputs: part 2

Name: pileup-hi analysis and inputs: part 2
Creator: Zenodo
Published: 2026-04-16 17:38:43
License: No description available

Zenodo | Updated 2026-04-16 | Indexed 2026-05-26

Download link:

https://zenodo.org/doi/10.5281/zenodo.19613933

Description:

Part 2 of the pileup-hi analysis and inputs dataset: see below for instructions and file descriptions.

## Prelude
- pileup-hi version 0.9.2 was used for all analysis. You can download it using Cargo:
  ```bash
  cargo install pileup-hi --version 0.9.2
  ```
  You can also grab the pre-compiled binaries from the [original repo's release page](https://github.com/epiliper/pileup-hi/releases/tag/0.9.2).
- All testing was performed on macOS Sequoia 15.7.4.
- This analysis requires 1.5TB+ of disk space.

## Other software used
- samtools 1.2.3
- htslib 1.2.3
- perbase 1.2.0
- b3sum 1.8.3
- xsra 0.2.27
- minimap2 v2.30-r1290-dirty
- python 3.14.2
- R and RStudio (along with packages specified in .Rmd files)

**NOTE:** for the instructions below, it is assumed that you have all the software listed somewhere in `$PATH`. For information on how to add software to `$PATH`, see [this thread](https://unix.stackexchange.com/questions/183295/adding-programs-to-path).

## Overall description - generating data
Analysis consisted of running different pileup programs on five datasets. This was done with three Python scripts that can be adjusted to run a selection of tools on a selection of datasets. By default, they are configured to generate data for the entire paper. These scripts are described in detail below:

### bench.py: run time and peak memory usage
This script launches tools on specified input files and records performance information to a spreadsheet in `./reports/`.
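
The kind of measurement described here (run time and peak memory of an external tool) could be sketched roughly as follows. This is not the authors' code: `run_and_measure`, `write_report`, and the CSV column names are hypothetical, and the sketch assumes a Unix-like system where `os.wait4` is available.

```python
import csv
import os
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run cmd (a list of arguments); return (wall_seconds, peak_rss_bytes, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd)
    # os.wait4 reaps the child and returns its resource usage (Unix only).
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # Record the exit code on the Popen object, since the child is already reaped.
    proc.returncode = os.waitstatus_to_exitcode(status)
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    scale = 1 if sys.platform == "darwin" else 1024
    return elapsed, usage.ru_maxrss * scale, proc.returncode


def write_report(rows, path):
    """Write measurement rows (dicts) to a CSV file, creating parent directories."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # Hypothetical column names; the real report layout may differ.
    fields = ["command", "seconds", "peak_rss_bytes"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `run_and_measure` several times per command and collecting the results into `write_report` would give one row per replicate.
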
`bench.py` is configured by default to run all conditions on all files in triplicate, but you can modify this by changing the following variables:

Change iterations:
```python
NUM_ITERATIONS = 3
```

Change software / output mode / thread count:
```python
## tuple of command, output mode, threadcount (where applicable)
METHODS = [
    # ## Pileup Mode
    (run_mpileup, "plp", 1),
    (run_pileuphi, "plp", 1),
    (run_perbase, "plp", 1),
    (run_parampileup, "plp", 1),
    (run_pileuphi, "plp", 4),
    (run_perbase, "plp", 4),
    (run_parampileup, "plp", 4),
    (run_pileuphi, "plp", 8),
    (run_perbase, "plp", 8),
    (run_parampileup, "plp", 8),
    (run_pileuphi, "plp", 12),
    (run_perbase, "plp", 12),
    (run_parampileup, "plp", 12),
    ## Nucleotide frequency mode
    (run_pileuphi, "histo", 1),
    (run_pileuphi, "histo", 4),
    (run_pileuphi, "histo", 8),
    (run_pileuphi, "histo", 12),
]
```

Change files to run on:
```python
FILES = [
    "DRR793869_hg38.bam",
    "SRR19895870.bam",
    "SRR36374445_hg38.bam",
    "SRR30646149_hg38.bam",
    "ERR2756169_merged.bam"
]
```

Once you've adjusted this to your liking, run the following to gather benchmarking data:
```bash
python3 bench.py
```

### compare_output.py: output file hash calculation
This script is adjustable similarly to `bench.py` (see above), except `METHODS` differs slightly in structure:
```python
# tuple of label, run func, output mode, and threads
METHODS = [
    ("mpileup", run_mpileup, "plp", 1, ""),
    ("pileup-hi", run_pileuphi, "plp", 1),
    ("parallel mpileup", run_parampileup, "plp", 4),
    ("pileup-hi", run_pileuphi, "plp", 4),
    ("parallel mpileup", run_parampileup, "plp", 8),
    ("pileup-hi", run_pileuphi, "plp", 8),
    ("parallel mpileup", run_parampileup, "plp", 12),
    ("pileup-hi", run_pileuphi, "plp", 12)
]
```
To run this script:
```bash
python3 compare_output.py
```

### compare_size.py: compare output size differences between `pileup-hi`'s 'histo' and 'plp' output modes
See the previous two sections for what parameters to adjust. This script writes its results to a spreadsheet whose path matches `./size_comp*`.
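
The comparisons behind `compare_output.py` (file hashes) and `compare_size.py` (file sizes) can be illustrated with a small sketch. It is a stand-in under stated assumptions, not the authors' implementation: the software list includes `b3sum`, which suggests BLAKE3 hashes, but BLAKE3 is not in the Python standard library, so the sketch substitutes `hashlib.blake2b`. The function names and the result keys are hypothetical.

```python
import hashlib
import os


def digest_and_size(path, chunk_size=1 << 20):
    """Return (hex_digest, size_in_bytes) for one output file."""
    # Stand-in hash: BLAKE2b from the stdlib, not the BLAKE3 that b3sum computes.
    hasher = hashlib.blake2b()
    with open(path, "rb") as handle:
        # Read in chunks so very large pileup outputs never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest(), os.path.getsize(path)


def compare_outputs(path_a, path_b):
    """Compare two output files by content hash and by size."""
    digest_a, size_a = digest_and_size(path_a)
    digest_b, size_b = digest_and_size(path_b)
    return {
        "identical": digest_a == digest_b,
        "size_a": size_a,
        "size_b": size_b,
        # Ratio of b to a; None when a is empty to avoid dividing by zero.
        "size_ratio": size_b / size_a if size_a else None,
    }
```

Matching digests indicate byte-identical outputs, while `size_ratio` gives a quick view of how much smaller one output mode is than another.
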
To run `compare_size.py`:
```bash
python3 compare_size.py
```

## Other files
- `get_metrics.sh`: calculate alignment metrics such as depth, coverage, etc.
- `hashes_2026Feb27.csv`: output file hashes comparing pileup-hi and samtools mpileup
- `size_comp_2026Mar31.csv`: output file sizes comparing pileup-hi 'plp' and 'histo' output modes
- `bench_report_2026Mar30.csv`: benchmark data
- `para_mpileup.sh`: the parallel shell wrapper around samtools mpileup used in benchmarking. See the beginning of the script for instructions on usage.
- `bench.Rmd`: figure generation script
- `aln.sh`: script to generate BAMs from downloaded FASTQ data
- `dl.sh`: script to download FASTQs from the SRA

## Contact
Reach out to either:
- epil02 #(at)# uw #(dot)# edu
- agrening #(at)# uw #(dot)# edu

Provider:

Zenodo

Created:

2026-04-16
