Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies
Source: DataONE | Updated: 2023-12-07 | Indexed: 2024-06-08
Download link:
https://search.dataone.org/view/sha256:6a4913b25780f4974c368bef9c4d32ffe122d1f80a2b15142c24616a2ad29bc0
Description:
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimization to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR. The use of UMI allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing, producing a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing w...
This serves as an overview of the analysis performed on PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies".
Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005.
For the indexed datasets (M027, M2199), CCS reads from the PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample an...
Sequence analysis pipelines, R code, and supplemental info for these data are found at:
https://github.com/MullinsLab/chunked_demux.git
https://github.com/MullinsLab/sUMI_dUMI_comparison.git
https://doi.org/10.5281/zenodo.7672191
https://doi.org/10.5281/zenodo.7672189
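The index-based demultiplexing step described above (one fastq output per Index primer combination) can be illustrated with a minimal sketch. This is not the chunked_demux implementation: the index sequences, the labels, the reads, and the `demultiplex` helper are all hypothetical, and matching is exact, whereas a real pipeline would typically tolerate mismatches and handle both read orientations.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def demultiplex(reads, index_pairs):
    """Bin read names by index combination (hypothetical helper).

    A read is assigned to a combination when it starts with the forward
    index and ends with the reverse complement of the reverse index.
    Reads matching no combination go to the "unassigned" bin.
    """
    bins = defaultdict(list)
    for name, seq in reads:
        for label, (fwd, rev) in index_pairs.items():
            if seq.startswith(fwd) and seq.endswith(revcomp(rev)):
                bins[label].append(name)
                break
        else:
            bins["unassigned"].append(name)
    return dict(bins)


# Hypothetical index primers (forward, reverse); not taken from any config file.
index_pairs = {
    "Index_F1_R1": ("ACGTAC", "CTGAAG"),
    "Index_F2_R1": ("GGATCC", "CTGAAG"),
}

# Hypothetical reads: index + insert + reverse complement of the reverse index.
reads = [
    ("read1", "ACGTAC" + "AAAACCCC" + revcomp("CTGAAG")),
    ("read2", "GGATCC" + "AAAACCCC" + revcomp("CTGAAG")),
    ("read3", "TTTTTT" + "AAAACCCC" + revcomp("CTGAAG")),
]

bins = demultiplex(reads, index_pairs)
for label, names in bins.items():
    print(label, names)
```

Because several samples can share one Index combination (3–4 per combination in M027), each bin produced this way still needs a second round of demultiplexing by sample, which the text assigns to the sUMI_dUMI_comparison pipeline.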
DW 6Dec2023
Optimized SMRT-UMI protocol produces highly accurate sequence datasets
from diverse populations – application to HIV-1 quasispecies
These files support the analysis of dUMI samples described in Westfall et al.,
"Optimized SMRT-UMI protocol produces highly accurate sequence
datasets from diverse populations – application to HIV-1
quasispecies". This research identified optimized conditions to
prevent errors during PCR, together with the supporting software
PORPIDpipeline for sequence generation and filtering.
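To illustrate how UMI tagging permits both template quantitation and removal of point errors, the following minimal sketch groups reads by UMI and takes a per-position majority vote within each group. It is an illustration under stated assumptions, not the PORPIDpipeline algorithm: the UMI is assumed to be a fixed-length prefix of each read, the reads and the `umi_consensus` helper are hypothetical, and reads whose length differs from the most common length in their family are simply ignored rather than aligned.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, umi_len, min_reads=3):
    """Build one majority-vote consensus per UMI family (hypothetical helper).

    reads     -- iterable of sequences whose first `umi_len` bases are the UMI
    min_reads -- families with fewer reads are dropped as unreliable
    """
    # Group the insert (everything after the UMI) by UMI.
    families = defaultdict(list)
    for seq in reads:
        families[seq[:umi_len]].append(seq[umi_len:])

    consensus = {}
    for umi, inserts in families.items():
        if len(inserts) < min_reads:
            continue
        # Keep only reads of the most common length so columns line up
        # without an alignment step (a simplification).
        length = Counter(len(s) for s in inserts).most_common(1)[0][0]
        same_len = [s for s in inserts if len(s) == length]
        # Majority base at each position.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*same_len)
        )
    return consensus


# Hypothetical reads: a 4-base UMI followed by an 8-base insert.
reads = [
    "AAAA" + "ACGTACGT",
    "AAAA" + "ACGTACGT",
    "AAAA" + "ACGAACGT",  # single point error, outvoted
    "CCCC" + "TTGTACGT",
    "CCCC" + "TTGTACGA",  # single point error, outvoted
    "CCCC" + "TTGTACGT",
    "GGGG" + "ACGTACGT",  # family of one read, dropped
]

result = umi_consensus(reads, umi_len=4)
print(len(result), "templates:", result)
```

In this sketch the number of retained UMI families serves as the template count, and isolated point errors disappear because they are outvoted within their family.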
These files are used as input for different pipelines and analyses.
Supplement Figure S4 in the manuscript shows the entire analysis
workflow and indicates which files are used as input for each step.
Included here is one dataset (M027) that can be run through the two pipelines,
as well as the outputs of the sUMI_dUMI_comparison pipeline for the other four
datasets:
2021-04_UW_M027.fastq.gz
-CCS fastq.gz file from PacBio sequencing from M027 dataset
-Used as input to chunked_demu...
Created:
2023-12-08



