Supplementary materials to: Nano-Strainer: a workflow for identification of single-copy nuclear loci for plant systematic studies, using target capture kits and Oxford Nanopore long reads
Source: DataCite Commons | Updated: 2025-05-01 | Indexed: 2025-05-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.2fqz612tm
Description:
In the paper associated with this dataset, a workflow is presented which
enables the identification of single-/low-copy nuclear molecular markers
for a plant group of interest, by mining data from a small representative
target capture experiment done using a commercial probe kit and Oxford
Nanopore long-read sequencing. The proposed pipeline first assesses
sequence variability contained in the data from targeted loci and assigns
reads to their respective genes, via a combined BLAST/clustering
procedure. Cluster consensus sequences are then examined against four
predefined criteria presumed to indicate the absence of paralogy. This
is done by calculating four specialized indices; loci are ranked according
to their performance in these indices, and top-scoring loci are considered
putatively single- or low-copy. The approach can be applied to any probe
set. As it relies on long reads, the contribution also provides template
workflows for processing Nanopore-based target capture data. Identified
loci can be used for NGS amplicon sequencing. To detect any paralogy
remaining in these data, as may occur in groups with rampant
paralogy, the long-read assembly tool CANU is employed. The presented
workflow can be useful for researchers dealing with phylogenetic histories
shaped by reticulation or polyploidization in plants. The present dataset
contains several documents supplementing the original paper. Its most
important element is a detailed description (with two graphical
workflow figures) of all methods employed in the study, sufficient for
reproducing both the workflow steps and the wet-lab work. The
workflow employs a collection of BASH, Python and R scripts, which is
available here together with a detailed account of command-line use in
Linux. Reference sequences for the identified markers are also included,
as are sequence alignments derived from the amplicon sequencing.
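
The ranking step mentioned in the description (loci ranked by their performance in four indices, with top-scoring loci treated as putatively single- or low-copy) can be illustrated with a minimal Python sketch. This is not the scripts distributed with the dataset: the description does not name the indices or say how they are combined, so the function name rank_loci, the locus names, the example values, the "higher is better" orientation, and the use of a mean rank are all assumptions made for illustration.

```python
from statistics import mean


def rank_loci(scores):
    """Order loci by their mean rank across several indices (best first).

    scores maps a locus name to a tuple of index values. Assumption: a
    higher value is better for every index. Ties within an index are
    broken by input order, because sorted() is stable.
    """
    loci = list(scores)
    n_indices = len(next(iter(scores.values())))
    ranks = {locus: [] for locus in loci}
    for i in range(n_indices):
        # Rank 1 goes to the locus with the highest value of index i.
        ordered = sorted(loci, key=lambda name: scores[name][i], reverse=True)
        for position, locus in enumerate(ordered, start=1):
            ranks[locus].append(position)
    return sorted(loci, key=lambda name: mean(ranks[name]))


# Hypothetical values for four unnamed indices per locus.
example_scores = {
    "locus_1": (0.90, 0.80, 0.85, 0.95),
    "locus_2": (0.40, 0.50, 0.60, 0.30),
    "locus_3": (0.70, 0.90, 0.80, 0.60),
}
ranked = rank_loci(example_scores)
print(ranked)
```

With these made-up values, locus_1 ranks first on three of the four indices and therefore heads the list. A real analysis would take the index definitions and any weighting from the detailed methods document in the dataset.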
Provider:
Dryad
Created:
2023-06-05



