
Organelle reads from Acacia pycnantha

Indexed by NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/5542994
Resource summary:
Organelle reads from Acacia pycnantha, to accompany a manuscript that assembles the mitochondrial genome (the mitome) and the chloroplast genome (the plastome). This repository contains six files of sequencing reads, extracted from the full Acacia pycnantha data set in NCBI under BioProject PRJNA752212.

How were these reads extracted? For full details, see the manuscript sections "Extraction and assembly of organelle-only Nanopore reads, round 1 and round 2" and "Extraction of organelle-only Illumina reads" in https://www.biorxiv.org/content/10.1101/2020.12.22.423164v1

Summary:

Extraction and assembly of organelle-only Nanopore reads, round 1

To extract the organelle-only reads from the full read sets, we used a set of known sequences from related taxa as “baits”. For the plastome, we used three coding sequences from Acacia ligulata (NC_026134.2) in FASTA nucleotide format. We chose the genes rbcL, matK and ndhF because they are all likely to be plastid-only genes and are also well conserved. The rbcL and matK genes are usually located at either end of the large single-copy (LSC) region, and ndhF is usually in the small single-copy (SSC) region; because these baits are well spaced around the plastid, the extracted long reads should give roughly even coverage. As the mitome is much larger than the plastome, we used all 38 coding sequences from the mitome of Acacia ligulata (NC_040998.1).

We mapped the raw Nanopore reads (~5.5 million) to the baits with minimap2 and used samtools to extract the mapped reads. We then used Filtlong to keep only the longest of the extracted reads, up to a coverage of 250×, because assembly becomes more fragmented, or fails entirely, when coverage is too high (preliminary tests confirmed this with our data). For the plastome, we extracted ~28,000 reads and downsampled them to 901 reads (longest ~121 kbp). For the mitome, we extracted ~14,000 reads and did not downsample them, because their coverage did not reach the 250× cutoff (longest ~105 kbp).
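The round 1 read selection can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it reads minimap2 PAF output (a swap for the SAM/samtools route described above), counts a read as "mapped" if any of its hits has at least `min_matches` matching bases (PAF column 10), and then keeps the longest reads until their summed length reaches coverage × genome size. Filtlong itself also weighs read quality; this sketch uses length alone. The function names `mapped_read_ids` and `longest_to_target` are hypothetical, and the toy PAF records are made up.

```python
from typing import Dict, Iterable, List, Set


def mapped_read_ids(paf_lines: Iterable[str], min_matches: int = 0) -> Set[str]:
    """Names of reads with at least one PAF hit whose residue-match count
    (column 10) is >= min_matches. Hypothetical stand-in for mapping reads
    to baits and extracting the mapped ones."""
    ids: Set[str] = set()
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        cols = line.rstrip("\n").split("\t")
        read_name, n_matches = cols[0], int(cols[9])
        if n_matches >= min_matches:
            ids.add(read_name)
    return ids


def longest_to_target(read_lengths: Dict[str, int], genome_size: int,
                      coverage: float) -> List[str]:
    """Keep the longest reads until their summed length reaches
    coverage * genome_size. Length-only simplification of Filtlong;
    if the reads never reach the target, all of them are kept."""
    target = coverage * genome_size
    kept: List[str] = []
    total = 0
    for name, length in sorted(read_lengths.items(),
                               key=lambda kv: kv[1], reverse=True):
        if total >= target:
            break
        kept.append(name)
        total += length
    return kept


# Toy example: three reads hitting plastid baits (all values are made up).
paf = [
    "r1\t9000\t100\t1528\t+\trbcL\t1428\t0\t1428\t1400\t1430\t60",
    "r2\t3000\t0\t300\t+\tmatK\t1518\t0\t300\t120\t310\t5",
    "r3\t20000\t500\t2700\t-\tndhF\t2200\t0\t2200\t2150\t2210\t60",
]
hits = mapped_read_ids(paf, min_matches=200)
print(sorted(hits))  # ['r1', 'r3']
print(longest_to_target({"r1": 9000, "r3": 20000}, genome_size=1000, coverage=25))
```

Because the loop stops only once the target is reached, a read set whose total length falls short of coverage × genome size is returned whole, which mirrors the round 1 mitome case where no downsampling occurred. Setting `min_matches=5000` would approximate the stricter round 2 setting only if that value counts matching bases, which is an assumption.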
Extracted Nanopore reads were assembled with Flye, and the assembly was polished with two rounds of Racon.

Extraction and assembly of organelle-only Nanopore reads, round 2

We used this first assembly as the baits file for the next round of extracting organelle reads from the original full read set. In minimap2, we set a minimum match value of 5000, as preliminary tests showed that a more lenient setting extracted too many reads to assemble properly. Again, we kept only the longest reads, to a target coverage of 250×. From the ~5.5 million raw reads, for the plastome we extracted ~70,000 reads (more than twice as many as in round 1) and downsampled them to 864 reads (longest ~121 kbp, the same as round 1). For the mitome we extracted ~14,000 reads (similar to round 1) and downsampled them slightly to ~12,000 reads (longest ~105 kbp, the same as round 1). As in the first round, these reads were then assembled with Flye and polished with two rounds of Racon; in testing, further rounds of Racon polishing made little difference.

These are the read sets:
mitome_nano_extracted_long2.fq.gz
plastome_nano_extracted_long2.fq.gz

Extraction of organelle-only Illumina reads

Using the round 2 assembly as baits, we then extracted organelle-only reads from the filtered and trimmed Illumina reads (~410 million read pairs). The extracted read sets were then randomly downsampled to a coverage of 250× using Rasusa. For the plastome, this gave ~26 million read pairs, downsampled to ~130,000 read pairs; for the mitome, it gave ~23 million read pairs, downsampled to ~670,000 read pairs.

These are the read sets:
mitome_R1_extracted_subset.fq.gz
mitome_R2_extracted_subset.fq.gz
plastome_R1_extracted_subset.fq.gz
plastome_R2_extracted_subset.fq.gz
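Random downsampling of read pairs to a target coverage, the step done here with Rasusa, can be sketched in a few lines of Python. This illustrates the idea and is not Rasusa's code: pairs are shuffled with a seeded generator and taken until their combined R1 + R2 bases reach coverage × genome size, so both mates of a pair are always kept or dropped together. The function `downsample_pairs`, its tuple input format and the toy genome size are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# (pair name, R1 length, R2 length): a hypothetical in-memory summary of a read pair
Pair = Tuple[str, int, int]


def downsample_pairs(pairs: Sequence[Pair], genome_size: int,
                     coverage: float, seed: int = 0) -> List[str]:
    """Randomly keep whole read pairs until their combined bases reach
    coverage * genome_size. Returns kept pair names in input order."""
    target = coverage * genome_size
    order = list(range(len(pairs)))
    random.Random(seed).shuffle(order)  # seeded, so results are reproducible
    kept_idx: List[int] = []
    total = 0
    for i in order:
        if total >= target:
            break
        _, r1_len, r2_len = pairs[i]
        kept_idx.append(i)
        total += r1_len + r2_len  # both mates count toward coverage
    kept_idx.sort()  # restore input order so R1 and R2 outputs stay in sync
    return [pairs[i][0] for i in kept_idx]


# Toy example: 100 pairs of 2 x 150 bp against a made-up 1,000 bp genome.
pairs = [(f"pair{i}", 150, 150) for i in range(100)]
subset = downsample_pairs(pairs, genome_size=1000, coverage=3, seed=42)
print(len(subset))  # 10 pairs = 3,000 bases = 3x coverage
```

Keeping whole pairs and restoring the input order means the R1 and R2 subsets can be written in matching order, which paired-end mappers and assemblers generally expect.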
Created:
2021-11-24