Raw sequence and Non-B-DNA occurrence datasets
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/Raw_sequence_and_Non-B-DNA_occurrence_datasets/22759073
Description:
1. Sequence Data Collection:
We extracted genomic regions [-500 to +500] relative to the translation start sites of 1180 cellular organisms from various public repositories. Archaeal and bacterial datasets were retrieved from the NCBI database. Fungal datasets for Aspergillus, Candida, and Saccharomyces species were extracted from the Aspergillus Genome Database, the Candida Genome Database, and the Saccharomyces Genome Database, respectively. The UCSC Genome Browser was used to retrieve datasets for fly, mammalian, vertebrate, and worm species. Plant datasets were downloaded from the plant genome database. The 1180 unique species, which belong to the three domains of life, were classified into 28 taxonomic groups loosely based on NCBI taxonomy.
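As an illustration of the window extraction described above, here is a minimal Python sketch. It is not the authors' pipeline: the function names `extract_window` and `reverse_complement`, the `flank` parameter, and the 0-based coordinate convention are all assumptions made for this example. It also assumes the window covers 500 nt upstream and 500 nt starting at the first base of the start codon, and that windows are clipped at sequence ends.

```python
# Hypothetical helper names; the dataset description does not specify an implementation.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_window(genome, start, strand="+", flank=500):
    """Extract a window around a translation start site.

    genome: forward-strand sequence as a string.
    start:  0-based forward-strand index of the first base of the start codon
            (assumed convention, not taken from the dataset).
    strand: "+" or "-".
    flank:  number of bases kept on each side (500 in the description).

    The window is returned in the gene's orientation, so for an unclipped
    window the start codon begins at index `flank`.
    """
    if strand == "+":
        lo, hi = start - flank, start + flank
    elif strand == "-":
        # On the minus strand, upstream lies at higher forward coordinates.
        lo, hi = start - flank + 1, start + flank + 1
    else:
        raise ValueError("strand must be '+' or '-'")

    # Clip to the sequence boundaries (assumed behaviour near contig ends).
    lo, hi = max(0, lo), min(len(genome), hi)
    window = genome[lo:hi]
    return reverse_complement(window) if strand == "-" else window
```

With `flank=3`, calling `extract_window("AAACCATGGGTTT", 5, "+", flank=3)` keeps three bases upstream of the `ATG` and the codon itself; the same call with `strand="-"` returns the reverse-complemented window so that the start codon again sits at index `flank`.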
2. Datasets for Non-B DNA motifs:
We identified six types of putative non-B DNA-forming sequences (APR, DR, GQ, IR, MR, and Z) in the extracted genomic regions using the regular expression models of the non-B DNA Motif Search Tool (nBMST) developed by Cer et al. TSV files derived from the sequences of each genome are provided in the dataset.
1. The curved DNA motif (A-Phased Repeat, APR) consists of 3 or more A-tracts (3–5 As) with 10 nucleotides (nt) spacing between the centers of consecutive tracts.
2. Slipped DNA motifs were defined as 10–50 nt direct repeats (DRs) with no intervening nucleotides.
3. G-quadruplexes (GQs) are screened as 4 or more runs of G-tracts (3–5 Gs) separated by 1–7 nt spacers.
4. Cruciform DNA is defined as 10–100 nt inverted repeats (IRs) with a small spacer size (0–3 nt).
5. The triplex DNA motif consists of 10–100 nt mirror repeats (MRs) with 0–8 nt spacers. The sequence composition must be 90% or more purine or pyrimidine nucleotides.
6. The Z-DNA motif is predicted from G-Y runs, in which G is followed by Y (C or T), spanning at least 10 nt, with Gs alternating along one strand.
7. Short tandem repeats (STRs) were also included in the analysis for comparison with other motifs.
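To make the regular-expression style of these definitions concrete, the following Python sketch encodes two of them: the G-quadruplex rule (item 3) and the direct-repeat rule (item 2). These patterns are simplified illustrations written for this description, not the actual nBMST models. `GQ_PATTERN`, `DR_PATTERN`, and `find_motifs` are hypothetical names, the search covers only the given strand, and the sketch assumes that "10–50 nt" refers to the length of the repeat unit.

```python
import re

# Item 3 (assumed reading): 4 or more G-tracts of 3-5 Gs,
# separated by spacers of 1-7 nt of any base.
GQ_PATTERN = re.compile(r"G{3,5}(?:[ACGT]{1,7}G{3,5}){3,}")

# Item 2 (assumed reading): a 10-50 nt unit immediately followed by an
# identical copy, i.e. a direct repeat with no intervening nucleotides.
DR_PATTERN = re.compile(r"([ACGT]{10,50})\1")


def find_motifs(sequence, pattern):
    """Return (start, end, matched_text) for non-overlapping matches.

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(find_motifs("TTGGGAGGGTGGGCGGGTT", GQ_PATTERN))
    print(find_motifs("TT" + "ACGTTGCAAG" * 2 + "CC", DR_PATTERN))
```

Because `finditer` reports only non-overlapping matches on the strand it is given, a fuller scan would also search the reverse complement (for example, C-rich runs for GQs) and would need a policy for overlapping hits; both are left out of this sketch.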
Created:
2023-05-04



