Algorithms for determining transposable genes in a genome
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.9zw3r22j3
Description:
Transposons are nucleotide sequences in DNA that can change their positions. Many transposons are shorter than a typical gene. Even when we restrict attention to nucleotide sequences that form complete genes, we can still find genes that change their relative locations in a genome. Thus, for different individuals of the same species, the order of genes might differ. A practical problem is to determine such transposable genes in given gene sequences. Through an intuitive rule, we transform the biological problem of determining transposable genes into a rigorous mathematical problem of determining the longest common subsequence. Depending on whether the gene sequence is linear (each sequence has a fixed head and tail) or circular (any gene can be chosen as the head, and the gene preceding it is the tail), and on whether genes have multiple copies, we classify the problem of determining transposable genes into four scenarios: (1) linear sequences without duplicated genes; (2) circular sequences without duplicated genes; (3) linear sequences with duplicated genes; (4) circular sequences with duplicated genes. With the help of graph theory, we design fast algorithms for the different scenarios. In particular, we study the situation in which the longest common subsequence is not unique.
This dataset contains code files for the corresponding algorithms. It also includes gene sequence data for certain Escherichia coli strains (from NCBI), which are used to test those algorithms.
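To make scenario (1) concrete, here is a minimal sketch in Python for two linear sequences without duplicated genes. It is not the code shipped with this dataset and does not use the graph-theoretic approach mentioned above. It relies on a standard reduction: when no gene repeats, the longest common subsequence of two gene orders equals a longest increasing subsequence of the positions of one order inside the other. The function names, the gene labels, and the reading that genes outside the longest common subsequence are the candidate transposable genes are all assumptions made for illustration. The sketch returns only one longest common subsequence and does not handle the non-unique case.

```python
from bisect import bisect_left


def lcs_without_duplicates(a, b):
    """Return one longest common subsequence of two gene orders.

    Assumes no gene is repeated within `a` or within `b`.
    """
    pos_in_b = {gene: i for i, gene in enumerate(b)}
    # Genes of `a` that also occur in `b`, kept in the order of `a`.
    shared = [g for g in a if g in pos_in_b]
    idx = [pos_in_b[g] for g in shared]

    tail_vals = []  # tail_vals[k]: smallest tail value of an increasing run of length k+1
    tail_at = []    # tail_at[k]: index into `idx` where that tail value sits
    prev = [-1] * len(idx)  # back-pointers used to rebuild the subsequence

    for i, v in enumerate(idx):
        k = bisect_left(tail_vals, v)
        if k == len(tail_vals):
            tail_vals.append(v)
            tail_at.append(i)
        else:
            tail_vals[k] = v
            tail_at[k] = i
        prev[i] = tail_at[k - 1] if k > 0 else -1

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tail_at[-1] if tail_at else -1
    while i != -1:
        result.append(shared[i])
        i = prev[i]
    return result[::-1]


def candidate_transposable_genes(a, b):
    """Shared genes that fall outside the longest common subsequence (illustrative rule)."""
    core = set(lcs_without_duplicates(a, b))
    in_b = set(b)
    return [g for g in a if g in in_b and g not in core]


# Hypothetical gene orders for two individuals.
seq_a = ["g1", "g2", "g3", "g4", "g5"]
seq_b = ["g1", "g3", "g4", "g2", "g5"]
print(lcs_without_duplicates(seq_a, seq_b))
print(candidate_transposable_genes(seq_a, seq_b))
```

The reduction runs in O(n log n) time for sequences of length n, but it breaks down as soon as a gene has multiple copies, which is why scenarios (3) and (4) need a different treatment.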
Methods
Gene sequences are from the NCBI database.
Created:
2022-11-28



