Data to form periodic lossless ternary seeds of maximum weight (Part 1)
NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8370908
Resource description:
Data to form periodic lossless ternary seeds of maximum weight.
Detailed information can be found in the GitHub project (https://github.com/vtman/perlotSeeds). Code to generate periodic blocks (binary and ternary) is also available there.
Binary seeds have only two symbols: 0 = "do not care" = "_" and 1 = "match" = "#". The length of a seed is the number of its elements, and the weight of a seed is the number of its 1-elements. The goal is to find seeds of maximum weight that are still lossless, i.e., guaranteed to detect two strings that differ by up to a given number of mismatches. In many cases, these maximum-weight seeds have a periodic structure: the same block is repeated several times, followed by a remainder (part of the block). Blocks for binary seeds can be found with the PerFSeeB project (https://github.com/vtman/PerFSeeB). These blocks have the maximum possible weight.
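To make the definitions above concrete, here is a minimal Python sketch; it is not taken from the perlotSeeds or PerFSeeB code. The helper names `seed_weight`, `periodic_seed` and `is_lossless` are hypothetical, the remainder is assumed to be a prefix of the block, and losslessness is checked by brute force over every placement of mismatches in a read of a given length, which is only practical for small parameters.

```python
from itertools import combinations

def seed_weight(seed: str) -> int:
    """Weight = number of '#' (match) positions."""
    return seed.count("#")

def periodic_seed(block: str, length: int) -> str:
    """Repeat `block` and cut to `length`: whole copies plus a remainder.
    Assumption: the remainder is a prefix of the block."""
    copies = -(-length // len(block))  # ceiling division
    return (block * copies)[:length]

def is_lossless(seed: str, read_len: int, mismatches: int) -> bool:
    """Brute force: every placement of `mismatches` mismatches in a read of
    length `read_len` must leave some shift where all '#' positions match."""
    care = [i for i, c in enumerate(seed) if c == "#"]
    shifts = range(read_len - len(seed) + 1)
    for bad in combinations(range(read_len), mismatches):
        bad_set = set(bad)
        if not any(all(s + i not in bad_set for i in care) for s in shifts):
            return False
    return True

seed = periodic_seed("##_", 7)
print(seed, len(seed), seed_weight(seed))  # ##_##_# 7 5
print(is_lossless("#_#", 4, 1))  # True
print(is_lossless("#_#", 4, 2))  # False
```

The exhaustive check grows combinatorially with read length and mismatch count, so it is only meant to illustrate the definition, not to search for seeds at realistic sizes.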
In genetics, sequences use four symbols (A, C, G, T). However, point mutations are not equally likely for all pairs of nucleotides: transition mutations (A ↔ G or C ↔ T) are often about twice as frequent as transversion mutations (A ↔ C, A ↔ T, G ↔ C, G ↔ T). Transition-constrained seeds use the ternary alphabet {#, @, _}, where @ denotes a match or a transition mismatch. To generate ternary seeds, we first need to generate ternary blocks, which can be constructed from binary blocks. However, sometimes binary blocks of less than the maximum weight are required.
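The matching rule for the ternary alphabet can be sketched as follows. This is an illustrative Python sketch assuming two aligned DNA strings `x` and `y` over A, C, G, T; the names `compatible` and `ternary_hit` are hypothetical, not part of the project. It encodes only the symbol semantics described above and does not show how ternary blocks are derived from binary blocks.

```python
# Transitions: A <-> G (purines) and C <-> T (pyrimidines).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def compatible(symbol: str, a: str, b: str) -> bool:
    """Check one seed position against a pair of aligned nucleotides."""
    if symbol == "_":  # "do not care": any pair is accepted
        return True
    if symbol == "#":  # exact match required
        return a == b
    if symbol == "@":  # match or transition mismatch
        return a == b or frozenset((a, b)) in TRANSITIONS
    raise ValueError(f"unknown seed symbol: {symbol!r}")

def ternary_hit(seed: str, x: str, y: str, shift: int) -> bool:
    """True if the seed placed at `shift` accepts every aligned position of x and y."""
    return all(
        compatible(c, x[shift + i], y[shift + i]) for i, c in enumerate(seed)
    )

print(ternary_hit("@##_", "ACGT", "GCGA", 0))  # True: A/G is a transition
print(ternary_hit("@##@", "ACGT", "GCGA", 0))  # False: T/A is a transversion
```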
BinaryDataLevel.zip contains binary blocks. Most are of maximum weight, but about one fifth have smaller weights (one less than the maximum, and a couple of blocks two less).
Files T1V1.zip, T1V2.zip,..., and T8V1.zip contain ternary blocks in binary format. T4V2.zip and T7V2.zip are provided in the other part of this dataset.
File bestTernary.zip contains ternary seeds of maximum weight, where the weight is the number of # symbols plus half the number of @ symbols.
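The weight used to rank ternary seeds can be computed directly from the seed string. A minimal sketch; `ternary_weight` is a hypothetical helper name:

```python
def ternary_weight(seed: str) -> float:
    """Weight of a ternary seed: each '#' counts 1, each '@' counts 1/2, '_' counts 0."""
    return seed.count("#") + seed.count("@") / 2

print(ternary_weight("##@_#@"))  # 4.0
```

Under this measure, a seed with three # and two @ symbols has the same weight as a binary seed with four # symbols.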
Created:
2024-02-09



