The recognition of motif boundaries.

Indexed from NIAID Data Ecosystem on 2026-03-06

Download link:

https://figshare.com/articles/dataset/_The_recognition_of_motif_boundaries_/511251

Description:

Counts were made of the deviations found by Program 1 and DIALIGN-TX of the left and right pattern boundaries (120 total) for the embedded motifs within the 60 IRM-1 sequence sets, divided into the sets involving 4, 8, and 16 sequences [96]. At all 120 boundaries of the reported patterns, both programs align in register at least 50% of the sequences. This consensus allows us to determine to what extent the programs report conserved regions that are too long or too short. Positive deviations in the table refer to patterns identified by the programs that are longer than the actual patterns.

To make an equitable comparison of the two programs, several non-default options and procedures were employed, as follows:

(1) Asymmetric affine gap costs were inappropriate for Program 1 because the Rose program [97] used in the construction of IRM-1 does not simulate the differential rates with which insertions and deletions occur within real protein motifs. Accordingly, we empirically assigned all gaps of length a score of bits, which corresponds [82] to an average frequency of 0.67% for insertions (and similarly for deletions) beginning at each motif position, and an average insertion or deletion length of .

(2) We ran DIALIGN-TX at its least sensitive setting, using the “-l2” option, to avoid the excessive extensions into randomly aligned flanking sequences that degrade the accuracy of motif boundary recognition with the more sensitive default setting.

(3) For DIALIGN-TX, we defined the boundary of a conserved motif as the maximum left or right extent to which all of the set of sequences aligned in register were reported as conserved. An alternative criterion might be to take a majority vote on the left or right extent of the reported pattern, but this criterion often gave unreasonably long extensions with DIALIGN-TX, and so was not used.

For Program 1 run with the 16-sequence input sets, two outliers were found (columns headed and ). These are cases where roughly half the sequences in the set contained large insertions or deletions, leading Program 1 to misalign a substantial minority of sequences at one of the boundaries.
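
The deviation counts described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the inclusive coordinate convention, and the example coordinates are all hypothetical, chosen only to show the sign convention in which a positive deviation means the reported pattern extends beyond the actual motif.

```python
from collections import Counter


def boundary_deviations(true_start, true_end, reported_start, reported_end):
    """Return (left, right) signed deviations for one motif.

    Coordinates are assumed to be inclusive column indices (an assumption
    made for this sketch). A positive value means the reported pattern
    extends past the true boundary (too long); a negative value means it
    stops short of it (too short).
    """
    left = true_start - reported_start   # reported starts earlier -> positive
    right = reported_end - true_end      # reported ends later -> positive
    return left, right


def tally_deviations(pairs):
    """Count how often each signed deviation occurs over all boundaries.

    `pairs` is an iterable of ((true_start, true_end),
    (reported_start, reported_end)) tuples, one per sequence set.
    """
    counts = Counter()
    for true, reported in pairs:
        left, right = boundary_deviations(*true, *reported)
        counts[left] += 1
        counts[right] += 1
    return counts


# Hypothetical example data, not taken from the IRM-1 benchmark.
EXAMPLE = [
    ((10, 30), (8, 30)),   # left boundary reported 2 columns too far out
    ((10, 30), (10, 27)),  # right boundary reported 3 columns short
    ((5, 25), (5, 25)),    # both boundaries exact
]

if __name__ == "__main__":
    for deviation, n in sorted(tally_deviations(EXAMPLE).items()):
        print(f"{deviation:+d}: {n}")
```

Each sequence set contributes two boundaries, so 60 sets yield the 120 boundaries mentioned in the description.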

Created:

2010-07-15
