Tree Similarity Search Dataset
Source: DataCite Commons; updated 2026-03-31; indexed 2026-05-03
Download link:
https://ieee-dataport.org/documents/tree-similarity-search-dataset
Description:
This dataset is a benchmarking suite for evaluating tree edit distance lower-bound filters in similarity search. It comprises 21 datasets: seven real-world corpora spanning natural language processing, bioinformatics, bibliographic records, and source code structures, plus 14 synthetic datasets in three controlled groups that isolate the effects of tree size, structural fanout, and label diversity on computational performance. All trees are encoded in a standard, compact bracket notation. With high-selectivity query sets and tree topologies ranging from small sentiment parses to protein structures with over 20,000 nodes, the dataset supports the development and validation of efficient similarity search algorithms over ordered labeled trees.
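The description does not spell out the bracket notation or name the filters being benchmarked, so the following is only a minimal sketch under stated assumptions. It assumes each tree is written as `{label{child}{child}...}`, the curly-brace form used by common tree edit distance tools, and that labels contain no unescaped braces. The names `parse_bracket`, `size_lower_bound`, and `label_lower_bound` are hypothetical. The two bounds shown are simple, well-known lower bounds on unit-cost tree edit distance, included only to illustrate what a lower-bound filter does.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def parse_bracket(text):
    """Parse one tree in the ASSUMED form {label{child}{child}...}.

    Iterative (explicit stack) so that deep trees do not hit
    Python's recursion limit.
    """
    text = text.strip()
    root, stack, i, n = None, [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "{":
            if root is not None and not stack:
                raise ValueError("more than one root tree")
            j = i + 1
            while j < n and text[j] not in "{}":
                j += 1                      # scan the label
            node = Node(text[i + 1:j])
            if stack:
                stack[-1].children.append(node)
            else:
                root = node
            stack.append(node)
            i = j
        elif ch == "}":
            if not stack:
                raise ValueError(f"unmatched '}}' at offset {i}")
            stack.pop()
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    if stack or root is None:
        raise ValueError("unbalanced or empty input")
    return root


def labels(root):
    """Return all node labels of the tree (order is not significant)."""
    out, todo = [], [root]
    while todo:
        node = todo.pop()
        out.append(node.label)
        todo.extend(node.children)
    return out


def size_lower_bound(a, b):
    """Unit-cost TED is at least the difference in node counts."""
    return abs(len(labels(a)) - len(labels(b)))


def label_lower_bound(a, b):
    """Unit-cost TED is at least max(|a|, |b|) minus the number of shared labels."""
    ca, cb = Counter(labels(a)), Counter(labels(b))
    shared = sum((ca & cb).values())        # multiset intersection size
    return max(sum(ca.values()), sum(cb.values())) - shared


t1 = parse_bracket("{a{b}{c}}")
t2 = parse_bracket("{a{b{d}}}")
print(size_lower_bound(t1, t2), label_lower_bound(t1, t2))
```

In a similarity search with threshold `tau`, a candidate pair can be discarded without computing the exact distance whenever either bound exceeds `tau`; only the surviving pairs need the expensive exact computation.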
Provider:
IEEE DataPort
Created:
2026-03-31



