
SISAP 2023 Indexing challenge - Learned Metric Index: Raw data, analyses, figures

Indexed from doi.org on 2025-01-15
Download link: http://doi.org/10.17632/3dp7jfv2vh.1
Resource description:
==== For complete code, description, data, and steps to reproduce, visit: https://github.com/LearnedMetricIndex/LearnedMetricIndex/tree/paper-sisap23-indexing-challenge ====

This repository contains the data for our submission to the SISAP 2023 Indexing challenge. We used a stripped-down version of the Learned Metric Index (LMI), an index for approximate nearest neighbor search on complex data that relies on machine learning and probability-based navigation.

**Getting started**

Follow the instructions in README.md: https://github.com/LearnedMetricIndex/LearnedMetricIndex/tree/paper-sisap23-indexing-challenge

**Contents**

1. result/ - the raw .h5 files of every experiment (with varying hyperparameters), 2,088 experiments in total
2. res.csv - the evaluation of every experiment (one row per experiment) in terms of recall and query time
3. 02-Analyze-results.ipynb - the Jupyter notebook used to analyze the results and plot the figures
4. cat.pdf, nobjects.pdf - figures used in the paper

**Related Publications**

> M. Antol, J. Ol'ha, T. Slanináková, V. Dohnal: [Learned Metric Index - Proposition of learned indexing for unstructured data](https://www.sciencedirect.com/science/article/pii/S0306437921000326?casa_token=EvG8iaWkqQUAAAAA:xgfbutrsNGcBXnTN-U4MQ65hgmPE3fAyzwqtijzGC-JRrkO1IYNmcN3A8yMsSOT3CCoHpqVtMA). Information Systems, Elsevier (2021)

> T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal: [Learned Metric Index - Proposition of learned indexing for unstructured data](https://link.springer.com/chapter/10.1007/978-3-030-89657-7_7). SISAP 2021 - Similarity Search and Applications, pp. 81-94 (2021)

> J. Ol'ha, T. Slanináková, M. Gendiar, M. Antol, V. Dohnal: [Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques](https://arxiv.org/abs/2208.08910) (arXiv), and [Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques](https://link.springer.com/chapter/10.1007/978-3-031-17849-8_22). SISAP 2022 - Similarity Search and Applications, pp. 274-282 (2022)

> T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal, S. Ladra, M. A. Martinez-Prieto: [Reproducible experiments with Learned Metric Index Framework](https://www.sciencedirect.com/science/article/pii/S0306437923000911). Information Systems, Volume 118, September 2023, 102255 (2023)

**Mendeley dataset**: https://data.mendeley.com/datasets/8wp73zxr47/12

**Authors**

- Terézia Slanináková, Masaryk University
- David Procházka, Masaryk University
- Jaroslav Oľha, Masaryk University
- Matej Antol, Masaryk University
- Vlastislav Dohnal, Masaryk University
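The description above says LMI navigates its structure with machine learning and probabilities. The following is a minimal sketch of that idea, not the code from the LearnedMetricIndex repository: it assumes a tree whose inner nodes each hold a model returning one probability per child, visits leaf buckets in descending order of path probability (the product of model outputs along the path), stops once a candidate budget is reached, and then ranks the collected candidates by true Euclidean distance. The classes `Leaf` and `Inner`, the `candidate_budget` stop condition, and the toy `softmax_by_distance` model (which stands in for a trained model) are all hypothetical.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple, Union

Vector = Sequence[float]


@dataclass
class Leaf:
    # Hypothetical bucket holding object ids and their vectors.
    ids: List[int]
    vectors: List[Vector]


@dataclass
class Inner:
    # Hypothetical inner node: `predict` returns one probability per child.
    predict: Callable[[Vector], List[float]]
    children: List[Union["Inner", Leaf]] = field(default_factory=list)


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(root: Inner, query: Vector, k: int, candidate_budget: int) -> List[Tuple[float, int]]:
    """Visit buckets in descending path probability, then rank candidates by true distance."""
    counter = 0  # tie-breaker so heapq never has to compare node objects
    heap: List[Tuple[float, int, Union[Inner, Leaf]]] = [(-1.0, counter, root)]
    candidates: List[Tuple[int, Vector]] = []
    while heap and len(candidates) < candidate_budget:
        neg_prob, _, node = heapq.heappop(heap)
        if isinstance(node, Leaf):
            candidates.extend(zip(node.ids, node.vectors))
            continue
        for child, p in zip(node.children, node.predict(query)):
            counter += 1
            # Path probability = product of model outputs along the path (stored negated).
            heapq.heappush(heap, (neg_prob * p, counter, child))
    scored = sorted((euclidean(query, v), i) for i, v in candidates)
    return scored[:k]


def softmax_by_distance(centroids: List[Vector]) -> Callable[[Vector], List[float]]:
    # Toy stand-in for a trained model: nearer centroid -> higher probability.
    def predict(q: Vector) -> List[float]:
        scores = [math.exp(-euclidean(q, c)) for c in centroids]
        total = sum(scores)
        return [s / total for s in scores]
    return predict


left = Leaf(ids=[0, 1], vectors=[(0.0, 0.0), (0.1, 0.2)])
right = Leaf(ids=[2, 3], vectors=[(5.0, 5.0), (5.2, 4.9)])
root = Inner(predict=softmax_by_distance([(0.0, 0.0), (5.0, 5.0)]), children=[left, right])

# With a budget of 2 candidates only the more probable (left) bucket is scanned.
print(search(root, (0.0, 0.05), k=4, candidate_budget=2))
```

Raising `candidate_budget` scans more buckets, trading query time for recall; this is the kind of trade-off that the per-experiment recall and query-time measurements capture.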

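Because res.csv summarizes every experiment by recall and query time, a natural first pass is to keep only the configurations on the recall/query-time Pareto front, i.e. those that no other run beats on both metrics. The sketch below is an illustration, not the authors' analysis (that lives in 02-Analyze-results.ipynb). It assumes the columns are named `recall` and `querytime` and that a `params` column identifies each configuration; these names are assumptions, so check the actual header of res.csv first. The inline sample rows are made up.

```python
import csv
import io
from typing import Dict, List


def load_rows(text: str) -> List[Dict[str, str]]:
    # For the real file, use: with open("res.csv", newline="") as f: csv.DictReader(f)
    return list(csv.DictReader(io.StringIO(text)))


def pareto_front(rows: List[Dict[str, str]], recall_col: str = "recall",
                 time_col: str = "querytime") -> List[Dict[str, str]]:
    """Keep rows that no other row beats on both recall (higher) and query time (lower)."""
    # Fastest first; among equally fast rows, the highest recall comes first.
    ordered = sorted(rows, key=lambda r: (float(r[time_col]), -float(r[recall_col])))
    front: List[Dict[str, str]] = []
    best_recall = float("-inf")
    for row in ordered:
        recall = float(row[recall_col])
        # A slower row is only worth keeping if it strictly improves recall.
        if recall > best_recall:
            front.append(row)
            best_recall = recall
    return front


# Made-up values for illustration only; these are not results from res.csv.
SAMPLE = """params,recall,querytime
a,0.80,10.0
b,0.85,12.0
c,0.84,15.0
d,0.90,20.0
e,0.78,9.0
"""

front = pareto_front(load_rows(SAMPLE))
print([r["params"] for r in front])  # -> ['e', 'a', 'b', 'd']
```

Sorting by query time first turns the dominance check into a single pass: a row survives only if it reaches a higher recall than every faster row before it.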
Provided by: Mendeley Data