SISAP 2023 Indexing challenge – Learned Metric Index: Raw data, analyses, figures
Source: doi.org (indexed 2025-01-15)
Download link: http://doi.org/10.17632/3dp7jfv2vh.1
Description:
====
For complete code, description, data, and steps to reproduce, visit: https://github.com/LearnedMetricIndex/LearnedMetricIndex/tree/paper-sisap23-indexing-challenge
====
This repository contains the data for our submission to the SISAP 2023 Indexing challenge. We used a stripped-down version of the Learned Metric Index (LMI), an index for approximate nearest neighbor search on complex data that uses machine learning and probability-based navigation.
**Getting started**
Follow the instructions in README.md – https://github.com/LearnedMetricIndex/LearnedMetricIndex/tree/paper-sisap23-indexing-challenge
**Contents**
1. result/
- contains the raw .h5 files of each experiment (with varying hyperparameters), 2088 experiments in total
2. res.csv
- contains the evaluation of every experiment (one row per experiment) in terms of recall and query time
3. 02-Analyze-results.ipynb
- Jupyter notebook used to analyze the results and plot the figures
4. cat.pdf, nobjects.pdf
- figures used in the paper
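As a rough illustration of how the per-experiment evaluation in res.csv might be explored outside the notebook, the sketch below picks the fastest configuration that reaches a target recall. The column names `experiment`, `recall`, and `query_time` are assumptions made for this example, as are the sample values; check the actual header of res.csv and adjust the parameters accordingly.

```python
import csv
import io


def read_results(handle, recall_col="recall", time_col="query_time"):
    """Parse rows from an open CSV handle, converting the two metric columns to float.

    The default column names are hypothetical; pass the real ones from res.csv.
    """
    rows = []
    for row in csv.DictReader(handle):
        row[recall_col] = float(row[recall_col])
        row[time_col] = float(row[time_col])
        rows.append(row)
    return rows


def fastest_above_recall(rows, min_recall, recall_col="recall", time_col="query_time"):
    """Return the row with the lowest query time among rows reaching min_recall, or None."""
    eligible = [r for r in rows if r[recall_col] >= min_recall]
    return min(eligible, key=lambda r: r[time_col]) if eligible else None


# Tiny made-up sample standing in for res.csv (values are NOT from the dataset).
sample = io.StringIO(
    "experiment,recall,query_time\n"
    "a,0.85,10.0\n"
    "b,0.92,25.0\n"
    "c,0.95,40.0\n"
)
rows = read_results(sample)
best = fastest_above_recall(rows, min_recall=0.9)
print(best["experiment"])
```

To run this against the real file, replace the in-memory sample with `open("res.csv", newline="")` and pass the actual column names to both functions.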
**Related Publications**
> M. Antol, J. Ol'ha, T. Slanináková, V. Dohnal: [Learned Metric Index—Proposition of learned indexing for unstructured data](https://www.sciencedirect.com/science/article/pii/S0306437921000326?casa_token=EvG8iaWkqQUAAAAA:xgfbutrsNGcBXnTN-U4MQ65hgmPE3fAyzwqtijzGC-JRrkO1IYNmcN3A8yMsSOT3CCoHpqVtMA). Information Systems, 2021 - Elsevier (2021)
> T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal: [Data-driven Learned Metric Index: an Unsupervised Approach](https://link.springer.com/chapter/10.1007/978-3-030-89657-7_7). SISAP 2021 - Similarity Search and Applications pp 81-94 (2021)
> J. Ol'ha, T. Slanináková, M. Gendiar, M. Antol, V. Dohnal: [Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques](https://arxiv.org/abs/2208.08910), and [Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques](https://link.springer.com/chapter/10.1007/978-3-031-17849-8_22). SISAP 2022 - Similarity Search and Applications pp 274-282 (2022)
> T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal, S. Ladra, M. A. Martinez-Prieto: [Reproducible experiments with Learned Metric Index Framework](https://www.sciencedirect.com/science/article/pii/S0306437923000911). Information Systems, Volume 118, September 2023, 102255 (2023)
**Mendeley dataset**: https://data.mendeley.com/datasets/8wp73zxr47/12
**Authors**
- Terézia Slanináková, Masaryk University
- David Procházka, Masaryk University
- Jaroslav Oľha, Masaryk University
- Matej Antol, Masaryk University
- Vlastislav Dohnal, Masaryk University
Provider: Mendeley Data



