LocalDoc/azerbaijani_retriever_corpus
Source: Hugging Face. Updated 2025-06-10; indexed 2025-07-05.
Download link:
https://hf-mirror.com/datasets/LocalDoc/azerbaijani_retriever_corpus
Description:
This dataset is a large-scale, high-quality resource for training Azerbaijani text embedding models for information retrieval. It contains 671,528 training instances, each consisting of a query, one relevant (positive) document, and 10 hard-negative documents. Its primary goal is to support training dense retriever models with contrastive learning. Its key feature is the hard-negative mining strategy, which is designed to select negatives that are challenging yet appropriate, with the aim of producing more robust and accurate embedding models.
Provider:
LocalDoc
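
To make the instance layout in the description concrete (one query, one positive, 10 hard negatives), here is a minimal sketch of how a contrastive loss could score a single instance. The listing only says "contrastive learning", so the InfoNCE-style loss, the temperature of 0.05, the toy 2-dimensional vectors, and the names `cosine` and `info_nce_loss` are all illustrative assumptions, not the dataset authors' method. A real setup would use embeddings produced by the model being trained.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss for one (query, positive, negatives) instance.

    Assumption: this loss and the temperature value are illustrative;
    the listing does not name a specific loss function.
    """
    # The positive's logit goes first, followed by one logit per hard negative.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Cross-entropy with the positive as the target class.
    return log_sum_exp - logits[0]


# Toy embeddings standing in for model outputs (hypothetical values).
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0] for _ in range(10)]  # 10 hard negatives per instance

loss = info_nce_loss(query, positive, negatives)
```

The loss shrinks as the query moves closer to the positive than to every negative. Negatives that sit near the query keep the loss high, which is why harder negatives give the model more to learn from.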



