LocalDoc/azerbaijani_retriever_corpus

Source: Hugging Face. Updated 2025-06-10; indexed 2025-07-05.
Download link:
https://hf-mirror.com/datasets/LocalDoc/azerbaijani_retriever_corpus
Overview:
This dataset is a large-scale, high-quality resource for training Azerbaijani text embedding models for information retrieval. It contains 671,528 training instances, each consisting of a query, one relevant positive document, and 10 hard-negative documents. It is intended for training dense retriever models with contrastive learning. Its key feature is its hard-negative mining strategy, which aims to select negatives that are challenging yet appropriate, with the goal of producing more robust and accurate embedding models.
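
To make the contrastive setup concrete, here is a minimal sketch of an InfoNCE-style loss for a single instance of the shape described above (one query, one positive, a list of hard negatives). It is an illustration only, not the training recipe used by LocalDoc: the function names, the use of cosine similarity, the temperature value, and the toy two-dimensional vectors are all assumptions. A real run would use embeddings produced by a model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss for one (query, positive, negatives) instance.

    The positive is treated as the correct class in a softmax over the
    positive and all negatives. The temperature is an assumed value.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    # Equals -log(softmax probability assigned to the positive).
    return log_denominator - logits[0]


# Toy embeddings (hypothetical values, chosen only for illustration).
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

# Loss when the true positive is scored as the positive.
loss_good = info_nce_loss(query, positive, negatives)
# Loss when an unrelated document is (wrongly) placed in the positive slot.
loss_bad = info_nce_loss(query, negatives[0], [positive, negatives[1]])
```

The loss is small when the query is much closer to the positive than to any negative, and grows when a negative outranks the positive. Hard negatives matter because they keep the denominator from being dominated by trivially dissimilar documents, so the loss stays informative.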
Provider:
LocalDoc