tdro-llm/finetune_data
Source: Hugging Face | Updated: 2025-05-15 | Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/tdro-llm/finetune_data
Description:
This dataset collects 25 heterogeneous retrieval fine-tuning datasets for fine-tuning large language model-based dense retrieval systems. Each dataset has been augmented with hard negatives and deduplicated against test sets. The collection covers domains such as news, web collections, Wikipedia QA, and medicine. The per-dataset details include language, category, symmetry, reference, format, hard negative mining approach, size, deduplicated size, and number of duplicates.
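The description mentions deduplication against test sets but does not say how matches are detected. The following is a minimal sketch of one possible approach, assuming exact matching on normalized query text. The record fields `query`, `pos`, and `neg`, the helper names, and the sample rows are all hypothetical illustrations, not the dataset's documented schema or the authors' method.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def dedup_against_test(train_rows, test_queries):
    """Drop training rows whose normalized query also appears in the test queries.

    Returns (kept_rows, number_of_rows_removed).
    """
    seen = {normalize(q) for q in test_queries}
    kept = [row for row in train_rows if normalize(row["query"]) not in seen]
    return kept, len(train_rows) - len(kept)


# Hypothetical training records: one query, positives, and mined hard negatives.
train_rows = [
    {"query": "What is dense retrieval?", "pos": ["doc a"], "neg": ["doc x"]},
    {"query": "symptoms of  influenza", "pos": ["doc b"], "neg": ["doc y"]},
    {"query": "Latest election news", "pos": ["doc c"], "neg": ["doc z"]},
]
# Hypothetical test-set queries that must not leak into training.
test_queries = ["Symptoms of influenza"]

kept, removed = dedup_against_test(train_rows, test_queries)
```

Under these assumptions, `removed` corresponds to the "number of duplicates" and `len(kept)` to the "deduplicated size" listed for each dataset. A real pipeline might use fuzzy or embedding-based matching instead of exact string equality.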
Provider:
tdro-llm



