NorskHelsenett/eti-embedding-training-data-2048
Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/NorskHelsenett/eti-embedding-training-data-2048
Description:
This dataset contains 78,888 anchor-positive pairs for training Norwegian-language embedding models focused on health-related content. Each pair consists of a question (anchor) and its corresponding relevant passage (positive). The data is sourced from Norwegian public health content, processed through semantic chunking and question generation. It is primarily used for fine-tuning embedding models, training bi-encoders, and building RAG systems.
Provider:
NorskHelsenett
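
The listing says the anchor-positive pairs are meant for fine-tuning embedding models and training bi-encoders, but it does not say which training objective is intended. A common choice for data of this shape is an in-batch negatives contrastive loss, where each question's own passage is the target and the other passages in the batch serve as negatives. The following is a minimal sketch of that loss in plain Python, offered as an illustration only: the function name, the toy 2-dimensional vectors, and the `scale` value are hypothetical and do not come from the dataset or its authors.

```python
import math


def in_batch_negatives_loss(anchor_vecs, positive_vecs, scale=20.0):
    """Mean cross-entropy over a batch of (anchor, positive) embedding pairs.

    For anchor i, positive i is the correct passage; every other positive
    in the batch is treated as a negative. Vectors are assumed non-zero.
    `scale` is an illustrative temperature-like multiplier (an assumption).
    """

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    total = 0.0
    for i, anchor in enumerate(anchor_vecs):
        # Similarity of this anchor to every passage in the batch.
        logits = [scale * cosine(anchor, p) for p in positive_vecs]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of the matching passage.
        total += log_z - logits[i]
    return total / len(anchor_vecs)


# Toy embeddings standing in for encoded questions and passages.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]      # each passage aligns with its question
mismatched = [[0.0, 1.0], [1.0, 0.0]]   # passages swapped between questions

loss_matched = in_batch_negatives_loss(anchors, matched)
loss_mismatched = in_batch_negatives_loss(anchors, mismatched)
print(loss_matched < loss_mismatched)
```

In actual fine-tuning the vectors would come from the embedding model being trained, and the loss would be minimized by gradient descent so that each question moves closer to its own passage than to the other passages in the batch.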



