
NorskHelsenett/eti-embedding-training-data-2048

Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/NorskHelsenett/eti-embedding-training-data-2048
Description:
This dataset contains 78,888 anchor-positive pairs for training Norwegian-language embedding models focused on health-related content. Each pair consists of a question (anchor) and its corresponding relevant passage (positive). The data is sourced from Norwegian public health content, processed through semantic chunking and question generation. It is primarily used for fine-tuning embedding models, training bi-encoders, and building RAG systems.
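Anchor-positive pairs like these are commonly used with an in-batch negatives objective when training a bi-encoder: each question should score highest against its own passage, and the other passages in the batch serve as negatives. The listing does not say which loss the dataset was designed for, so the following is only a minimal sketch of that general idea. The function names, the `scale` value, and the two-dimensional toy vectors are all hypothetical stand-ins for real model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over a batch of anchor-positive pairs.

    positives[i] is treated as the correct match for anchors[i];
    every other positive in the batch acts as a negative.
    The scale factor is an assumed temperature, not a value from the source.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of similarities.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy embeddings (hypothetical): two questions and two passages.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each passage matches its question
swapped = [aligned[1], aligned[0]]   # passages paired with the wrong question

loss_aligned = in_batch_negatives_loss(anchors, aligned)
loss_swapped = in_batch_negatives_loss(anchors, swapped)
print(loss_aligned < loss_swapped)
```

In practice the vectors would come from an embedding model applied to the question and passage texts, and the loss would be minimized with a training framework rather than computed by hand.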
Provider:
NorskHelsenett