abhinand/MedEmbed-training-triplets-v1

Source: Hugging Face | Updated: 2024-10-21 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/abhinand/MedEmbed-training-triplets-v1
Description:
The MedEmbed dataset is a specialized collection of medical and clinical data for training and evaluating embedding models in healthcare-related natural language processing (NLP) tasks, particularly information retrieval. It specifically supports the development and evaluation of the MedEmbed family of embedding models, and it covers tasks such as medical information retrieval, clinical question answering, and semantic search in medical contexts. The dataset is primarily in English and has been used to benchmark the MedEmbed-v0.1 models against general-purpose embedding models across various medical NLP tasks. It is organized into four main configurations, each with its own features and data fields: corpus (corpus text), queries (query-response pairs), merged (merged data for contrastive learning), and default. The dataset was created with a synthetic data generation pipeline that combines clinical notes from PubMed Central with a model that generates query-response pairs. It is released under the Apache 2.0 license.
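To illustrate how triplet data of this kind is typically used in contrastive learning, here is a minimal sketch of a cosine-based triplet margin loss in plain Python. It is not taken from the MedEmbed training code: the function names, the `margin` value, and the toy two-dimensional vectors are all assumptions for illustration, and a real setup would obtain the vectors from an embedding model applied to a query, a relevant passage, and an irrelevant passage.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_margin_loss(query, positive, negative, margin=0.2):
    """Hypothetical helper: penalize a triplet unless the query is closer
    to the positive than to the negative by at least `margin`."""
    pos_sim = cosine_similarity(query, positive)
    neg_sim = cosine_similarity(query, negative)
    return max(0.0, margin - pos_sim + neg_sim)


# Toy embeddings (assumed values, not drawn from the dataset).
query_vec = [1.0, 0.0]
positive_vec = [1.0, 0.0]   # same direction as the query
negative_vec = [0.0, 1.0]   # orthogonal to the query

# Well-separated triplet: no penalty.
loss_easy = triplet_margin_loss(query_vec, positive_vec, negative_vec)

# Swapped roles: the "positive" is now the orthogonal vector, so the loss is large.
loss_hard = triplet_margin_loss(query_vec, negative_vec, positive_vec)
```

The loss is zero when the positive already beats the negative by the margin, so training effort concentrates on triplets the model still ranks incorrectly.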
Provider:
abhinand