HoneyBee

Source: arXiv. Updated 2024-05-13; indexed 2024-06-21.
Download link:
https://huggingface.co/datasets/Lab-Rasool/TCGA
Description:
HoneyBee is a scalable, modular framework for creating multimodal oncology datasets, developed at the Moffitt Cancer Center. Datasets built with the framework integrate several data modalities, including clinical records, imaging data, and patient outcomes, and represent them as embeddings generated by foundation models. The dataset is large: it contains data from 11,428 patients across 33 cancer types, sourced from The Cancer Genome Atlas (TCGA) program. During dataset creation, data preprocessing techniques and transformer-based architectures are used to generate embeddings that capture the underlying features and relationships in the raw medical data. HoneyBee aims to accelerate oncology research by providing high-quality, machine-learning-ready datasets and to address the challenges posed by the complexity and heterogeneity of medical data. The framework can also be extended to other healthcare domains.
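
To make the idea of integrating modalities concrete, the following is a minimal sketch and not HoneyBee's own code. It assumes each modality has already been turned into a fixed-length embedding keyed by a patient ID, and it fuses them by simple concatenation. The function name `fuse_by_patient`, the patient IDs, the modality names, and the numeric values are all hypothetical; the listing above does not say how HoneyBee combines modalities.

```python
from typing import Dict, List

# Hypothetical per-modality embeddings keyed by patient ID.
# These are toy values for illustration, not real TCGA data.
Embeddings = Dict[str, List[float]]


def fuse_by_patient(modalities: Dict[str, Embeddings]) -> Dict[str, List[float]]:
    """Concatenate embeddings for patients that appear in every modality."""
    if not modalities:
        return {}
    # A fixed (sorted) modality order keeps the fused vector layout stable.
    names = sorted(modalities)
    # Keep only patients for whom every modality has an embedding.
    shared = set.intersection(*(set(modalities[name]) for name in names))
    return {
        pid: [value for name in names for value in modalities[name][pid]]
        for pid in sorted(shared)
    }


toy = {
    "clinical": {"P1": [0.1, 0.2], "P2": [0.3, 0.4]},
    "imaging": {"P1": [0.5], "P3": [0.9]},
}
fused = fuse_by_patient(toy)
print(fused)
```

In this toy input only patient `P1` appears in both modalities, so only that patient receives a fused vector. Dropping patients with a missing modality is one possible policy; padding or imputing missing modalities would be an alternative.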
Provided by:
Moffitt Cancer Center
Created:
2024-05-13