
CT_DeepLesion-MedSAM2 | Medical Image Analysis Dataset | Lesion Detection Dataset

Source: huggingface, updated 2025-04-07, indexed 2025-04-08
Medical image analysis
Lesion detection
Download link:
https://huggingface.co/datasets/wanglab/CT_DeepLesion-MedSAM2
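Summary:
The CT_DeepLesion-MedSAM2 dataset contains 32,735 diverse lesions on 32,120 CT slices from 10,594 studies of 4,427 unique patients. Each lesion has a bounding-box annotation on its key slice, derived from the lesion's longest diameter and longest perpendicular diameter. In addition, 5,000 lesions were annotated with MedSAM2 in a human-in-the-loop workflow.
Created:
2025-04-03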
AI-collected summary
Dataset introduction
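Construction
In medical image analysis, accurate lesion annotation is essential for training deep learning models. CT_DeepLesion-MedSAM2 builds on the original DeepLesion CT collection, which covers 32,120 CT slices and marks each lesion on its key slice with a bounding box derived from the lesion's longest diameter and longest perpendicular diameter. On top of these boxes, 5,000 lesions received refined annotations through a human-in-the-loop workflow that pairs MedSAM2 with human review. The result is a dataset that balances scale with annotation quality.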
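The box derivation described above can be sketched as follows. This is a minimal illustration rather than the dataset authors' code: the function name `bbox_from_diameters`, the (x, y) pixel convention, and the optional `padding` parameter are all assumptions made for this example.

```python
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def bbox_from_diameters(
    long_axis: Segment,
    short_axis: Segment,
    padding: float = 0.0,
) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing both diameter segments.

    long_axis / short_axis are the endpoint pairs of the longest diameter and
    the longest perpendicular diameter, in pixel coordinates (assumed layout).
    padding is a hypothetical margin added on every side.
    """
    points = [*long_axis, *short_axis]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (
        min(xs) - padding,
        min(ys) - padding,
        max(xs) + padding,
        max(ys) + padding,
    )


# Example: two crossing diameters measured on a key slice.
box = bbox_from_diameters(((10, 20), (50, 40)), ((25, 45), (35, 15)))
print(box)
```

The box is simply the tightest axis-aligned rectangle around the four endpoints, so it can underestimate irregular lesions; a small `padding` is one way to compensate.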
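Features
The dataset's main strength is its diversity: 32,735 lesion instances from 4,427 patients, with lesion types spanning organ systems throughout the body. Each lesion comes with the original CT image data and a bounding-box annotation, which gives lesion detection work a reliable baseline. For the lesions annotated with MedSAM2, the annotations extend from 2D slices to 3D volumes, giving medical image segmentation research richer spatial information.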
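Usage
Researchers can obtain the dataset from the Hugging Face platform and load it with the standard datasets library. A typical workflow installs the dependencies, downloads the dataset, and splits it into training and validation sets. The CT images are already mapped to their annotations, so image-annotation pairs can be extracted directly for model training. For proper attribution, users should cite both the original DeepLesion paper and the MedSAM2 paper; the citation formats are given in the dataset documentation.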
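The workflow above might look roughly like the sketch below. It is hedged in two ways: the repository ID is taken from the download link on this page, but whether `load_dataset` can read the repository directly depends on its file layout, which this page does not describe; and `train_val_split` is a hypothetical helper written for this example, not part of any library.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

REPO_ID = "wanglab/CT_DeepLesion-MedSAM2"  # from the download link above


def fetch_dataset():
    """Download and load the dataset (needs `pip install datasets` and network).

    Assumption: the repository is loadable with the generic loader.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset(REPO_ID)


def train_val_split(
    items: Sequence[T], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle items deterministically and return (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Example with placeholder case IDs; real IDs would come from the dataset.
train_ids, val_ids = train_val_split([f"case_{i:03d}" for i in range(10)])
print(len(train_ids), len(val_ids))
```

Because one patient can contribute several lesions, splitting on patient IDs rather than on individual lesions avoids leaking a patient's scans across the training and validation sets.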
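Background and challenges
Background overview
CT_DeepLesion-MedSAM2 was built in 2025 by a joint research team from institutions including the AI Collaborative Centre at University Health Network in Toronto, the Department of Biomedical Informatics at Harvard Medical School, and the Department of Computer Science at the University of Toronto. It builds on the DeepLesion dataset released by the U.S. National Institutes of Health (NIH), with 5,000 lesions annotated using MedSAM2 in a human-in-the-loop workflow. Its core research problem is universal lesion detection and segmentation in 3D medical images, and it provides data for studying how well deep learning models generalize in CT image analysis. The dataset is intended to support progress in medical image segmentation and to serve as a standardized benchmark for clinical research such as quantitative tumor analysis and treatment response assessment.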
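Current challenges
CT_DeepLesion-MedSAM2 targets accurate segmentation and 3D reconstruction of lesions across many categories. The main challenges are the high heterogeneity of lesion shapes, annotation ambiguity caused by blurred tissue boundaries in CT images, and keeping small lesions continuous from slice to slice. During construction, the team had to work around the coarse granularity of the original annotations by combining semi-automatic annotation with manual verification. Data standardization across collaborating institutions and differences in image resolution between scanners added further technical complexity. Balancing annotation accuracy against the efficiency of large-scale processing remains the key bottleneck for further improvement of the dataset.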
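One simple way to check the slice-to-slice continuity mentioned above is to look for gaps in the set of slices on which a lesion is annotated. The sketch below is illustrative only; the helper name `slice_gaps` and the idea of representing a lesion by its annotated slice indices are assumptions for this example.

```python
from typing import Iterable, List


def slice_gaps(slice_indices: Iterable[int]) -> List[int]:
    """Return slice indices missing between the first and last annotated slice.

    An empty result means the lesion is annotated on a contiguous run of slices.
    """
    present = set(slice_indices)
    if not present:
        return []
    return [z for z in range(min(present), max(present) + 1) if z not in present]


# Example: a lesion annotated on slices 12, 13, 15 and 16 skips slice 14.
print(slice_gaps([12, 13, 15, 16]))
```

A non-empty result flags lesions that may need manual review, which matters most for small lesions that span only a few slices.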
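Common use cases
Typical use cases
CT_DeepLesion-MedSAM2 provides deep learning models with a large set of CT scans and annotations, aimed at lesion detection and segmentation. It includes annotations for 32,735 lesions from 4,427 patients across many lesion types, which gives model training a diverse set of samples. Typical uses are developing and evaluating lesion detection algorithms and training and validating medical image segmentation models.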
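Detection algorithms of the kind described above are commonly scored by the overlap between a predicted box and the annotated box. The sketch below computes intersection over union (IoU) for two axis-aligned boxes; the function name `box_iou` and the (x_min, y_min, x_max, y_max) layout are assumptions for this example, and the page does not state which metric the dataset authors use.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Example: a predicted box shifted by half its size against the annotation.
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A prediction is then counted as a hit when its IoU with an annotated box exceeds a chosen threshold; the threshold itself is an evaluation choice, not something fixed by the dataset.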
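Practical applications
In practice, the dataset can be used to develop computer-aided diagnosis systems that help radiologists detect and segment lesions. Such systems identify lesions in CT scans quickly, which reduces manual annotation work and improves diagnostic efficiency. The dataset can also support the development of medical image analysis software that helps clinicians localize and assess lesions more precisely.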
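Derived and related work
Work associated with CT_DeepLesion-MedSAM2 includes MedSAM2, a model for 3D medical image segmentation. The summary also links the dataset to related lines of research, including improved lesion detection algorithms, multimodal medical image analysis, and lesion growth prediction models.
The content above was collected and summarized by AI.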