
karoladelk/ham1ok

Source: Hugging Face. Updated 2024-04-20. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/karoladelk/ham1ok
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: image_id
    dtype: string
  - name: lesion_id
    dtype: string
  - name: dx
    dtype:
      class_label:
        names:
          '0': actinic_keratoses
          '1': melanoma
          '2': melanocytic_nevi
          '3': dermatofibroma
          '4': basal_cell_carcinoma
          '5': benign_keratosis-like_lesions
          '6': vascular_lesions
  - name: dx_type
    dtype: string
  - name: age
    dtype: float64
  - name: sex
    dtype: string
  - name: localization
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': actinic_keratoses
          '1': melanoma
          '2': melanocytic_nevi
          '3': dermatofibroma
          '4': basal_cell_carcinoma
          '5': benign_keratosis-like_lesions
          '6': vascular_lesions
  splits:
  - name: train
    num_bytes: 2490501038.358
    num_examples: 9577
  - name: test
    num_bytes: 351507473.24
    num_examples: 1285
  - name: validation
    num_bytes: 681758880.144
    num_examples: 2492
  download_size: 3693626934
  dataset_size: 3523767391.7419996
task_categories:
- image-classification
- image-segmentation
language:
- en
tags:
- skin_cancer
- HAM10000
pretty_name: HAM10000
size_categories:
- 1K<n<10K
---

# The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

- Original Paper and Dataset [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T)
- Kaggle dataset [here](https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000?resource=download)

# Introduction to datasets

Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images which can serve as a training set for academic machine learning purposes.

Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions:

- Actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec)
- Basal cell carcinoma (bcc)
- Benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl)
- Dermatofibroma (df)
- Melanoma (mel)
- Melanocytic nevi (nv)
- Vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc)

More than 50% of lesions are confirmed through histopathology (histo). The ground truth for the rest of the cases is either follow-up examination (follow_up), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal).

The test set is not public, but the evaluation server remains running (see the challenge website). Any publications written using the HAM10000 data should be evaluated on the official test set hosted there, so that methods can be fairly compared.

- Test site can be accessed [here](https://challenge.isic-archive.com/landing/2018/)

# Disclaimer and additional information

This is a contribution to open-source image data on Hugging Face. Images can be obtained from the links above. The train/test split was stratified by cancer/diagnosis type. The code used to stratify the dataset is available on my GitHub [here](https://github.com/marmal88/skin_cancer). I do not own any rights to the above images.

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
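
The disclaimer says the split was stratified by diagnosis type. The sketch below illustrates that general idea only. It is not the author's code (which lives in the linked GitHub repository). The function name `stratified_split`, the 70/10/20 fractions, the seed, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split indices into (train, test, validation) class by class.

    `fractions` is a hypothetical choice; the card does not state
    the ratios that were actually used.
    """
    rng = random.Random(seed)

    # Group example indices by their diagnosis label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test, val = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * fractions[0])
        n_test = round(len(idxs) * fractions[1])
        train += idxs[:n_train]
        test += idxs[n_train:n_train + n_test]
        val += idxs[n_train + n_test:]  # remainder goes to validation
    return train, test, val


# Toy labels for illustration only, not real HAM10000 data.
labels = ["melanocytic_nevi"] * 60 + ["melanoma"] * 30 + ["dermatofibroma"] * 10
train, test, val = stratified_split(labels)
print(len(train), len(test), len(val))
```

Because each class is split separately, every split keeps roughly the same class mix as the full set, which matters for a dataset where some diagnoses are much rarer than others.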
Provided by:
karoladelk
Summary of original information

Dataset overview

Dataset name

  • Name: HAM10000
  • Alias: Human Against Machine with 10000 training images

Dataset features

  • Feature list:
    • image: image data
    • image_id: string
    • lesion_id: string
    • dx: class label with the following classes:
      • 0: actinic_keratoses
      • 1: melanoma
      • 2: melanocytic_nevi
      • 3: dermatofibroma
      • 4: basal_cell_carcinoma
      • 5: benign_keratosis-like_lesions
      • 6: vascular_lesions
    • dx_type: string
    • age: float (float64)
    • sex: string
    • localization: string
    • labels: class label, same classes as dx
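
As a hedged sketch, the class ids above can be mirrored in plain Python for decoding predictions. The id-to-name list is taken from the card. The pairing of the short codes (akiec, bcc, ...) with these class names is an assumption based on the card's prose, and the helper names `decode_label`, `abbreviation_to_id`, and `load_ham1ok` are hypothetical. The loader assumes the Hugging Face `datasets` library is installed and the repository is reachable.

```python
# Class names indexed by label id, as listed in the dataset card.
DX_NAMES = [
    "actinic_keratoses",
    "melanoma",
    "melanocytic_nevi",
    "dermatofibroma",
    "basal_cell_carcinoma",
    "benign_keratosis-like_lesions",
    "vascular_lesions",
]

# Assumed pairing of the card's short codes with its class names.
DX_ABBREVIATIONS = {
    "akiec": "actinic_keratoses",
    "bcc": "basal_cell_carcinoma",
    "bkl": "benign_keratosis-like_lesions",
    "df": "dermatofibroma",
    "mel": "melanoma",
    "nv": "melanocytic_nevi",
    "vasc": "vascular_lesions",
}


def decode_label(label_id):
    """Map an integer class id (0-6) to its class name."""
    return DX_NAMES[label_id]


def abbreviation_to_id(abbr):
    """Map a short code such as 'mel' to its integer class id."""
    return DX_NAMES.index(DX_ABBREVIATIONS[abbr])


def load_ham1ok(repo_id="karoladelk/ham1ok"):
    """Download the dataset from the Hub (needs network access)."""
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset(repo_id)


print(decode_label(1), abbreviation_to_id("nv"))
```

Once loaded, the `ClassLabel` feature should offer the same id-to-name lookup through `int2str`, for example `ds["train"].features["dx"].int2str(1)`, so the hard-coded list is mainly useful when working without the library.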

Dataset splits

  • Training set: 9577 examples, 2490501038.358 bytes
  • Test set: 1285 examples, 351507473.24 bytes
  • Validation set: 2492 examples, 681758880.144 bytes
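
To make the split ratios concrete, the short sketch below computes each split's share from the example counts listed above. The variable names are illustrative. The counts sum to 13354, which is more than the 10015 images mentioned in the card's introduction; the card does not explain the difference.

```python
# Example counts per split, copied from the card metadata.
SPLITS = {"train": 9577, "test": 1285, "validation": 2492}

total = sum(SPLITS.values())
shares = {name: count / total for name, count in SPLITS.items()}

for name, share in shares.items():
    print(f"{name}: {SPLITS[name]} examples ({share:.1%})")
print(f"total: {total} examples")
```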

Dataset size

  • Download size: 3693626934 bytes
  • Dataset size: 3523767391.7419996 bytes
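
As a quick sanity check on these figures, the sketch below confirms that the per-split byte counts add up to the reported dataset size and converts both sizes to GiB. The helper name `to_gib` is illustrative, and GiB (2^30 bytes) is a unit chosen here, not one used by the card.

```python
import math

# Byte counts copied from the card metadata.
SPLIT_BYTES = {
    "train": 2490501038.358,
    "test": 351507473.24,
    "validation": 681758880.144,
}
DATASET_SIZE = 3523767391.7419996
DOWNLOAD_SIZE = 3693626934


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


total_bytes = sum(SPLIT_BYTES.values())

# The split sizes should sum to the reported dataset size.
print("splits match dataset_size:",
      math.isclose(total_bytes, DATASET_SIZE, rel_tol=1e-9))
print(f"download size: {to_gib(DOWNLOAD_SIZE):.2f} GiB")
print(f"dataset size:  {to_gib(DATASET_SIZE):.2f} GiB")
```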

Task categories

  • Image classification
  • Image segmentation

Language

  • English

Tags

  • skin_cancer
  • HAM10000

Dataset scale

  • 1K<n<10K