HISTAI-metadata

Source: ModelScope, updated 2025-12-05, indexed 2025-05-17
Download link:
https://modelscope.cn/datasets/histai/HISTAI-metadata
Resource description:
Dear researchers and engineers, you're accessing a dataset that would cost millions of dollars to build and that took extensive negotiation to secure favorable terms for its use. Your support, by liking the repositories and upvoting the collection, costs nothing but gives us valuable motivation to continue our contributions to the community. We reserve the right not to approve the request if you don't support our efforts. Thank you very much for your collaboration!

# HISTAI Dataset

HISTAI is a comprehensive whole-slide image (WSI) pathology dataset spanning multiple medical specializations. Slides are anonymized and organized into specialized subsets by organ system or pathology type.

If you wish to support, sponsor, or obtain a commercial license for HISTAI data, please contact us at [models@hist.ai](mailto:models@hist.ai).

For details, refer to our report:

* [HISTAI: An Open-Source, Large-Scale Whole Slide Image Dataset for Computational Pathology](https://arxiv.org/abs/2505.12120)

This repository contains metadata and references to images hosted separately. Individual slide images are accessible from the specialized Hugging Face datasets.

---

## Dataset Structure

Slides are stored across specialized datasets hosted on Hugging Face. Each specialized dataset contains anonymized slides organized by case:

```
histai/<dataset_name>/case_<case_id>/slide_<stain>_<slide_number>.tiff
```

or

```
histai/<dataset_name>/case_<case_id>/slide_<magnification>_<stain>_<slide_number>.tiff
```

Most of the slides are stained with hematoxylin and eosin (H&E) and scanned at 20X magnification. If a slide's magnification differs from 20X, the magnification is embedded in the slide filename, as in the second pattern above.
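Because the magnification token appears only when a slide departs from the 20X default, code that indexes slides has to accept both filename patterns. The following is a minimal sketch of such a parser, assuming that the `<magnification>`, `<stain>`, and `<slide_number>` tokens never contain underscores. The names `SlideInfo` and `parse_slide_path` are hypothetical, not part of any HISTAI tooling, and the tokens `HE` and `40x` used below are illustrative guesses, since the exact spellings used in the repositories are not listed here.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Assumption: none of the <magnification>, <stain> or <slide_number> tokens
# contains an underscore, so underscores can be used to split the file name.
_SLIDE_RE = re.compile(
    r"^slide_(?:(?P<magnification>[^_]+)_)?"  # optional, present only for non-20X slides
    r"(?P<stain>[^_]+)_(?P<slide_number>[^_]+)\.tiff$"
)
_CASE_RE = re.compile(r"^case_(?P<case_id>.+)$")


@dataclass(frozen=True)
class SlideInfo:
    dataset: str  # empty string if the path has no dataset component
    case_id: str
    stain: str
    slide_number: str
    magnification: Optional[str]  # None means no token, i.e. the 20X default


def parse_slide_path(path: str) -> SlideInfo:
    """Parse histai/<dataset_name>/case_<case_id>/slide_[<magnification>_]<stain>_<slide_number>.tiff."""
    p = PurePosixPath(path)
    slide = _SLIDE_RE.match(p.name)
    case = _CASE_RE.match(p.parent.name)
    if slide is None or case is None:
        raise ValueError(f"unrecognized slide path: {path!r}")
    return SlideInfo(
        dataset=p.parent.parent.name,
        case_id=case.group("case_id"),
        stain=slide.group("stain"),
        slide_number=slide.group("slide_number"),
        magnification=slide.group("magnification"),
    )
```

Under these assumptions, `parse_slide_path("histai/HISTAI-breast/case_7/slide_40x_HE_2.tiff")` yields `magnification="40x"`, `stain="HE"`, and `slide_number="2"`, while a name without the extra token leaves `magnification` as `None`. If real stain names turn out to contain underscores, the two patterns become ambiguous and the expression would need an explicit list of known stain tokens instead.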
Currently available specialized datasets:

* [HISTAI-hematologic](https://huggingface.co/datasets/histai/HISTAI-hematologic)
* [HISTAI-gastrointestinal](https://huggingface.co/datasets/histai/HISTAI-gastrointestinal)
* [HISTAI-breast](https://huggingface.co/datasets/histai/HISTAI-breast)
* [HISTAI-thorax](https://huggingface.co/datasets/histai/HISTAI-thorax)
* [HISTAI-skin-b2](https://huggingface.co/datasets/histai/HISTAI-skin-b2)
* [HISTAI-skin-b1](https://huggingface.co/datasets/histai/HISTAI-skin-b1)
* [HISTAI-colorectal-b1](https://huggingface.co/datasets/histai/HISTAI-colorectal-b1)
* [HISTAI-colorectal-b2](https://huggingface.co/datasets/histai/HISTAI-colorectal-b2)
* [HISTAI-mixed](https://huggingface.co/datasets/histai/HISTAI-mixed)

## Metadata

The master repository includes comprehensive metadata in JSON format for each slide/case, containing detailed pathological, clinical, and technical information:

| Field             | Description                                  | Example                                                                       |
| ----------------- | -------------------------------------------- | ----------------------------------------------------------------------------- |
| `diagnosis`       | Incoming clinical notes                      | Benign skin neoplasms.                                                        |
| `conclusion`      | Final pathological conclusion                | Intradermal melanocytic nevus of the skin.                                    |
| `diff_diagnosis`  | Differential diagnostic notes (if available) |                                                                               |
| `micro_protocol`  | Microscopic description                      | Skin: Intradermal melanocytic nevus of the skin. Microscopic description: ... |
| `additional_info` | Any additional clinical/pathological notes   | "A repeat review of the histological specimens was performed, including ..."  |
| `age`             | Patient age (years)                          | 37                                                                            |
| `gender`          | Patient gender                               | f                                                                             |
| `icd10`           | ICD-10 classification                        | D22                                                                           |
| `specialization`  | Medical specialization or organ system       | Skin                                                                          |
| `case_mapping`    | Reference to slide images                    | histai/HISTAI-skin-b2/case\_13384                                             |
| `grossing`        | Gross examination details                    | "Head and neck: One fragment, measuring 2×4 mm, gray, firm, with ..."         |

---

## Statistics

| Dataset                                | Total Slides | Total Cases |
|----------------------------------------|-------------:|------------:|
| histai/HISTAI-hematologic              |          214 |         214 |
| histai/HISTAI-gastrointestinal         |          202 |         120 |
| histai/HISTAI-breast                   |        1,925 |       1,692 |
| histai/HISTAI-thorax                   |          829 |         657 |
| histai/HISTAI-skin-b2                  |       43,757 |      20,621 |
| histai/HISTAI-skin-b1                  |        7,710 |       1,778 |
| histai/HISTAI-colorectal-b1            |        5,379 |         998 |
| histai/HISTAI-colorectal-b2            |           94 |          62 |
| histai/HISTAI-mixed                    |       52,691 |      21,137 |

- **Total slides**: 112,801
- **Total cases**: 47,279
- **Slides at 40X magnification**: 2,463
- **Slides at 20X magnification**: 110,338
- **H&E slides**: 92,536
- **IHC (immunohistochemistry) slides**: 16,920
- **Other stains**: 3,345

---

## How to Access Images

You can download slides programmatically in either of the following ways.

### Using Hugging Face Hub

```python
from huggingface_hub import snapshot_download

# Replace <dataset_name> with one of the specialized datasets listed above (e.g. HISTAI-breast)
snapshot_download(repo_id="histai/<dataset_name>", repo_type="dataset", local_dir="/local_path")
```

### Using Git

```bash
# Ensure you have Git LFS installed
git lfs install
git clone https://huggingface.co/datasets/histai/<dataset_name>
```

## License

This dataset is licensed under **CC BY-NC 4.0** and is intended exclusively for **research purposes**.

## Citation

Please cite the following if you use this dataset:

```bibtex
@misc{nechaev2025histaiopensourcelargescaleslide,
      title={HISTAI: An Open-Source, Large-Scale Whole Slide Image Dataset for Computational Pathology},
      author={Dmitry Nechaev and Alexey Pchelnikov and Ekaterina Ivanova},
      year={2025},
      eprint={2505.12120},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2505.12120},
}
```

---

## Contacts

* **Authors:** Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova
* **Emails:** dmitry@hist.ai, alex@hist.ai, kate@hist.ai
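The `case_mapping` metadata field names the repository and case folder that hold a case's slides, so it can narrow a download to a single case instead of fetching a whole specialized dataset. Below is a minimal sketch, assuming that `case_mapping` always has the form `histai/<dataset_name>/case_<case_id>` shown in the metadata table and that case folders sit at the root of each Hugging Face dataset repository. The helper names `case_mapping_to_download_args` and `download_case` are hypothetical; the sketch relies on the `allow_patterns` argument of `huggingface_hub.snapshot_download`. Since access to the HISTAI repositories is granted on request, the actual download presumably also needs an approved request and a configured Hugging Face access token.

```python
from typing import Tuple


def case_mapping_to_download_args(case_mapping: str) -> Tuple[str, str]:
    """Split a `case_mapping` value such as "histai/HISTAI-skin-b2/case_13384"
    into (repo_id, allow_patterns glob) for that case's folder."""
    # Assumption: the value is exactly <org>/<dataset_name>/case_<case_id>.
    parts = case_mapping.strip().strip("/").split("/")
    if len(parts) != 3 or not parts[2].startswith("case_"):
        raise ValueError(f"unexpected case_mapping: {case_mapping!r}")
    org, dataset, case_dir = parts
    return f"{org}/{dataset}", f"{case_dir}/*"


def download_case(case_mapping: str, local_dir: str) -> str:
    """Download only the slides of one case; returns the local folder path."""
    # Imported here so the parsing helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    repo_id, pattern = case_mapping_to_download_args(case_mapping)
    # allow_patterns limits the download to files whose repo path matches the glob.
    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=pattern,
        local_dir=local_dir,
    )
```

For example, `download_case("histai/HISTAI-skin-b2/case_13384", "/local_path/skin-b2")` would request only files matching `case_13384/*` from `histai/HISTAI-skin-b2`. Filtering on the server-side file list keeps the transfer proportional to the cases actually needed, which matters for repositories with tens of thousands of slides.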

Provider: maas
Created: 2025-05-15
Dataset Introduction

Background Overview
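HISTAI-metadata accompanies HISTAI, a comprehensive whole-slide image pathology dataset of 112,801 slides from 47,279 cases spanning multiple medical specializations. It provides detailed per-case metadata and references to the specialized image subsets, and is suited to computational pathology research.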
The above summary was collected and generated by 遇见数据集.