---
license: cc-by-nc-sa-4.0
task_categories:
- image-segmentation
language:
- en
tags:
- medical
- biology
pretty_name: CholecSeg8k
size_categories:
- 1K<n<10K
---
# Description:
[paper](https://arxiv.org/abs/2012.12453) | [kaggle](https://www.kaggle.com/datasets/newslab/cholecseg8k)

The CholecSeg8k dataset extends the Cholec80 collection with 8,080 annotated frames from laparoscopic cholecystectomy surgeries, drawn from 17 video clips in Cholec80. Each frame in CholecSeg8k is annotated at the pixel level for thirteen different surgical elements. The data is organized into 101 folders, each containing 80 frames at a resolution of 854x480. Every frame comes with three masks: a color mask for visualization, an annotation tool mask, and a watershed mask for simplified processing. The dataset is freely available under the CC BY-NC-SA 4.0 license and is intended to support research on computer-assisted surgical procedures.
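
The counts in the description agree with each other: 101 folders of 80 frames give 8,080 images, so flat image indices run from 0 to 8079. The sketch below checks that arithmetic. It also includes a hypothetical helper, `locate_frame`, that splits a flat index into a folder position and a frame offset. The helper assumes frames are stored folder by folder in order, which this card does not confirm.

```python
NUM_FOLDERS = 101        # folders in the dataset, per the description
FRAMES_PER_FOLDER = 80   # frames per folder, per the description
TOTAL_FRAMES = NUM_FOLDERS * FRAMES_PER_FOLDER


def locate_frame(image_index):
    """Hypothetical helper: map a flat index to (folder position, frame offset).

    Assumes folder-by-folder ordering; this is an assumption, not a
    documented property of the dataset.
    """
    if not 0 <= image_index < TOTAL_FRAMES:
        raise IndexError(f"image_index must be in [0, {TOTAL_FRAMES - 1}]")
    return divmod(image_index, FRAMES_PER_FOLDER)


print(TOTAL_FRAMES)       # 8080
print(locate_frame(800))  # (10, 0)
```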
# Loading the data:
First install the `datasets` library (for example with `pip install datasets`), then run the following code:
```python
from datasets import load_dataset
dataset = load_dataset("minwoosun/CholecSeg8k", trust_remote_code=True)
```
# Simple demo:
This short demo shows how to load the data and directly visualize an image along with the corresponding masks.
```python
from datasets import load_dataset
import matplotlib.pyplot as plt
dataset = load_dataset("minwoosun/CholecSeg8k", trust_remote_code=True)

def display_image(dataset, image_index):
    '''Display the image and its three corresponding masks.'''
    # Fetch the sample once instead of re-indexing the dataset for every panel
    sample = dataset['train'][image_index]

    fig, axs = plt.subplots(2, 2, figsize=(10, 10))
    for ax in axs.flat:
        ax.axis('off')

    # Display each image in its respective subplot
    axs[0, 0].imshow(sample['image'])
    axs[0, 1].imshow(sample['color_mask'])
    axs[1, 0].imshow(sample['watershed_mask'])
    axs[1, 1].imshow(sample['annotation_mask'])

    # Adjust spacing between images
    plt.subplots_adjust(wspace=0.01, hspace=-0.6)
    plt.show()

display_image(dataset, 800)  # image index from 0 to 8079
```
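
Beyond plotting, it can help to see which pixel values a mask actually contains. The following is a minimal sketch, not part of the dataset's own tooling: `summarize_mask` is a hypothetical helper that counts the distinct pixel values in any array-like mask. It runs on a synthetic array here so that it needs no download, and the value `7` is a placeholder, not a real CholecSeg8k class ID.

```python
import numpy as np


def summarize_mask(mask):
    """Return {pixel value (as a tuple): pixel count} for an array-like mask."""
    arr = np.asarray(mask)
    # Treat the last axis as channels so an RGB mask yields one entry per colour;
    # a single-channel mask is reshaped to one value per pixel.
    if arr.ndim == 3:
        pixels = arr.reshape(-1, arr.shape[-1])
    else:
        pixels = arr.reshape(-1, 1)
    values, counts = np.unique(pixels, axis=0, return_counts=True)
    return {tuple(int(c) for c in v): int(n) for v, n in zip(values, counts)}


# Synthetic 4x6 single-channel mask standing in for a real one.
demo = np.zeros((4, 6), dtype=np.uint8)
demo[:, 3:] = 7  # placeholder value, not an actual class ID
print(summarize_mask(demo))
```

Assuming the mask fields decode to image objects that `np.asarray` accepts, as their use with `imshow` above suggests, the same helper could be called as `summarize_mask(dataset['train'][800]['watershed_mask'])`.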

# Citation (BibTeX):
```
@misc{hong2020cholecseg8k,
title={CholecSeg8k: A Semantic Segmentation Dataset for Laparoscopic Cholecystectomy Based on Cholec80},
author={W.-Y. Hong and C.-L. Kao and Y.-H. Kuo and J.-R. Wang and W.-L. Chang and C.-S. Shih},
year={2020},
eprint={2012.12453},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
# Data card contact:
Min Woo Sun (minwoos@stanford.edu)