Document-Type-Detection
ModelScope community. Updated 2025-12-03; indexed 2025-05-17.
Download link:
https://modelscope.cn/datasets/prithivMLmods/Document-Type-Detection
# **Document-Type-Detection**
## Dataset Summary
The **Document-Type-Detection** dataset is a large-scale image classification dataset of scanned or photographed document images. Each image is assigned to exactly one of nine document types. The dataset is intended for training document classification models used in finance, administration, OCR, and automation workflows.
## Supported Tasks
- **Multiclass Document Classification:** Classify an input document image into one of the predefined document types.
- **Dataset Type:** Image Classification
- **Task:** Document Type Detection (Multiclass)
- **License:** Apache 2.0
Each document class is stored in its corresponding folder.
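
Because each class lives in its own folder, a local copy can be indexed by walking the directory tree. The sketch below assumes a hypothetical layout of `root/<class_folder>/<image file>` and assigns integer labels by sorted folder name. The actual folder names, the file extensions, and how folders map to labels 0 to 8 are not stated here, so all three are assumptions.

```python
from pathlib import Path

# Assumed image extensions; the dataset card does not list the file formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_class_folders(root):
    """Return (samples, class_names) for a folder-per-class image tree.

    samples is a list of (path, label) pairs. Labels follow the alphabetical
    order of the class folder names, which is an assumption and not the
    dataset's documented mapping.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for path in sorted((root / name).iterdir()):
            # Skip anything that is not a recognised image file.
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, label))
    return samples, class_names
```

If the real label order differs from alphabetical order, replace the `enumerate` step with an explicit folder-to-label dictionary.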
## Dataset Structure
| Feature | Type | Description |
| ------- | -------- | -------------------------------------- |
| image | Image | Document image (variable resolution) |
| label | Category | Integer from 0 to 8 representing class |
**Split:**
* `train`: Full dataset in a single training split.
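
Since only a `train` split is provided, a held-out set has to be carved out by the user. The following is a minimal sketch of a stratified split over the integer labels, written with the standard library only; the function name `stratified_split` and its parameters are illustrative and not part of the dataset or of any library.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into (train, test), keeping class proportions.

    labels: sequence of integer class labels, one per sample.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        if len(indices) > 1:
            # Keep at least one sample of the class on each side.
            n_test = min(max(n_test, 1), len(indices) - 1)
        else:
            n_test = 0  # a single sample stays in the training set
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

With the `datasets` library, the two index lists could then be passed to `dataset["train"].select(...)` to materialise the subsets.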
## Example Usage
```python
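from datasets import load_dataset

# Download the dataset; it ships a single "train" split.
dataset = load_dataset("prithivMLmods/Document-Type-Detection")

# Each sample is a dict with an "image" and an integer "label" (0 to 8).
sample = dataset["train"][0]
image = sample["image"]
label = sample["label"]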
```
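
Before training, it can help to check that every label falls in the documented range and to see how samples are spread across the nine classes. The helper below is a small sketch for that purpose; `label_distribution` is a hypothetical name, and the only fact taken from the dataset card is that labels are integers from 0 to 8.

```python
from collections import Counter

NUM_CLASSES = 9  # from the dataset card: labels are integers 0 to 8


def label_distribution(labels, num_classes=NUM_CLASSES):
    """Count samples per class; raise if a label is outside 0..num_classes-1."""
    counts = Counter()
    for label in labels:
        if not (isinstance(label, int) and 0 <= label < num_classes):
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    # Report every class, including those with zero samples.
    return [counts[c] for c in range(num_classes)]
```

Assuming the label column loads as plain Python integers, it could be called as `label_distribution(dataset["train"]["label"])`.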
## License
This dataset is distributed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
Provider: maas
Created: 2025-05-10



