docjay131/receipts-ocr-dataset
收藏Hugging Face2026-03-23 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/docjay131/receipts-ocr-dataset
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: image
dtype: image
- name: text
dtype: string
splits:
- name: train
num_bytes: 34475137
num_examples: 220
download_size: 34374062
dataset_size: 34475137
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
# Receipt OCR Dataset
A dataset of receipt photos with structured JSON extraction labels for fine-tuning vision-language models on document OCR tasks.
## Dataset
220 receipt images labeled with structured JSON extracted via Gemini, covering a variety of merchants, formats, and receipt layouts.
## Format
| Column | Type | Description |
|--------|------|-------------|
| `image` | `Image` | Receipt photo (JPEG) |
| `text` | `string` | Extracted receipt data as JSON |
## JSON Schema
```json
{
"merchantName": "string",
"merchantAddress": "string or null",
"date": "YYYY-MM-DD",
"time": "HH:MM or null",
"receiptNumber": "string or null",
"items": [{"name": "string", "quantity": number, "unitPrice": number, "totalPrice": number}],
"subtotal": number,
"tax": number or null,
"tip": number or null,
"total": number,
"paymentMethod": "string or null",
"category": "string or null"
}
```
## Usage
```python
from datasets import load_dataset
ds = load_dataset("your-username/receipt-dataset", split="train")
print(ds[0]["text"]) # JSON string
ds[0]["image"].show() # PIL image
```
提供机构:
docjay131



