Kotomiya07/kuzushiji-dataset-characters
Source: Hugging Face · Updated 2026-04-14 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/Kotomiya07/kuzushiji-dataset-characters
Resource description:
---
language:
- ja
license: cc-by-sa-4.0
size_categories:
- 1M<n<10M
task_categories:
- image-classification
tags:
- kuzushiji
- japanese
- historical-documents
- ocr
- character-crops
---
# Kuzushiji Character Dataset
This dataset contains character crops generated directly from page images using raw character annotations.
## Dataset Description
- **Number of character images**: 1,086,126
- **Number of books**: 44
- **Number of character categories**: 4,338
- **Crop source**: raw page image + annotation CSV
- **Image size**: original cropped size (no resize)
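Because crops keep their original size, a classifier that expects a fixed input shape needs some resizing step. The following is a minimal sketch of one common option, aspect-preserving "letterbox" resizing into a square. It only computes the geometry; the function name `letterbox_geometry` and the target side length are assumptions for illustration and are not part of the dataset.

```python
from typing import Tuple


def letterbox_geometry(width: int, height: int, side: int) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, pad_left, pad_top) for fitting a crop
    into a side x side square without changing its aspect ratio."""
    if width <= 0 or height <= 0 or side <= 0:
        raise ValueError("width, height and side must be positive")
    # Scale so that the longer edge matches the target side length.
    scale = side / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Centre the resized crop; leftover pixels become padding.
    pad_left = (side - new_w) // 2
    pad_top = (side - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# Hypothetical 40 x 80 crop placed into a 64 x 64 input.
geometry = letterbox_geometry(40, 80, 64)
```

The `width` and `height` fields of each record can be passed in directly, so the geometry can be computed without decoding the image first.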
## Dataset Structure
```python
{
    "image": Image(),          # Cropped character image
    "source_image_id": str,    # Source page image ID
    "book_id": str,            # Book ID
    "char_id": str,            # Character annotation ID
    "block_id": str,           # Block ID
    "category": str,           # Unicode code point as a string (e.g., U+3042)
    "category_id": int,        # Category ID
    "char": str,               # Actual character
    "bbox": List[int],         # Original bbox on the source page [x, y, w, h]
    "crop_bbox": List[int],    # Clamped bbox used for cropping [x, y, w, h]
    "width": int,              # Crop width in pixels
    "height": int,             # Crop height in pixels
}
```
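The relationship between some of these fields can be sketched in a few lines of Python. The helpers below are hypothetical and are not taken from the dataset's build scripts: `category_to_char` assumes each `category` holds a single code point in `U+XXXX` form, and `clamp_bbox` shows one plausible way an `[x, y, w, h]` box could be clipped to the page to produce `crop_bbox`. The record values are made up for illustration.

```python
from typing import List


def category_to_char(category: str) -> str:
    """Convert a code point label such as "U+3042" to its character."""
    if not category.upper().startswith("U+"):
        raise ValueError(f"unexpected category format: {category!r}")
    return chr(int(category[2:], 16))


def clamp_bbox(bbox: List[int], page_width: int, page_height: int) -> List[int]:
    """Clip an [x, y, w, h] box to the page and return it as [x, y, w, h]."""
    x, y, w, h = bbox
    # Work in corner coordinates, clip each corner to the page, convert back.
    left = min(max(x, 0), page_width)
    top = min(max(y, 0), page_height)
    right = min(max(x + w, 0), page_width)
    bottom = min(max(y + h, 0), page_height)
    return [left, top, right - left, bottom - top]


# Hypothetical record fragment with made-up values.
record = {"category": "U+3042", "bbox": [-5, 10, 40, 50]}
char = category_to_char(record["category"])
crop_bbox = clamp_bbox(record["bbox"], page_width=100, page_height=40)
```

Under these assumptions, a box that lies fully inside the page is returned unchanged, so `bbox` and `crop_bbox` would differ only for annotations that touch or cross the page edge.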
Provider:
Kotomiya07



