
Kotomiya07/kuzushiji-dataset-characters

Source: Hugging Face. Updated: 2026-04-14. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Kotomiya07/kuzushiji-dataset-characters
Resource description:
---
language:
- ja
license: cc-by-sa-4.0
size_categories:
- 10K<n<100K
task_categories:
- image-classification
tags:
- kuzushiji
- japanese
- historical-documents
- ocr
- character-crops
---

# Kuzushiji Character Dataset

This dataset contains character crops generated directly from page images using raw character annotations.

## Dataset Description

- **Number of character images**: 1,086,126
- **Number of books**: 44
- **Number of character categories**: 4,338
- **Crop source**: raw page image + annotation CSV
- **Image size**: original cropped size (no resize)

## Dataset Structure

```python
{
    "image": Image(),          # Cropped character image
    "source_image_id": str,    # Source page image ID
    "book_id": str,            # Book ID
    "char_id": str,            # Character annotation ID
    "block_id": str,           # Block ID
    "category": str,           # Unicode string (e.g., U+3042)
    "category_id": int,        # Category ID
    "char": str,               # Actual character
    "bbox": List[int],         # Original bbox on the source page [x, y, w, h]
    "crop_bbox": List[int],    # Clamped bbox used for cropping [x, y, w, h]
    "width": int,              # Crop width in pixels
    "height": int,             # Crop height in pixels
}
```
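
The schema pairs a code point string in `category` (for example `U+3042`) with the literal character in `char`, and it distinguishes the annotated `bbox` from the clamped `crop_bbox`. The sketch below illustrates how those fields could relate. It is not the dataset author's code: the helper names `category_to_char` and `clamp_bbox` are hypothetical, and it assumes that "clamped" means intersecting the box with the page bounds, which the card does not spell out.

```python
from typing import List


def category_to_char(category: str) -> str:
    """Convert a code point string such as 'U+3042' to its character."""
    if not category.upper().startswith("U+"):
        raise ValueError(f"expected a 'U+XXXX' string, got {category!r}")
    return chr(int(category[2:], 16))


def clamp_bbox(bbox: List[int], page_width: int, page_height: int) -> List[int]:
    """Clamp an [x, y, w, h] box to the page (assumed clamping rule)."""
    x, y, w, h = bbox
    # Convert to corner coordinates and clip each corner to the page.
    x0 = min(max(x, 0), page_width)
    y0 = min(max(y, 0), page_height)
    x1 = min(max(x + w, 0), page_width)
    y1 = min(max(y + h, 0), page_height)
    # Convert back to [x, y, w, h]; a box fully off-page collapses to zero size.
    return [x0, y0, x1 - x0, y1 - y0]


if __name__ == "__main__":
    print(category_to_char("U+3042"))
    print(clamp_bbox([-5, 10, 20, 30], page_width=100, page_height=100))
```

Under this assumption, a record's `width` and `height` would match the last two entries of `crop_bbox`, which is a cheap consistency check to run on a sample before training.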
Provider:
Kotomiya07