
nagohachi/japanese-str-dataset-test

Source: Hugging Face. Updated 2026-01-04; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/nagohachi/japanese-str-dataset-test
Description:
---
license: cc-by-4.0
task_categories:
- image-to-text
language:
- ja
tags:
- ocr
- webdataset
size_categories:
- 1M<n<10M
---

# OCR Dataset

Japanese OCR dataset in WebDataset format.

## Dataset Structure

| Split | Samples | Shards |
|-------|---------|--------|
| train | 5,000,000 | 500 |
| valid | 50,000 | 5 |
| test | 50,000 | 5 |
| **Total** | **5,100,000** | **510** |

## Usage

```python
import webdataset as wds

base_url = "https://huggingface.co/datasets/nagohachi/japanese-str-dataset-test/resolve/main"

# Load train split
train_dataset = (
    wds.WebDataset(base_url + "/train/train-{00000..00499}.tar")
    .decode("pil")
    .to_tuple("png", "txt")
)

for image, text in train_dataset:
    # image: PIL Image
    # text: str
    pass
```

## Format

Each sample contains:

- `png`: Image file (PNG format)
- `txt`: Ground truth text
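
The usage snippet only covers the train split. As a minimal sketch, the shard patterns for the other splits could be derived from the shard counts in the table. Note that the `valid/valid-XXXXX.tar` and `test/test-XXXXX.tar` file names are an assumption extrapolated from the train example; the card does not state them. The helper names (`shard_pattern`, `shard_urls`, `load_split`) are hypothetical and not part of the dataset or of `webdataset`.

```python
# Shard counts come from the "Dataset Structure" table.
# ASSUMPTION: valid/test shards follow the same naming scheme as train
# ("<split>/<split>-NNNNN.tar"); only the train pattern appears in the card.
BASE_URL = "https://huggingface.co/datasets/nagohachi/japanese-str-dataset-test/resolve/main"
SHARD_COUNTS = {"train": 500, "valid": 5, "test": 5}


def shard_pattern(split: str) -> str:
    """Return a brace-expansion URL pattern covering every shard of a split."""
    last = SHARD_COUNTS[split] - 1
    return f"{BASE_URL}/{split}/{split}-{{00000..{last:05d}}}.tar"


def shard_urls(split: str) -> list[str]:
    """Return the explicit list of shard URLs for a split."""
    return [
        f"{BASE_URL}/{split}/{split}-{i:05d}.tar"
        for i in range(SHARD_COUNTS[split])
    ]


def load_split(split: str):
    """Build a WebDataset pipeline yielding (PIL image, str) pairs."""
    # Imported lazily so the URL helpers above work without webdataset installed.
    import webdataset as wds

    return wds.WebDataset(shard_pattern(split)).decode("pil").to_tuple("png", "txt")
```

Calling `load_split("valid")` would then stream the validation shards the same way the train example does, provided the assumed file names match the repository layout. By the table's numbers, each shard holds 10,000 samples in every split (5,000,000 / 500 and 50,000 / 5).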
Provider:
nagohachi