iapp/thai_handwriting_dataset

Source: Hugging Face. Updated 2024-11-06; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/iapp/thai_handwriting_dataset
Description:
---
license: apache-2.0
task_categories:
- text-to-image
- image-to-text
language:
- th
tags:
- handwriting-recognition
- ocr
pretty_name: Thai Handwriting Dataset
size_categories:
- 10K<n<100K
maintainer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)
dataset_info:
  features:
  - name: image
    dtype: image
  - name: text
    dtype: string
  - name: label_file
    dtype: string
---

# Thai Handwriting Dataset

This dataset combines two major Thai handwriting datasets:

1. BEST 2019 Thai Handwriting Recognition dataset (train-0000.parquet)
2. Thai Handwritten Free Dataset by Wang (train-0001.parquet onwards)

## Maintainer

kobkrit@iapp.co.th

## Dataset Description

### BEST 2019 Dataset

Contains handwritten Thai text images along with their ground truth transcriptions. The images have been processed and standardized for machine learning tasks.

### Wang Dataset

- Exclusively focuses on handwritten sentences in Thai language
- Contains 4,920 unique sentences covering various topics and themes
- Created from contributions by 2,026 users, ensuring diverse handwriting styles
- Encompasses various linguistic patterns, vocabulary, and sentence structures

## Dataset Structure

The dataset is provided in parquet file format with the following columns:

- `image`: Image data (Image type)
- `text`: Ground truth transcription of the handwritten text (String)
- `label_file`: Source label file name (String)

## Usage

This dataset is ideal for:

- Handwriting recognition
- Optical character recognition (OCR)
- Natural language processing (NLP)
- Language generation

Researchers, developers, and enthusiasts can utilize this dataset to:

- Develop and benchmark algorithms
- Train machine learning models
- Explore innovative techniques in Thai language analysis and handwriting recognition

## Original Datasets

1. BEST 2019 Thai Handwriting Recognition competition: https://thailang.nectec.or.th/best/best2019-handwrittenrecognition-trainingset/
2. Thai Handwritten Free Dataset by Wang: Data Market https://www.wang.in.th/dataset/64abb3e951752d79380663c2
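
## Example: Loading and Identifying Shards

The card states that `train-0000.parquet` holds the BEST 2019 data and that `train-0001.parquet` onwards holds the Wang data. The following is a minimal sketch of how that split could be recovered from shard file names, plus one possible way to load the dataset. The helper names `source_of_shard` and `load_thai_handwriting` are hypothetical and not part of the dataset. The sketch also assumes that the shard names on the Hub match the pattern quoted in the card, that the data is exposed as a `train` split, and that the Hugging Face `datasets` library is installed.

```python
import re

# Shard naming as described in the card: train-0000.parquet is BEST 2019,
# train-0001.parquet onwards is the Wang dataset. The real file names on the
# Hub may differ; adjust the pattern if they do.
_SHARD_RE = re.compile(r"train-(\d+)\.parquet$")


def source_of_shard(filename: str) -> str:
    """Map a parquet shard name to its original dataset (hypothetical helper)."""
    match = _SHARD_RE.search(filename)
    if match is None:
        raise ValueError(f"not a recognised shard name: {filename!r}")
    # Shard index 0 is BEST 2019; every later shard is Wang.
    return "BEST 2019" if int(match.group(1)) == 0 else "Wang"


def load_thai_handwriting(split: str = "train"):
    """Load the dataset from the Hugging Face Hub.

    Assumes `pip install datasets` and network access. The import is done
    lazily so that source_of_shard() has no third-party dependency.
    """
    from datasets import load_dataset

    return load_dataset("iapp/thai_handwriting_dataset", split=split)
```

Each loaded record should expose the `image`, `text`, and `label_file` columns listed under Dataset Structure, so a typical first step would be `ds = load_thai_handwriting()` followed by inspecting `ds[0]["text"]`.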
Provider:
iapp