
SoyVitou/KhmerSynthetic1M

Source: Hugging Face. Updated 2026-02-01; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/SoyVitou/KhmerSynthetic1M
Resource description:
---
license: apache-2.0
pretty_name: KhmerSynthetic1MZip (images embedded in Parquet)
tags:
- khmer
- ocr
- synthetic
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: image
    dtype: image
  - name: label
    dtype: string
  - name: file_name
    dtype: string
---

# KhmerSynthetic1M (Compressed)

Synthetic Khmer OCR dataset (1,000,000 images) with labels. Images are renamed sequentially (`img_00000001.jpg`, …) and indexed by `metadata.parquet` for fast browsing in the Hugging Face data viewer.

## Contents

- `compressed_1m_dataset/`: JPEG images
- `compressed_1m_dataset/metadata.parquet`: manifest with columns:
  - `id`: integer row id
  - `image`: relative image filename
  - `img_path`: same as `image` (explicit for viewers)
  - `label`: ground-truth text
- `compressed_1m_dataset.db`: SQLite (`generated_meta`) mirroring the manifest

## Download / Use

```python
from datasets import load_dataset

ds = load_dataset("SoyVitou/KhmerSynthetic1M", streaming=True)
row = next(iter(ds["train"]))
print(row["image"], row["label"])
```

## Generation notes

- Rendered with multiple Khmer fonts (plus limited Latin), curved text augmentation, noise/lighting/brush/smudge effects.
- Images compressed to reduce size (JPEG quality ~32, optional resize).
- Filenames flattened/sequential for easier indexing.

## License

Research and academic use only. Commercial use is not permitted. By using this dataset you agree to comply with these terms.

## Citation

If you use this dataset in a paper, please cite:

```
@inproceedings{YourName2024KhmerSynthetic1M,
  title = {KhmerSynthetic1M: Large-Scale Synthetic Khmer OCR Dataset},
  author = {Your Name and Coauthors},
  booktitle = {Proceedings of ...},
  year = {2024}
}
```

## Contact

Issues / feedback: open a discussion on the Hugging Face dataset page.
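
The manifest layout described under Contents can be illustrated with a small, self-contained sketch. It does not touch the real files: it builds an in-memory SQLite table with the four listed columns and looks up a label by filename. Several details are assumptions, not confirmed by the dataset card: that `generated_meta` is a table name, the column types, that ids start at 1 and match the number in the filename, and the helper names `sequential_name`, `build_demo_manifest`, and `label_for`, which are hypothetical.

```python
import sqlite3


def sequential_name(index: int) -> str:
    """Hypothetical helper: format an index in the img_00000001.jpg style."""
    return f"img_{index:08d}.jpg"


def build_demo_manifest(labels):
    """Create an in-memory stand-in for the manifest (not the real database).

    Assumes a table called generated_meta with the columns listed in the
    README: id, image, img_path (same value as image), and label.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE generated_meta ("
        "id INTEGER PRIMARY KEY, image TEXT, img_path TEXT, label TEXT)"
    )
    rows = [
        (i, sequential_name(i), sequential_name(i), text)
        for i, text in enumerate(labels, start=1)  # assumed 1-based ids
    ]
    conn.executemany("INSERT INTO generated_meta VALUES (?, ?, ?, ?)", rows)
    return conn


def label_for(conn, image_name):
    """Return the ground-truth label for a filename, or None if it is absent."""
    row = conn.execute(
        "SELECT label FROM generated_meta WHERE image = ?", (image_name,)
    ).fetchone()
    return row[0] if row else None


# Two made-up Khmer labels stand in for real dataset rows.
conn = build_demo_manifest(["សួស្តី", "កម្ពុជា"])
first_label = label_for(conn, sequential_name(1))
```

If the real `compressed_1m_dataset.db` follows this layout, the same `SELECT` would apply after replacing `":memory:"` with the path to the downloaded file. Check the actual schema first.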
Provided by:
SoyVitou