CIawevy/TextPecker-1.5M

Hugging Face | Updated 2026-03-20 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/CIawevy/TextPecker-1.5M
Resource description:
---
license: apache-2.0
task_categories:
- image-to-text
dataset_info:
  features:
  - name: id
    dtype: string
  - name: images
    list: image
  - name: conversations
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: data_source
    dtype: string
  - name: class
    dtype: string
  - name: ori_bbox
    list: string
  splits:
  - name: test
    num_bytes: 986226411
    num_examples: 1061
  - name: train
    num_bytes: 984872941236
    num_examples: 1482028
  download_size: 985226675892
  dataset_size: 985859167647
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
  - split: train
    path: data/train-*
---

# TextPecker-1.5M: A Dataset for Training and Evaluating TextPecker

This repository contains the **TextPecker-1.5M** dataset, a new benchmark proposed in the paper "[TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering](https://arxiv.org/abs/2602.20903)".

## Code and Project Page

The official implementation and project details for TextPecker and the TextPecker-1.5M dataset can be found in the GitHub repository: [https://github.com/CIawevy/TextPecker](https://github.com/CIawevy/TextPecker)

## Sample Usage

You can load the TextPecker-1.5M dataset with the Hugging Face `datasets` library. The dataset has a single configuration, `default`, with two splits: `train` (1,482,028 examples) and `test` (1,061 examples).

```python
from datasets import load_dataset

# Load the full TextPecker-1.5M dataset (includes the train and test splits)
dataset = load_dataset("CIawevy/TextPecker-1.5M", "default")
train_data = dataset["train"]
test_data = dataset["test"]

# Load a specific split directly (more efficient for practical usage)
train_data = load_dataset("CIawevy/TextPecker-1.5M", "default", split="train")
test_data = load_dataset("CIawevy/TextPecker-1.5M", "default", split="test")
```

For detailed instructions on installation, model download, evaluation, and running demos, please refer to the [GitHub repository](https://github.com/CIawevy/TextPecker).
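The metadata above lists a download size of roughly 985 GB, so it may be worth looking at a few records before committing to a full download. The following is a minimal sketch, not part of the official card, that uses the `streaming=True` option of `load_dataset` to read records lazily. The helper names `summarize_record` and `preview_test_split` are hypothetical, and the field names are taken from the dataset card's schema; what each field actually contains (for example, which role names appear) is an assumption.

```python
from itertools import islice


def summarize_record(record):
    """Return a small, printable summary of one dataset record.

    Field names follow the schema in the dataset card; their exact
    contents are an assumption and should be checked on real data.
    """
    turns = record.get("conversations") or []
    return {
        "id": record.get("id"),
        "data_source": record.get("data_source"),
        "class": record.get("class"),
        "num_images": len(record.get("images") or []),
        "roles": [turn["role"] for turn in turns],
    }


def preview_test_split(n=3):
    """Stream the first n test records without downloading the full dataset."""
    # Imported here so summarize_record stays usable without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(
        "CIawevy/TextPecker-1.5M", "default", split="test", streaming=True
    )
    # A streamed split is an iterable, so islice stops after n records.
    return [summarize_record(rec) for rec in islice(stream, n)]
```

Calling `preview_test_split(3)` requires network access and the `datasets` package; `summarize_record` can be tried on any dictionary with the same keys.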
## Citation

If you find this dataset useful for your research, please cite the accompanying paper:

```bibtex
@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}
```
Provider:
CIawevy