CIawevy/TextPecker-1.5M

Name: CIawevy/TextPecker-1.5M
Creator: CIawevy
Published: 2026-03-20 14:51:03
License: apache-2.0

Source: Hugging Face. Updated: 2026-03-20. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/CIawevy/TextPecker-1.5M

Description:

---
license: apache-2.0
task_categories:
- image-to-text
dataset_info:
  features:
  - name: id
    dtype: string
  - name: images
    list: image
  - name: conversations
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: data_source
    dtype: string
  - name: class
    dtype: string
  - name: ori_bbox
    list: string
  splits:
  - name: test
    num_bytes: 986226411
    num_examples: 1061
  - name: train
    num_bytes: 984872941236
    num_examples: 1482028
  download_size: 985226675892
  dataset_size: 985859167647
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
  - split: train
    path: data/train-*
---

# TextPecker-1.5M: A Dataset for Training and Evaluating TextPecker

This repository contains the **TextPecker-1.5M** dataset, a new benchmark proposed in the paper "[TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering](https://arxiv.org/abs/2602.20903)".

## Code and Project Page

The official implementation and project details for TextPecker and the TextPecker-1.5M dataset can be found in the GitHub repository: [https://github.com/CIawevy/TextPecker](https://github.com/CIawevy/TextPecker)

## Sample Usage

You can load the TextPecker-1.5M dataset with the Hugging Face `datasets` library. The dataset has a single configuration, `default`, with two splits: `train` (1,482,028 examples) and `test` (1,061 examples).

```python
from datasets import load_dataset

# Load the full TextPecker-1.5M dataset (includes train and test splits)
dataset = load_dataset("CIawevy/TextPecker-1.5M", "default")
train_data = dataset["train"]
test_data = dataset["test"]

# Alternatively, load a specific split directly (more efficient for practical usage)
train_data = load_dataset("CIawevy/TextPecker-1.5M", "default", split="train")
test_data = load_dataset("CIawevy/TextPecker-1.5M", "default", split="test")
```

For detailed instructions on installation, model download, evaluation, and running demos, please refer to the [GitHub repository](https://github.com/CIawevy/TextPecker).

## Citation

If you find this dataset useful for your research, please cite the accompanying paper:

```bibtex
@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}
```
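The metadata above lists a download size of roughly 985 GB, so pulling a whole split just to look at a few records may be impractical. As a complement to the Sample Usage section, the following is a minimal sketch that streams records instead, assuming the `streaming=True` mode of `datasets.load_dataset` works for this repository. The helper names `summarize_record` and `stream_split` are hypothetical and not part of the official code; the field names come from the feature list in the card.

```python
from itertools import islice
from typing import Any, Dict, Iterator


def summarize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one record to a few lightweight fields (hypothetical helper).

    Field names follow the feature list in the dataset card; missing or
    empty fields are treated as empty lists.
    """
    conversations = record.get("conversations") or []
    return {
        "id": record.get("id"),
        "data_source": record.get("data_source"),
        "class": record.get("class"),
        "num_images": len(record.get("images") or []),
        "roles": [turn.get("role") for turn in conversations],
        "num_bboxes": len(record.get("ori_bbox") or []),
    }


def stream_split(split: str = "test", limit: int = 3) -> Iterator[Dict[str, Any]]:
    """Yield summaries of the first `limit` records of a split (hypothetical helper).

    Assumes streaming is supported for this repository, so records are
    fetched lazily rather than downloading the full split first.
    """
    # Imported lazily so the summarizer above can be used without `datasets`.
    from datasets import load_dataset

    stream = load_dataset(
        "CIawevy/TextPecker-1.5M", "default", split=split, streaming=True
    )
    for record in islice(stream, limit):
        yield summarize_record(record)
```

For example, `for row in stream_split("test", limit=3): print(row)` would print three summaries from the smaller `test` split, which is a cheap way to check the schema before committing to a full download.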

Provider:

CIawevy
