
pbevan11/image_gen_ocr_evaluation_data

Source: Hugging Face. Updated 2024-04-01. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pbevan11/image_gen_ocr_evaluation_data
Resource description:
---
license: apache-2.0
---

# image_gen_ocr_eval

**Author:** Peter J. Bevan

**Date:** 15/12/23

**github:** [https://github.com/pbevan1/image-gen-spelling-eval](https://github.com/pbevan1/image-gen-spelling-eval)

---

*Table 1: Normalised Levenshtein similarity scores between instructed text and text present in image (as identified by OCR)*

| Model | object | signage | natural | long | Overall |
| --- | --- | --- | --- | --- | --- |
| DALLE3 | 0.62 | 0.62 | 0.62 | 0.58 | 0.61 |
| DeepFloydIF | 0.57 | 0.56 | 0.66 | 0.39 | 0.54 |
| DALLE2 | 0.44 | 0.35 | 0.42 | 0.22 | 0.36 |
| SDXL | 0.3 | 0.33 | 0.4 | 0.21 | 0.31 |
| SD | 0.28 | 0.26 | 0.32 | 0.22 | 0.27 |
| PlayGroundV2 | 0.19 | 0.23 | 0.17 | 0.2 | 0.2 |
| Wuerstchen | 0.14 | 0.19 | 0.19 | 0.19 | 0.18 |
| Kandinsky | 0.13 | 0.2 | 0.18 | 0.17 | 0.17 |

---

This is a proof of concept that calculates the normalised Levenshtein similarity between the text requested in a prompt and the text present in the generated image (as recognised by OCR).

To use this as a metric, we create a dataset of prompts, each instructing the model to include some text in the image. We also provide a ground-truth column that contains only the instructed text. The scorer is then run on the generated images, comparing the target text with the text actually found in each image and outputting a score. The scores are averaged to give a benchmark score. A score of 1 indicates a perfect match to the instructed text.

You can find the dataset at https://huggingface.co/datasets/pbevan11/image_gen_ocr_evaluation_data

Since this metric looks solely at the text within the generated images and not at image quality as a whole, it should be used alongside other benchmarks such as those in https://karine-h.github.io/T2I-CompBench/.

---

![Image generation model spelling comparison](model_comparison.png)

```
@misc {peter_j._bevan_2024,
    author    = { {Peter J. Bevan} },
    title     = { image_gen_ocr_evaluation_data (Revision 6182779) },
    year      = 2024,
    url       = { https://huggingface.co/datasets/pbevan11/image_gen_ocr_evaluation_data },
    doi       = { 10.57967/hf/1944 },
    publisher = { Hugging Face }
}
```
Provided by:
pbevan11
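Summary of original information

image_gen_ocr_eval

Author: Peter J. Bevan

Date: 15/12/23

Dataset link: https://huggingface.co/datasets/pbevan11/image_gen_ocr_evaluation_data

Dataset description

This dataset is used to compute the normalised Levenshtein similarity between the text found in a generated image and the text the prompt asked for. It contains a set of prompts, each instructing the model to include some text in the image, together with a ground-truth column that holds only the instructed text. Comparing the target text with the text actually found in the image gives a score per image, and these scores are averaged to give a benchmark score. A score of 1 means the text matches exactly.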
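The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's scorer: the function names are hypothetical, the OCR step is assumed to have already produced a string, and the normalisation (edit distance divided by the length of the longer string) is one common convention that the source does not spell out.

```python
from typing import Iterable, Tuple


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalised_similarity(target: str, ocr_text: str) -> float:
    """Similarity in [0, 1]; 1.0 means the OCR text equals the target.

    Assumption: normalise by the longer string's length.
    """
    longest = max(len(target), len(ocr_text))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    return 1.0 - levenshtein_distance(target, ocr_text) / longest


def benchmark_score(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-image similarities over (target, ocr_text) pairs."""
    scores = [normalised_similarity(t, o) for t, o in pairs]
    if not scores:
        raise ValueError("no (target, ocr_text) pairs to score")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical example pairs: instructed text vs. text read back by OCR.
    examples = [("OPEN", "OPEN"), ("OPEN", "OPFN")]
    print(benchmark_score(examples))
```

In this sketch a single wrong character in a four-character target lowers that image's score to 0.75, so short instructed texts are penalised heavily for each error. Whether to lowercase or strip whitespace before comparing is a design choice that the source leaves open.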

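Normalised Levenshtein similarity scores

The normalised Levenshtein similarity scores of each model across the prompt categories are:

| Model | object | signage | natural | long | Overall |
| --- | --- | --- | --- | --- | --- |
| DALLE3 | 0.62 | 0.62 | 0.62 | 0.58 | 0.61 |
| DeepFloydIF | 0.57 | 0.56 | 0.66 | 0.39 | 0.54 |
| DALLE2 | 0.44 | 0.35 | 0.42 | 0.22 | 0.36 |
| SDXL | 0.3 | 0.33 | 0.4 | 0.21 | 0.31 |
| SD | 0.28 | 0.26 | 0.32 | 0.22 | 0.27 |
| PlayGroundV2 | 0.19 | 0.23 | 0.17 | 0.2 | 0.2 |
| Wuerstchen | 0.14 | 0.19 | 0.19 | 0.19 | 0.18 |
| Kandinsky | 0.13 | 0.2 | 0.18 | 0.17 | 0.17 |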