MVFR: Multilingual Visual Font Recognition Synthetic Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://data.mendeley.com/datasets/cnd2wh65my
Description:
MVFR is a synthetic dataset for multilingual visual font recognition covering four common languages: Bangla, Hindi, Spanish, and Russian. To build it, lists of common words for all four languages were gathered from Kaggle, the open-source data science platform, and the 10 most popular fonts for each language were sourced from open-source font-sharing platforms. A data generator written in Python with the Pillow library then rendered words in each language, in each of that language's fonts, onto 400x200 images with a white background. Each language comprises 50,000 images, 5,000 for each of its 10 fonts. The dataset also includes the Python generator script, which can be used to produce visual font recognition data for other languages. Researchers can use both the MVFR dataset and the generator script to train and evaluate AI models for font recognition across multiple languages.
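The generation step described above can be illustrated with a minimal sketch. This is not the generator script shipped with the dataset. The function names, the text position, the font size, and the file naming scheme are all assumptions made for illustration; only the 400x200 white canvas and the use of Pillow come from the description.

```python
import random
from pathlib import Path

from PIL import Image, ImageDraw, ImageFont

IMAGE_SIZE = (400, 200)  # width x height, as stated in the description


def load_font(font_path=None, size=48):
    """Load a TrueType font file, or Pillow's built-in default if no path is given.

    The size of 48 is an assumed value, not one taken from the dataset.
    """
    if font_path is None:
        return ImageFont.load_default()
    return ImageFont.truetype(str(font_path), size)


def render_word(word, font, position=(20, 70)):
    """Draw one word in black on a white 400x200 canvas (position is assumed)."""
    image = Image.new("RGB", IMAGE_SIZE, "white")
    ImageDraw.Draw(image).text(position, word, font=font, fill="black")
    return image


def generate_font_samples(words, font, out_dir, count, seed=0):
    """Render `count` randomly chosen words with one font and save them as PNGs."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index in range(count):
        path = out_dir / f"{index:05d}.png"  # hypothetical naming scheme
        render_word(rng.choice(words), font).save(path)
        paths.append(path)
    return paths
```

To match the scale described above, such a function would be called with `count=5000` once per font, for each of the 10 fonts of a language, giving 50,000 images per language. Note that Bangla and Hindi use complex scripts, so correct glyph shaping in Pillow generally depends on a build with Raqm layout support; whether the original script relies on this is not stated in the description.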
Created:
2024-01-29



