
MVFR: Multilingual Visual Font Recognition Synthetic Dataset

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://data.mendeley.com/datasets/cnd2wh65my
Description:
MVFR is a synthetic multilingual visual font recognition dataset covering four widely used languages: Bangla, Hindi, Spanish, and Russian. The dataset was built in three steps. First, lists of common words in all four languages were collected from the open-source data science platform Kaggle. Next, the 10 most popular fonts for each language were sourced from various open-source font-sharing platforms. Finally, a data generator written in Python with the Pillow library produced synthetic 400×200-pixel white images, each showing a word in the respective language printed in one of those fonts. Each language contains 50,000 images (5,000 for each of its 10 fonts), giving 200,000 images across the four languages. The dataset also ships with the Python generator script, which can be used to produce visual font recognition data for other languages. Researchers can use both the MVFR dataset and the generator script to train and evaluate AI models for font recognition across multiple languages.
Created:
2024-01-29