QT-MSTR: A Multilingual Scene Text Annotation Dataset for the Qinghai-Tibet Region

Name: QT-MSTR: A Multilingual Scene Text Annotation Dataset for the Qinghai-Tibet Region
Creator: Northwest Minzu University (西北民族大学); Zhuoma Tso; jia yang ji
Published: 2025-11-05 00:00:00
License: No description available

Source: Science Data Bank (科学数据银行); updated 2025-11-05; indexed 2026-04-23

Download link:

https://www.scidb.cn/detail?dataSetId=fc1afdb360ba4d628453e5177d46e072

Description:

The QT-MSTR dataset is a text detection and recognition dataset focused on multilingual scenes in the Qinghai-Tibet Plateau region of China. Built from real-world street-view images, it aims to provide high-quality benchmark data for research on Tibetan OCR, multilingual scene text recognition, and low-resource language processing. The data were collected between 2020 and 2023 in key urban areas of the Qinghai-Tibet region: Xining and Haidong in Qinghai Province, Gannan Tibetan Autonomous Prefecture and Tianzhu Tibetan Autonomous County in Gansu Province, and Lhasa in the Tibet Autonomous Region. Collection focused on public spaces where multilingual text commonly appears, such as commercial streets, tourist service points, transportation hubs, and the areas around public facilities, so that the images reflect the region's "Tibetan-Chinese-English" multilingual environment. Images were captured with mainstream smartphone rear cameras and portable digital cameras under natural lighting and saved at their original resolution (primarily 4032×3024 pixels).

For data processing, we established a standardized annotation pipeline. First, all images underwent strict privacy protection: faces and license plates that could reveal personal identity were blurred. Annotators proficient in Tibetan, Chinese, and English then produced the initial annotations with the LabelMe tool. For each text line, the annotation records a quadrilateral bounding box, a language label (Tibetan, Chinese, English, numeric, or mixed text), and the transcribed text.
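The description does not publish the JSON schema, so the following is only a minimal sketch of how one annotation might be read. It assumes the files keep LabelMe's usual layout (a top-level `shapes` list whose entries carry four `points` forming a quadrilateral) and, as a labeled assumption, that the language tag sits in each shape's `label` while the transcription uses a hypothetical `transcription` key. It also shows one common way to reduce a quadrilateral to an axis-aligned box for detectors that expect one. `TextInstance`, `load_instances`, and `load_file` are illustrative names, not part of the dataset.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TextInstance:
    """One annotated text line (hypothetical container, not part of the dataset)."""
    quad: List[Point]  # four (x, y) corners of the quadrilateral
    language: str      # language tag, e.g. Tibetan / Chinese / English / numeric / mixed
    text: str          # transcribed text

    def to_aabb(self) -> Tuple[float, float, float, float]:
        """Return the axis-aligned box (x_min, y_min, x_max, y_max) enclosing the quad."""
        xs = [x for x, _ in self.quad]
        ys = [y for _, y in self.quad]
        return min(xs), min(ys), max(xs), max(ys)


def load_instances(raw: dict,
                   lang_key: str = "label",          # ASSUMPTION: language tag in LabelMe "label"
                   text_key: str = "transcription",  # HYPOTHETICAL key for the transcription
                   ) -> List[TextInstance]:
    """Convert a LabelMe-style annotation dict into TextInstance objects."""
    instances = []
    for shape in raw.get("shapes", []):
        quad = [(float(x), float(y)) for x, y in shape["points"]]
        instances.append(TextInstance(quad=quad,
                                      language=shape.get(lang_key, ""),
                                      text=shape.get(text_key, "")))
    return instances


def load_file(path: str) -> List[TextInstance]:
    """Read one annotation file as UTF-8 (assumed) so Tibetan and Chinese text decode correctly."""
    with open(path, encoding="utf-8") as f:
        return load_instances(json.load(f))


if __name__ == "__main__":
    example = {
        "shapes": [{
            "label": "english",          # hypothetical tag value
            "transcription": "HOTEL",    # hypothetical key and value
            "points": [[10, 20], [110, 22], [109, 60], [9, 58]],
        }]
    }
    inst = load_instances(example)[0]
    print(inst.language, inst.text, inst.to_aabb())
    # english HOTEL (9.0, 20.0, 110.0, 60.0)
```

Keeping the key names as parameters means the reader can be pointed at whatever fields the released JSON actually uses without rewriting the parsing loop.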
To control data quality, we combined automated script validation with expert review. The scripts check the structural integrity of each JSON file, the validity of the bounding boxes, and the correctness of the language tags; experts then manually review the ambiguous samples flagged by the automated checks. The final dataset consists of 1,000 original images and exactly 1,000 paired annotation files in JSON format. Files follow the naming rule "QT[category]_[sequence number]" (e.g., `QTdor_001.jpg` and `QTdor_001.json`), which gives a one-to-one correspondence between images and annotations. Every annotation file uses the same structure, recording the geometric location, language attribute, and text content of each text instance in the image. The dataset is complete, with no missing values or invalid samples. Residual annotation errors stem mainly from text blurred by extreme lighting or partially occluded in complex backgrounds; the bounding boxes of all such samples were reviewed by experts to ensure overall annotation accuracy. Because the dataset uses the standard .jpg (image) and .json (annotation) formats, it can be read with any deep learning framework (such as PyTorch or TensorFlow) and with common annotation tools that support these formats, without specialized software.
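The automated checks described above can be approximated with a short script. This is a minimal sketch, not the authors' validation code: it assumes a flat directory of paired files and a LabelMe-style layout (`shapes`, `points`, `imageWidth`, `imageHeight`), and the language tag strings in `ALLOWED_LANGS` are hypothetical because the description does not state the exact values. It checks the naming rule and image/annotation pairing, JSON parseability, that each shape has four numeric points, that the quadrilateral has non-zero area (shoelace formula), that points lie inside the image, and that the language tag is recognized. All function and constant names are illustrative.

```python
import json
import re
from pathlib import Path
from typing import List

# Naming rule "QT[category]_[sequence number]", e.g. QTdor_001.
# ASSUMPTION: category is lowercase ASCII letters and the sequence is digits.
NAME_RE = re.compile(r"^QT([a-z]+)_(\d+)$")

# HYPOTHETICAL tag strings: the description lists Tibetan, Chinese, English,
# numeric, and mixed text but not the exact values stored in the JSON.
ALLOWED_LANGS = {"tibetan", "chinese", "english", "numeric", "mixed"}


def check_pairing(root: Path) -> List[str]:
    """Flag names that break the naming rule or lack a .jpg/.json partner."""
    images = {p.stem for p in root.glob("*.jpg")}
    annots = {p.stem for p in root.glob("*.json")}
    errors = [f"bad name: {s}" for s in sorted(images | annots) if not NAME_RE.match(s)]
    errors += [f"image without annotation: {s}" for s in sorted(images - annots)]
    errors += [f"annotation without image: {s}" for s in sorted(annots - images)]
    return errors


def quad_area(points) -> float:
    """Absolute polygon area via the shoelace formula (0 for collinear points)."""
    s = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def is_point(p) -> bool:
    """True if p is a two-element list of numbers."""
    return (isinstance(p, list) and len(p) == 2
            and all(isinstance(c, (int, float)) for c in p))


def check_annotation(raw, lang_key: str = "label") -> List[str]:
    """Structural, geometric, and language-tag checks on one LabelMe-style dict."""
    if not isinstance(raw, dict) or not isinstance(raw.get("shapes"), list):
        return ["missing or invalid 'shapes' list"]
    errors = []
    w, h = raw.get("imageWidth"), raw.get("imageHeight")
    for i, shape in enumerate(raw["shapes"]):
        pts = shape.get("points") if isinstance(shape, dict) else None
        if not (isinstance(pts, list) and len(pts) == 4 and all(map(is_point, pts))):
            errors.append(f"shape {i}: expected 4 numeric (x, y) points")
            continue
        if quad_area(pts) == 0.0:
            errors.append(f"shape {i}: degenerate quadrilateral")
        if w and h and not all(0 <= x <= w and 0 <= y <= h for x, y in pts):
            errors.append(f"shape {i}: point outside the image")
        if shape.get(lang_key) not in ALLOWED_LANGS:
            errors.append(f"shape {i}: unknown language tag {shape.get(lang_key)!r}")
    return errors


def validate_dir(root: Path) -> List[str]:
    """Run all checks over a flat directory of paired .jpg/.json files."""
    errors = check_pairing(root)
    for path in sorted(root.glob("*.json")):
        try:
            raw = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{path.name}: invalid JSON ({exc})")
            continue
        errors += [f"{path.name}: {e}" for e in check_annotation(raw)]
    return errors
```

Calling `validate_dir(Path("QT-MSTR"))` on an unpacked copy (the directory name is hypothetical) returns a list of messages; an empty list means every check passed, and any flagged file would be a candidate for the manual expert review step.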

Created:

2025-11-05