Amharic visual question answering on Ethiopian tourism

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/13941891

Description:

Visual Question Answering (VQA) is a Vision-to-Text (V2T) task that combines the visual features of an image with a natural language question to generate a meaningful response. Most existing research has focused on English, leaving a significant gap for other languages, including Amharic. Tourism, a major global industry, relies heavily on interactions in which visitors seek information about natural, historical, cultural, and religious sites. Ethiopia is a remarkable tourist destination, home to unique sites such as the Rock-hewn Churches of Lalibela and the Castles of Gondar, as well as natural attractions like Simien National Park and Lake Tana. Most visitors are local, creating an urgent need for a VQA model that can deliver accurate, culturally relevant information in Amharic. No such model currently exists to assist tourists at these heritage sites. This research addresses the gap by developing an Amharic Visual Question Answering model tailored to Ethiopian tourism. A new Amharic VQA dataset was created from 2,200 diverse images of Ethiopian tourist sites paired with 6,600 questions in Amharic, covering natural landmarks, historical sites, and religious celebrations. The images were collected from various sources, including the UNESCO website, the Amhara Tourism office, and online platforms such as Facebook, Free pixel, and Instagram. Each image is accompanied by three questions formulated by three individual experts and answered by ten candidates. The questions, answers, and images are linked through annotations and fed into the model. We used ResNet-50 for image feature extraction and a Bidirectional Gated Recurrent Unit (BiGRU) with attention mechanisms, achieving a testing accuracy of 54.98% on questions about Ethiopian heritage. In future work, we will extend this research with external knowledge to obtain answers and descriptions that go beyond the image and custom object detection.
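The description says that images, questions, and answers are linked through annotations, with three questions per image and ten candidate answers per question. A minimal sketch of what one such annotation record might look like is given below. The class name `VQAAnnotation`, its field names, the image identifier, and the majority-vote rule in `consensus_answer` are all assumptions made for illustration; the published dataset may use a different schema and a different way of selecting the reference answer.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VQAAnnotation:
    """One question about one image plus its collected answers (hypothetical schema)."""
    image_id: str
    question: str
    answers: list[str] = field(default_factory=list)

    def consensus_answer(self) -> str:
        """Return the most frequent candidate answer (an assumed aggregation rule)."""
        if not self.answers:
            raise ValueError("no answers collected for this question")
        return Counter(self.answers).most_common(1)[0][0]


def expected_question_count(num_images: int, questions_per_image: int = 3) -> int:
    """Total number of questions when every image has the same number of questions."""
    return num_images * questions_per_image


# Illustrative record: the identifier and the answer split are invented.
example = VQAAnnotation(
    image_id="img_0001",
    question="ይህ ቦታ የት ነው?",  # "Where is this place?"
    answers=["ላሊበላ"] * 7 + ["ጎንደር"] * 3,  # Lalibela x7, Gondar x3
)
```

With the figures stated in the description, `expected_question_count(2200)` gives 6,600, which matches the reported number of questions.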

Created:

2024-10-16