EDUBOT: A comprehensive General Science Question Answering Dataset in Bangla Language
Source: doi.org. Indexed: 2025-01-21
Download link:
http://doi.org/10.17632/4bfs4rr3m6.1
Description:
Chatbots are increasingly used for a wide range of services that require natural-language interaction. Bangla Natural Language Processing (NLP) remains a notable challenge in Bangladesh because of the complexity of the language. To address this, Edubot was created as an educational resource to facilitate the development of question-answering systems in Bangla, focused specifically on general science. The dataset is intended to support educational tools that answer science-related questions in Bangla, covering subjects such as biology, chemistry, physics, and environmental science.
We developed a comprehensive dataset comprising 3,379 questions paired with answers related to general science, all presented in the Bangla language. The data is structured into two columns and is available in CSV format for ease of use.
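As a minimal sketch of how a two-column CSV like this might be read, the Python snippet below loads question-answer pairs with the standard library. The function name `load_qa_pairs`, the column order (question first, answer second), the presence of a header row, and UTF-8 encoding are all assumptions for illustration; the description above does not specify them.

```python
import csv
from pathlib import Path


def load_qa_pairs(csv_path, has_header=True):
    """Read a two-column CSV into a list of (question, answer) tuples.

    Assumptions (not confirmed by the dataset description):
    column 0 holds the question, column 1 the answer, and the
    file is UTF-8 encoded, optionally with a byte-order mark.
    """
    pairs = []
    # "utf-8-sig" reads plain UTF-8 and also strips a leading BOM if present.
    with Path(csv_path).open(encoding="utf-8-sig", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the (assumed) header row
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed rows
            question, answer = row[0].strip(), row[1].strip()
            if question and answer:
                pairs.append((question, answer))
    return pairs
```

Calling `load_qa_pairs("edubot.csv")` (a hypothetical file name) would return a list of tuples; if the downloaded file matches the description, its length should be close to 3,379.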
The dataset includes the following key components:
1. The dataset draws on a variety of sources, including textbooks, articles, and other educational materials.
2. Questions are developed to evaluate comprehension and strengthen understanding of general science concepts.
3. Edubot provides each general science question with a clear, accurate answer, along with the answer's starting position in the dataset for easy reference and verification.
This dataset supports research and development in Bangla NLP, particularly the creation of machine-learning and AI-driven educational chatbots and conversational systems.
Provider:
Mendeley Data



