Dataset Question Answering for Admission of Higher Education Institution

Name: Dataset Question Answering for Admission of Higher Education Institution
Creator: doi.org
Published: 2023-09-26 00:00:00
License: No description available

doi.org, updated 2023-09-26, indexed 2025-03-24

Download link:

http://doi.org/10.17632/jc4df8srcb.2

Description:

The data collection process began with web scraping of a selected higher education institution's website, gathering all content related to admission to higher education institutions during the period from July to September 2023. This produced a raw dataset centered on admission-related content, which was then carefully cleaned and organized. This primary data, in its raw form before annotation into a question-and-answer format, was predominantly in Indonesian. A comprehensive annotation process then enriched it with specific admission-related information, turning it into secondary data. Both the primary and the secondary data are predominantly in Indonesian. To improve data quality, we applied filters that remove: 1) data not in Indonesian, 2) data unrelated to the admission topic, and 3) redundant entries. The result is a curated, finalized dataset, now available for research and analysis in the domain of higher education admission.
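
The description does not include the authors' filtering code. The following is a minimal Python sketch, under stated assumptions, of how the three filters (language, topic, redundancy) could be chained over question-and-answer pairs. Everything in it is hypothetical: the `QAPair` record type, the stopword-ratio language check (a crude stand-in for a real language identifier), the admission keyword list, the 0.15 threshold, and deduplication by normalized text are illustrative choices, not the dataset's actual schema or method.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass


# Hypothetical record type; the real dataset's field names are not documented here.
@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


# Assumed, illustrative word lists (not taken from the dataset authors).
INDONESIAN_STOPWORDS = {
    "yang", "dan", "di", "ke", "dari", "untuk", "dengan", "pada", "adalah",
    "ini", "itu", "apa", "apakah", "bagaimana", "kapan", "berapa", "saya",
    "bisa", "akan", "atau", "tidak", "ada",
}
ADMISSION_KEYWORDS = {
    "pendaftaran", "penerimaan", "seleksi", "jalur", "beasiswa",
    "daftar", "registrasi", "biaya", "ujian", "ukt",
}


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def looks_indonesian(text: str, min_ratio: float = 0.15) -> bool:
    """Filter 1 (heuristic): share of Indonesian stopwords must reach min_ratio."""
    words = tokens(text)
    if not words:
        return False
    hits = sum(word in INDONESIAN_STOPWORDS for word in words)
    return hits / len(words) >= min_ratio


def is_admission_related(pair: QAPair) -> bool:
    """Filter 2 (heuristic): keep pairs mentioning at least one admission keyword."""
    words = set(tokens(pair.question + " " + pair.answer))
    return bool(words & ADMISSION_KEYWORDS)


def normalize(text: str) -> str:
    """Case- and punctuation-insensitive form used as a deduplication key."""
    return " ".join(tokens(text))


def filter_pairs(pairs: Iterable[QAPair]) -> list[QAPair]:
    """Apply the language, topic, and redundancy filters in order."""
    seen: set[tuple[str, str]] = set()
    kept: list[QAPair] = []
    for pair in pairs:
        if not looks_indonesian(pair.question + " " + pair.answer):
            continue
        if not is_admission_related(pair):
            continue
        # Filter 3: drop entries whose normalized question/answer was already kept.
        key = (normalize(pair.question), normalize(pair.answer))
        if key in seen:
            continue
        seen.add(key)
        kept.append(pair)
    return kept


# Made-up sample records for illustration only.
sample = [
    QAPair("Kapan pendaftaran mahasiswa baru dibuka?", "Pendaftaran dibuka pada bulan Juli."),
    QAPair("kapan pendaftaran mahasiswa baru dibuka", "Pendaftaran dibuka pada bulan Juli"),  # redundant
    QAPair("When does registration open?", "Registration opens in July."),  # not Indonesian
    QAPair("Di mana letak kantin kampus?", "Kantin ada di dekat gedung utama."),  # off-topic
]

kept = filter_pairs(sample)
for pair in kept:
    print(pair.question, "->", pair.answer)
```

Only the first sample pair survives: the second is a normalized duplicate, the third fails the language check, and the fourth is Indonesian but off-topic. Running the filters in this order means the cheaper deduplication step only sees records that already passed the other two checks; a production pipeline would more likely use a trained language identifier and a topic classifier in place of these word lists.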

数据收集工作始于对一所选定的高等教育机构网站的爬取，收集了自2023年7月至9月期间与高等教育机构招生主题相关的所有数据。这一过程产生了以招生相关内容为核心的原生数据集。随后，经过严谨的数据清洗和组织流程，对数据集进行了精细化处理。在将原始数据标注为问答格式之前，主要数据以印尼语为主。紧接着，开展了一项全面的标注过程，以丰富数据集，并赋予其特定的招生相关信息，使其转化为次级数据。无论是初级数据还是次级数据，均以印尼语为主。为了提升数据质量，我们添加了筛选机制，以移除或排除以下内容：1）非印尼语数据，2）与招生主题无关的数据，以及3）重复条目。经过如此精心的编纂，最终数据集得以精心准备，现可供高等教育招生领域的研究与分析之用。

Provider:

doi.org

5,000+

优质数据集

54 个

任务类型

进入经典数据集