Dataset Question Answering for Admission of Higher Education Institution
Source: doi.org | Updated: 2023-09-26 | Indexed: 2025-03-24
Download link:
http://doi.org/10.17632/jc4df8srcb.2
Resource description:
Data collection began with web scraping of a selected higher education institution's website between July and September 2023, gathering any content related to admission. This produced a raw dataset centered on admission-related content, which was then cleaned and organized. The primary data, in its raw form before annotation into a question-and-answer format, was predominantly in Indonesian. An annotation process then enriched the dataset with specific admission-related information, transforming it into secondary data. Both the primary and the secondary data remained predominantly in Indonesian. To improve data quality, we added filters to exclude: 1) data not in Indonesian, 2) data unrelated to the admission topic, and 3) redundant entries. The finalized dataset is now available for research and analysis in the domain of higher education admission.
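The three quality filters named in the description (non-Indonesian data, off-topic data, redundant entries) could be sketched roughly as follows. This is only an illustration under stated assumptions: the description does not say how language or topic was detected, so the word lists `INDONESIAN_STOPWORDS` and `ADMISSION_KEYWORDS`, the helper names, and the `(question, answer)` tuple layout are all hypothetical and not taken from the dataset.

```python
import re

# Hypothetical word lists for illustration only; the dataset authors do not
# state how language or topic was actually detected.
INDONESIAN_STOPWORDS = {
    "yang", "dan", "di", "untuk", "dengan", "apa",
    "adalah", "bagaimana", "ini", "itu", "pada",
}
ADMISSION_KEYWORDS = {
    "pendaftaran", "penerimaan", "mahasiswa", "seleksi", "biaya", "jalur",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def looks_indonesian(text, min_hits=1):
    """Crude language check: count common Indonesian function words."""
    hits = sum(token in INDONESIAN_STOPWORDS for token in tokenize(text))
    return hits >= min_hits


def is_admission_related(text):
    """Crude topic check: at least one admission keyword is present."""
    return any(token in ADMISSION_KEYWORDS for token in tokenize(text))


def filter_pairs(pairs):
    """Apply the three filters to an iterable of (question, answer) tuples."""
    seen = set()
    kept = []
    for question, answer in pairs:
        combined = f"{question} {answer}"
        if not looks_indonesian(combined):      # filter 1: language
            continue
        if not is_admission_related(combined):  # filter 2: topic
            continue
        # filter 3: redundancy, compared on normalized token sequences
        key = (" ".join(tokenize(question)), " ".join(tokenize(answer)))
        if key in seen:
            continue
        seen.add(key)
        kept.append((question, answer))
    return kept
```

A stopword heuristic is used here only to keep the sketch dependency-free; a real pipeline would more likely rely on a dedicated language-identification model and a manual relevance review.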
Provider:
doi.org



