A data-oriented ontology generation using Thai semi-structured data
Source: DataCite Commons | Updated: 2026-02-03 | Indexed: 2026-05-04
Download link:
http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/TU.the.2025.109
Description:
An ontology is a commonly used knowledge base for representing domain knowledge. Developing one is tedious and difficult because it requires expertise in both the domain and knowledge engineering. Once developed, however, an ontology provides ground truth that lets an intelligent system comprehend real-world knowledge as a schematized network of domain concepts and the instances within that schema. This work therefore aims to assist ontology development and instance extraction from Thai semi-structured data. The proposed method finds patterns of collocated Thai text and generates a collocation tree of word sequences. The patterns are then analysed for shared words and unique words. Shared word sequences are treated as templates, while unique words, which are likely to vary between instances, are kept as variables. To form an ontology, the templates are transformed into ontological properties that indicate attributive relations of concepts, and the variables are taken as the values of each relation for the given individuals. Using these properties, concept relations in triple form are converted into RDF/OWL, a standard ontology format usable by applications for querying and reasoning. For testing, drug instruction data from the medicine domain were used to generate the ontology and extract instances. In the experiments, the generated ontology and its instances received acceptable evaluations from domain experts. Used as a knowledge base for querying, the ontology achieved high precision and recall, although some results were incorrect because of ambiguity in the source data. Error analysis showed that text data play a crucial role in ontology quality: natural text can be misleading, and the data may not contain the domain knowledge needed to represent the domain fully. Finally, the extracted instances were well received for reducing the burden of matching useful property details in text to the given schema.
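The template/variable idea in the description can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes the text is already segmented into tokens (English placeholder tokens stand in for Thai words), treats only a shared leading word sequence as the template, and uses hypothetical names (`build_tree`, `split_template`, `to_triples`, `min_support`). Serialization to RDF/OWL is left out.

```python
def build_tree(sequences):
    """Build a prefix tree of token sequences with a visit count per node."""
    root = {"count": 0, "children": {}}
    for seq in sequences:
        node = root
        node["count"] += 1
        for tok in seq:
            node = node["children"].setdefault(tok, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def split_template(seq, root, min_support=2):
    """Split seq into (template, variable).

    Leading tokens shared by at least min_support sequences form the
    template; the remaining tokens are the variable part.
    Assumption: seq was one of the sequences used to build the tree.
    """
    node = root
    template = []
    for i, tok in enumerate(seq):
        child = node["children"][tok]
        if child["count"] < min_support:
            return tuple(template), tuple(seq[i:])
        template.append(tok)
        node = child
    return tuple(template), ()


def to_triples(records, min_support=2):
    """Turn (subject, tokens) records into (subject, property, value) triples."""
    root = build_tree([tokens for _, tokens in records])
    triples = []
    for subject, tokens in records:
        template, variable = split_template(tokens, root, min_support)
        if template and variable:
            triples.append((subject, "_".join(template), " ".join(variable)))
    return triples


# Made-up drug instruction lines, for illustration only.
records = [
    ("drugA", ["dose", "per", "day", "2", "tablets"]),
    ("drugB", ["dose", "per", "day", "3", "tablets"]),
    ("drugA", ["store", "below", "25C"]),
    ("drugB", ["store", "below", "30C"]),
]
triples = to_triples(records)
```

Each resulting triple pairs an individual with a property named after the shared word sequence and a value taken from the words that vary, for example `("drugA", "dose_per_day", "2 tablets")`. A fuller treatment would also need to handle shared words that appear after the variable part.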
Provider:
Thammasat University
Created:
2026-02-03



