Multilabel Thai property-related offences
Source: DataCite Commons | Updated: 2025-02-10 | Indexed: 2025-04-16
Download link:
https://ieee-dataport.org/documents/multilabel-thai-property-related-offences
Resource description:
Legal analysis using natural language processing (NLP) and machine learning (ML) is a difficult task that has recently attracted interest from both academia and industry. Using a human-annotated dataset of Supreme Court decisions summarized into colloquial Thai, this work investigates different combinations of NLP, ML, and rule-based techniques for accurate legal case analysis under Thai law, specifically property-related offences, with the aim of imitating a lawyer's cognitive process. We experimented with two major tasks, binary and multi-label classification, each evaluated with five-fold cross-validation. For the binary task, we achieved an average accuracy of 94.2% and an average F1-score of 96.7%, and found that vanilla fastText, a static embedding, is sufficient on its own for this task. For multi-label classification, we obtained 82% average zero-one accuracy and 92% average Hamming accuracy using a fine-tuned joint embedding classification pipeline with rule-based post-processing, an improvement over the pipeline without the rule-based technique. This highlights the possibility of combining symbolic information from a rule-based algorithm with the statistical computation of machine learning techniques when performing a complex legal analysis task.
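The two multi-label metrics named in the description can be illustrated with a minimal sketch. It assumes the common definitions: zero-one accuracy counts a case as correct only when its entire label vector matches, while Hamming accuracy scores each label slot independently. The function names and the toy label matrices below are hypothetical and are not taken from the dataset or the authors' code.

```python
from typing import Sequence


def zero_one_accuracy(y_true: Sequence[Sequence[int]],
                      y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def hamming_accuracy(y_true: Sequence[Sequence[int]],
                     y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots predicted correctly."""
    correct = sum(ti == pi
                  for t, p in zip(y_true, y_pred)
                  for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return correct / total


# Hypothetical toy data: 4 cases, 3 offence labels each (1 = label applies).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]

print(zero_one_accuracy(y_true, y_pred))  # 2 of 4 cases match exactly
print(hamming_accuracy(y_true, y_pred))   # 10 of 12 label slots match
```

Because a single wrong label fails the whole case under the zero-one metric, it is never higher than Hamming accuracy on the same predictions, which is consistent with the 82% and 92% figures reported above.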
Provider:
IEEE DataPort
Created:
2025-02-10



