
Multilabel Thai property-related offences

DataCite Commons | Updated 2025-02-10 | Indexed 2025-04-16
Download link:
https://ieee-dataport.org/documents/multilabel-thai-property-related-offences
Resource description:
Legal analysis with natural language processing (NLP) and machine learning (ML) is a difficult task that has recently attracted interest from both academia and industry. Using a human-annotated dataset of Supreme Court decisions summarized into colloquial Thai, this work investigates combinations of NLP, ML, and rule-based techniques for accurate legal case analysis under Thai law, specifically property-related offences, with the aim of imitating a lawyer's cognitive process. We experimented with two tasks, binary and multi-label classification, each evaluated with five-fold cross-validation. For the binary task, we reached an average accuracy of 94.2% and an average F1-score of 96.7%, and found that vanilla fastText, a static embedding, is sufficient on its own. For multi-label classification, a fine-tuned joint embedding classification pipeline with rule-based post-processing achieved 82% average zero-one accuracy and 92% average Hamming accuracy, an improvement over the same pipeline without the rule-based step. This highlights the potential of combining symbolic information from a rule-based algorithm with statistical computation from machine learning for complex legal analysis tasks.
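The description reports two multi-label metrics without defining them. The sketch below assumes the usual definitions: zero-one accuracy counts a case as correct only when its whole label set matches, while Hamming accuracy is the fraction of individual label slots predicted correctly (one minus the Hamming loss). The function names and the toy label matrix are hypothetical illustrations and are not taken from the dataset.

```python
from typing import Sequence


def zero_one_accuracy(y_true: Sequence[Sequence[int]],
                      y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def hamming_accuracy(y_true: Sequence[Sequence[int]],
                     y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots predicted correctly."""
    correct = sum(ti == pi
                  for t, p in zip(y_true, y_pred)
                  for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return correct / total


# Toy example: 4 cases, 3 hypothetical offence labels per case.
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]

print(zero_one_accuracy(y_true, y_pred))  # 2 of 4 cases match exactly
print(hamming_accuracy(y_true, y_pred))   # 10 of 12 label slots match
```

Under these definitions, Hamming accuracy is never lower than zero-one accuracy, because a single wrong label makes the whole case count as wrong for the stricter metric. That is consistent with the reported 92% and 82%.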
Provider:
IEEE DataPort
Created:
2025-02-10