Domain-Specific Stop Word for Brazilian Legal Texts
Source: IEEE; indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/domain-specific-stop-word-brazilian-legal-texts
Description:
Generic stop word lists are not tailored to the Brazilian legal domain and often prove insufficient there, which affects text mining accuracy. This study addresses the gap by developing and validating a domain-specific stop word list for Brazilian Portuguese legal texts. The list was created through statistical frequency analysis of legal corpora, expert validation, and evaluation with the Maria Firmina and Legal Complaint Classifier systems, in collaboration with the Court of Justice of Maranhão. The process yielded a list of 3,602 terms, substantially larger than and distinct from generic lists such as NLTK's (203 terms, with minimal overlap). However, evaluations indicated that the domain-specific list offered limited performance advantages over the NLTK list in classification tasks; observed differences in metrics such as accuracy (e.g., 0.979 for the domain-specific list vs. 0.973 for NLTK) were minor and often within the margin of error. Despite the modest direct performance gains, the study provides a specialized resource for processing Brazilian legal texts. It also underscores the nuanced role of domain-specific adaptations in Natural Language Processing for judicial environments, contributing to efforts to reduce complexity and enhance interoperability among textual processing systems.
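The description mentions statistical frequency analysis and a comparison of overlap with a generic list, but does not spell out the exact procedure. The sketch below is one plausible illustration, not the authors' method: it assumes candidates are chosen by document frequency, and the function names, the threshold, the sample sentences, and the small generic list are all hypothetical.

```python
import re
from collections import Counter


def candidate_stop_words(docs, min_doc_ratio=0.8):
    """Return terms that occur in at least min_doc_ratio of the documents.

    Assumption: document frequency is used as the selection statistic;
    the source does not say which statistic the authors applied.
    """
    doc_freq = Counter()
    for doc in docs:
        # Unicode-aware tokenization: runs of letters only (keeps ç, ã, é).
        tokens = set(re.findall(r"[^\W\d_]+", doc.lower()))
        doc_freq.update(tokens)
    threshold = min_doc_ratio * len(docs)
    return {term for term, df in doc_freq.items() if df >= threshold}


def overlap_ratio(domain_list, generic_list):
    """Share of the generic list that also appears in the domain list."""
    if not generic_list:
        return 0.0
    return len(domain_list & generic_list) / len(generic_list)


# Hypothetical toy corpus standing in for real legal documents.
docs = [
    "O autor requer a condenação do réu ao pagamento de indenização.",
    "O réu contesta o pedido do autor e requer a improcedência.",
    "O autor e o réu celebraram acordo homologado pelo juízo.",
]

# Hypothetical stand-in for a generic Portuguese stop word list.
generic = {"o", "a", "de", "do", "e", "ao"}

candidates = candidate_stop_words(docs, min_doc_ratio=1.0)
print(sorted(candidates))
print(overlap_ratio(candidates, generic))
```

In this toy corpus, domain terms such as "autor" and "réu" appear in every document and surface as candidates even though a generic list would not contain them. In practice, a frequency-derived candidate set like this would still need the expert validation step the description mentions.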
Provided by:
Jonas C. S. Neto; Thyago M. Rodrigues; Ewaldo E. C. Santana; Antonio F. L. Jacob Jr; Fábio M. F. Lobato; Gustavo S. Silva



