
"Domain-Specific Stop Word for Brazilian Legal Texts"

Source: DataCite Commons. Updated 2025-06-02. Indexed 2026-05-03.
Download link:
https://ieee-dataport.org/documents/domain-specific-stop-word-brazilian-legal-texts
Description:
"Standard stop word removal lacks lists tailored for the Brazilian legal domain, where generic lists often prove insufficient, impacting text mining accuracy. This study addresses this by developing and validating a domain-specific stop word list for Brazilian Portuguese legal text. The list was created using statistical frequency analysis of legal corpora, expert validation, and evaluation with the Maria Firmina and Legal Complaint Classifier systems, in collaboration with the Court of Justice of Maranhão. This process yielded a specific list of 3,602 terms, substantially larger and distinct from generic lists such as NLTK (203 terms, with minimal overlap). However, evaluations indicated that the domain-specific list offered limited performance advantages over the NLTK list in classification tasks; observed differences in metrics, such as accuracy (e.g., 0.979 for domain-specific vs. 0.973 for NLTK), were minor and often within the margin of error. Despite modest direct performance gains, this study provides a specialized resource for processing Brazilian legal texts. This underscores the nuanced role of domain-specific adaptations in Natural Language Processing for judicial environments, contributing to efforts to reduce complexity and enhance interoperability among textual processing systems."
Provider:
IEEE DataPort
Created:
2025-06-02
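
The description mentions statistical frequency analysis of legal corpora and a comparison of overlap with a generic list. As a rough illustration only, and not the authors' actual method, the sketch below selects candidate stop words by document frequency and measures overlap with a generic list. The function names, the `min_doc_ratio` threshold, the toy sentences, and the tiny `generic` set (a stand-in for a list such as NLTK's) are all assumptions made for this example.

```python
import re
from collections import Counter


def candidate_stop_words(docs, min_doc_ratio=0.8):
    """Return terms that occur in at least min_doc_ratio of the documents.

    Hypothetical helper: the threshold and tokenization are illustrative
    assumptions, not taken from the dataset description.
    """
    doc_freq = Counter()
    for doc in docs:
        # Unicode-aware word tokens (letters only), lowercased, counted once per document.
        tokens = set(re.findall(r"[^\W\d_]+", doc.lower()))
        doc_freq.update(tokens)
    threshold = min_doc_ratio * len(docs)
    return {term for term, n in doc_freq.items() if n >= threshold}


def jaccard(a, b):
    """Jaccard overlap between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy corpus, invented for illustration.
docs = [
    "O autor requer a condenação do réu.",
    "O réu contesta o pedido do autor.",
    "O juiz julga procedente o pedido do autor.",
]

# Tiny stand-in for a generic Portuguese stop word list.
generic = {"o", "a", "do", "de"}

candidates = candidate_stop_words(docs, min_doc_ratio=1.0)
domain_only = candidates - generic  # terms a generic list would miss

print(sorted(candidates))
print(sorted(domain_only))
print(jaccard(candidates, generic))
```

In this toy corpus, "autor" appears in every document yet is absent from the generic list, which is the kind of domain-frequent term a legal stop word list might capture. In practice, candidates like these would still need the expert validation the description mentions.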