five

TCU Votes, STJ Judgments, TCU Votes for Textual Semantic Similarity, STJ Judgments for Textual Semantic Similarity

Source: arXiv. Updated: 2023-05-30. Indexed: 2024-06-21.
Download link:
https://osf.io/k2qpx/
Resource overview:
This study contributes four Portuguese legal-domain datasets focused on semantic textual similarity, intended to support retrieval of similar documents. Two of them, TCU Votes and STJ Judgments, contain documents and their metadata but no annotations. The other two, TCU Votes for Textual Semantic Similarity and STJ Judgments for Textual Semantic Similarity, are annotated for the semantic textual similarity task using the heuristic method proposed in the paper. A small ground-truth dataset annotated by domain experts is also provided to evaluate how well the heuristic annotation process works. The datasets were created to address the automation challenges the legal domain faces when processing large volumes of legal procedural text.
Provided by:
Institute of Computing, Fluminense Federal University
Created:
2023-05-30