
MultiLegalSBD

Source: arXiv. Updated: 2023-05-02. Indexed: 2024-06-21.
Download link:
https://huggingface.co/datasets/rcds/MultiLegalSBD
Overview:
MultiLegalSBD is a multilingual dataset for sentence boundary detection in legal text. It contains over 130,000 annotated sentences covering French, Italian, Spanish, English and German. Created at the University of Bern, it targets the complex sentence structures found in legal writing. The data consists of judgments and statutory texts from a range of legal fields, drawn from legal documents of multiple countries and regions. During construction, a CRF-based automatic sentence boundary detection system produced preliminary annotations, which were then corrected manually to improve data quality. The dataset is intended mainly for natural language processing on legal text, including sentence boundary detection, text summarization and entity recognition.
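To illustrate why sentence boundary detection is harder in legal text than in general prose, the following minimal Python sketch compares a naive punctuation splitter with one that knows a few abbreviations. It is not the CRF-based system mentioned above and does not use the dataset itself; the abbreviation list, function names and example sentence are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical abbreviation list for illustration only; not taken from the dataset.
LEGAL_ABBREVIATIONS = {"art.", "no.", "para.", "v.", "cf.", "sec."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [piece for piece in re.split(r"(?<=[.!?])\s+", text) if piece]


def abbreviation_aware_split(text):
    """Like naive_split, but merge pieces that end in a known abbreviation."""
    sentences = []
    buffer = ""
    for piece in naive_split(text):
        buffer = f"{buffer} {piece}" if buffer else piece
        last_token = buffer.split()[-1].lower()
        if last_token in LEGAL_ABBREVIATIONS:
            # The period belongs to an abbreviation, so keep accumulating.
            continue
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


# Invented example sentence with typical legal citations.
example = (
    "The court relied on Art. 5 para. 2 of the statute. "
    "The appeal was dismissed."
)

print(naive_split(example))
print(abbreviation_aware_split(example))
```

The naive splitter breaks the citation "Art. 5 para. 2" into separate fragments, while the abbreviation-aware variant keeps it inside one sentence. A fixed abbreviation list does not scale across languages and jurisdictions, which is one reason manually corrected annotations such as those in this dataset are useful for training and evaluating learned models.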
Provider:
University of Bern
Created:
2023-05-02