MartimZanatti/Segmentation_judgments_STJ

Name: MartimZanatti/Segmentation_judgments_STJ
Creator: MartimZanatti
Published: 2024-07-10 11:20:41
License: no description available

Hugging Face: updated 2024-07-10, indexed 2024-07-22

Download link:

https://hf-mirror.com/datasets/MartimZanatti/Segmentation_judgments_STJ


Overview:

The dataset is intended for training a segmentation model that divides judgments of the Portuguese Supreme Court (STJ) into sections, paragraph by paragraph. It consists of JSON files: one holds the judgment text, split into paragraphs that each carry a unique ID, and the other holds the annotations, a list of dictionaries giving each section's start paragraph ID, end paragraph ID, and section type. Ten section types are possible: head, report, delimitation, facts, law, decision, signature, declaration, foot-notes, and titles. Alongside the original dataset there are variants, produced by removing sections that do not appear in every judgment.

Provided by:

MartimZanatti

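Summary of original information

Portuguese Supreme Court Judgment Segmentation Dataset

Dataset goal

The dataset is intended for training a segmentation model that divides the text of Portuguese Supreme Court (STJ) judgments, paragraph by paragraph, into the distinct sections of a judgment.

Dataset contents

JSON files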

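Judgment Text
- Contains the judgment text, split into paragraphs, each associated with a unique ID.
Denotations
- Contains a list of dictionaries, each describing one section:
  - id (section ID)
  - start (ID of the section's first paragraph)
  - end (ID of the section's last paragraph)
  - type (section name)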
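
The structure described above can be turned into per-paragraph labels. The following is a minimal sketch, not the dataset's official loader: the keys `paragraphs` and `text`, the sample IDs, and the assumption that `start` and `end` are inclusive and follow paragraph order are all hypothetical. Only the denotation keys `id`, `start`, `end`, and `type` come from the description.

```python
from collections import defaultdict

# Hypothetical in-memory example. The keys "paragraphs" and "text" and the
# ID format are assumptions; only "id", "start", "end", "type" in the
# denotations are taken from the dataset description.
judgment = {
    "paragraphs": [
        {"id": "p1", "text": "paragraph 1"},
        {"id": "p2", "text": "paragraph 2"},
        {"id": "p3", "text": "paragraph 3"},
        {"id": "p4", "text": "paragraph 4"},
    ],
    "denotations": [
        {"id": 1, "start": "p1", "end": "p1", "type": "head"},
        {"id": 2, "start": "p2", "end": "p3", "type": "report"},
        {"id": 3, "start": "p4", "end": "p4", "type": "decision"},
    ],
}


def label_paragraphs(judgment):
    """Map each paragraph ID to the list of section types that cover it."""
    # Position of every paragraph, so a (start, end) pair becomes a slice.
    order = {p["id"]: i for i, p in enumerate(judgment["paragraphs"])}
    labels = defaultdict(list)
    for d in judgment["denotations"]:
        lo, hi = order[d["start"]], order[d["end"]]
        # Assumption: the end paragraph is part of the section (inclusive).
        for p in judgment["paragraphs"][lo:hi + 1]:
            labels[p["id"]].append(d["type"])
    return dict(labels)


paragraph_labels = label_paragraphs(judgment)
```

Each paragraph maps to a list rather than a single label so that the sketch still works if sections overlap in the real files.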

Possible sections

A judgment may contain the following 10 sections:

head
report
delimitation
facts
law
decision
signature
declaration
foot-notes
titles
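
When training a classifier over these sections, the names are usually mapped to integer class IDs. A small sketch follows. The ordering below just follows the list above and is an assumption, not something the dataset prescribes.

```python
# Section names as listed in the dataset description.
SECTION_TYPES = [
    "head", "report", "delimitation", "facts", "law",
    "decision", "signature", "declaration", "foot-notes", "titles",
]

# Assumption: class IDs follow list order; any fixed ordering would do.
label2id = {name: i for i, name in enumerate(SECTION_TYPES)}
id2label = {i: name for name, i in label2id.items()}
```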

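Variants

The dataset includes the original dataset, but some sections do not appear in every judgment. To account for this, variants of the original judgments were created by removing those sections.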
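
One way such a variant could be derived is sketched below. This is an illustration under stated assumptions, not the authors' procedure: the function name `drop_sections`, the paragraph keys, and the sample data are hypothetical, and the sketch assumes sections do not overlap and that `start` and `end` are inclusive.

```python
def drop_sections(paragraphs, denotations, types_to_drop):
    """Return copies of the inputs without the given section types.

    Assumes sections do not overlap and that "start"/"end" are inclusive
    paragraph IDs in document order (both are assumptions).
    """
    order = {p["id"]: i for i, p in enumerate(paragraphs)}
    removed_ids = set()
    kept_denotations = []
    for d in denotations:
        if d["type"] in types_to_drop:
            lo, hi = order[d["start"]], order[d["end"]]
            removed_ids.update(p["id"] for p in paragraphs[lo:hi + 1])
        else:
            kept_denotations.append(d)
    kept_paragraphs = [p for p in paragraphs if p["id"] not in removed_ids]
    return kept_paragraphs, kept_denotations


# Hypothetical sample judgment with an optional "declaration" section.
paragraphs = [{"id": f"p{i}", "text": f"paragraph {i}"} for i in range(1, 5)]
denotations = [
    {"id": 1, "start": "p1", "end": "p1", "type": "head"},
    {"id": 2, "start": "p2", "end": "p3", "type": "report"},
    {"id": 3, "start": "p4", "end": "p4", "type": "declaration"},
]

variant_paragraphs, variant_denotations = drop_sections(
    paragraphs, denotations, {"declaration"}
)
```

Here the variant keeps paragraphs p1 to p3 and the `head` and `report` annotations, while the inputs are left unmodified.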
