rcds/swiss_criticality_prediction
Hugging Face · Updated 2023-07-20 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/rcds/swiss_criticality_prediction
Resource description:
---
annotations_creators:
- machine-generated
language:
- de
- fr
- it
language_creators:
- expert-generated
license:
- cc-by-sa-4.0
multilinguality:
- multilingual
pretty_name: Legal Criticality Prediction
size_categories:
- 100K<n<1M
source_datasets:
- original
tags: []
task_categories:
- text-classification
---
# Dataset Card for Criticality Prediction
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
Legal Criticality Prediction (LCP) is a multilingual, diachronic dataset of 139K Swiss Federal Supreme Court (FSCS) cases annotated with two criticality labels. The bge_label is a binary label (critical, non-critical), while the citation_label has 5 classes (critical-1, critical-2, critical-3, critical-4, non-critical). The critical classes of the citation_label are distinct subsets of the critical class of the bge_label. This dataset poses a challenging text classification task. We also provide additional metadata, such as the publication year, the law area and the canton of origin per case, to promote robustness and fairness studies in the critical area of legal NLP.
### Supported Tasks and Leaderboards
LCP can be used as a text classification task.
### Languages
Switzerland has four official languages, three of which (German, French and Italian) are represented in the dataset. The decisions are written by the judges and clerks in the language of the proceedings.
German (91k), French (33k), Italian (15k)
## Dataset Structure
```
{
"decision_id": "008d8a52-f0ea-4820-a18c-d06066dbb407",
"language": "fr",
"year": "2018",
"chamber": "CH_BGer_004",
"region": "Federation",
"origin_chamber": "338.0",
"origin_court": "127.0",
"origin_canton": "24.0",
"law_area": "civil_law",
"law_sub_area": ,
"bge_label": "critical",
"citation_label": "critical-1",
"facts": "Faits : A. A.a. Le 17 août 2007, C.X._, née le 14 février 1944 et domiciliée...",
"considerations": "Considérant en droit : 1. Interjeté en temps utile (art. 100 al. 1 LTF) par les défendeurs qui ont succombé dans leurs conclusions (art. 76 LTF) contre une décision...",
"rulings": "Par ces motifs, le Tribunal fédéral prononce : 1. Le recours est rejeté. 2. Les frais judiciaires, arrêtés à 10'000 fr., sont mis solidairement à la charge des recourants...",
}
```
### Data Fields
```
decision_id: (str) a unique identifier for the document
language: (str) one of (de, fr, it)
year: (int) the publication year
chamber: (str) the chamber of the case
region: (str) the region of the case
origin_chamber: (str) the chamber of the origin case
origin_court: (str) the court of the origin case
origin_canton: (str) the canton of the origin case
law_area: (str) the law area of the case
law_sub_area: (str) the law sub area of the case
bge_label: (str) critical or non-critical
citation_label: (str) critical-1, critical-2, critical-3, critical-4, non-critical
facts: (str) the facts of the case
considerations: (str) the considerations of the case
rulings: (str) the rulings of the case
```
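As a minimal sketch of how these fields might feed a text classifier, the snippet below maps the two label columns to integer ids and picks one text field as input. The integer ordering, the helper name `encode_record`, and the default choice of `facts` as the input text are assumptions made for illustration; they are not defined by the dataset.

```python
from typing import Dict, Tuple

# Label vocabularies taken from the field descriptions above.
# ASSUMPTION: the integer ids are an arbitrary ordering chosen for this sketch.
BGE_LABELS = ["non-critical", "critical"]
CITATION_LABELS = ["non-critical", "critical-1", "critical-2", "critical-3", "critical-4"]

BGE_TO_ID = {name: i for i, name in enumerate(BGE_LABELS)}
CITATION_TO_ID = {name: i for i, name in enumerate(CITATION_LABELS)}


def encode_record(record: Dict[str, str], text_field: str = "facts") -> Tuple[str, int, int]:
    """Return (text, bge_id, citation_id) for one record (hypothetical helper)."""
    if record["language"] not in {"de", "fr", "it"}:
        raise ValueError(f"unexpected language: {record['language']!r}")
    return (
        record[text_field],
        BGE_TO_ID[record["bge_label"]],
        CITATION_TO_ID[record["citation_label"]],
    )


# A trimmed-down record using values from the example instance above.
example = {
    "language": "fr",
    "bge_label": "critical",
    "citation_label": "critical-1",
    "facts": "Faits : A. A.a. Le 17 août 2007, ...",
}
text, bge_id, citation_id = encode_record(example)
```

Passing `text_field="considerations"` would select the considerations instead of the facts as the model input.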
### Data Instances
[More Information Needed]
### Data Splits
The dataset was split by publication year (date-stratified):
- Train: 2002-2015
- Validation: 2016-2017
- Test: 2018-2022

| Language | Subset | Total Documents (Training / Validation / Test) |
|----------|--------|------------------------------------------------|
| German   | **de** | 81,264 (56,592 / 19,601 / 5,071)               |
| French   | **fr** | 49,354 (29,263 / 11,117 / 8,974)               |
| Italian  | **it** | 7,913 (5,220 / 1,901 / 792)                    |
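
The year ranges above can be expressed as a small helper. This is only a sketch: `split_for_year` is a hypothetical name, and the conversion with `int()` assumes that `year` may arrive as a string, as it does in the example instance. The per-language counts are copied from the table so that the totals can be re-derived.

```python
def split_for_year(year) -> str:
    """Map a publication year to its split name using the ranges listed above."""
    y = int(year)  # the example instance stores the year as a string
    if 2002 <= y <= 2015:
        return "train"
    if 2016 <= y <= 2017:
        return "validation"
    if 2018 <= y <= 2022:
        return "test"
    raise ValueError(f"year {y} is outside the dataset's 2002-2022 range")


# Document counts per language, copied from the table: (train, validation, test).
COUNTS = {
    "de": (56592, 19601, 5071),
    "fr": (29263, 11117, 8974),
    "it": (5220, 1901, 792),
}
TOTALS = {lang: sum(parts) for lang, parts in COUNTS.items()}
```

Summing each tuple reproduces the per-language totals given in the table.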
## Dataset Creation
### Curation Rationale
The dataset was created by Stern (2023).
### Source Data
#### Initial Data Collection and Normalization
The original data are published by the Swiss Federal Supreme Court (https://www.bger.ch) in unprocessed formats (HTML). The documents were downloaded from the Entscheidsuche portal (https://entscheidsuche.ch) in HTML.
#### Who are the source language producers?
The decisions are written by the judges and clerks in the language of the proceedings.
### Annotations
#### Annotation process
bge_label:
1. All bger_references in the BGE header were extracted (for BGE, see rcds/swiss_rulings).
2. The bger file_names were compared with the extracted references.
citation_label:
1. All citations of each bger case were counted and weighted.
2. The cited cases were divided into four classes, depending on the number of citations.
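
The two citation_label steps could be sketched roughly as follows. This card does not state the weighting scheme or the class boundaries, so everything specific here is an assumption made for illustration only: the recency-based weight, the rank-quartile cut-offs, the mapping of the most-cited quartile to critical-1, the treatment of uncited cases as non-critical, and all function names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def weighted_citation_counts(
    citations: Iterable[Tuple[str, int]], current_year: int
) -> Dict[str, float]:
    """Step 1: sum one weight per citation for each cited case.

    `citations` yields (cited_case_id, citing_year) pairs.
    ASSUMPTION: recent citations weigh more; the real weighting is not documented here.
    """
    scores: Dict[str, float] = defaultdict(float)
    for cited_id, citing_year in citations:
        age = max(current_year - citing_year, 0)
        scores[cited_id] += 1.0 / (1.0 + age)
    return dict(scores)


def assign_citation_labels(
    scores: Dict[str, float], all_case_ids: List[str]
) -> Dict[str, str]:
    """Step 2: split the cited cases into four classes.

    ASSUMPTION: classes are rank quartiles, critical-1 is the most-cited quartile,
    and cases without any citation are labelled non-critical.
    """
    # Sort by descending score; ties are broken by case id for determinism.
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    labels = {cid: "non-critical" for cid in all_case_ids}
    n = len(ranked)
    for rank, cid in enumerate(ranked):
        quartile = (rank * 4) // n  # integer in 0..3
        labels[cid] = f"critical-{quartile + 1}"
    return labels


# Toy input: case "e" is never cited.
cites = [("a", 2020), ("a", 2022), ("b", 2022), ("c", 2010), ("d", 2021)]
scores = weighted_citation_counts(cites, current_year=2022)
labels = assign_citation_labels(scores, ["a", "b", "c", "d", "e"])
```

Swapping in a different weight function or fixed count thresholds only changes the two marked assumptions; the count-then-bin structure stays the same.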
#### Who are the annotators?
Stern processed the data and introduced the bge_label and citation_label.
Metadata is published by the Swiss Federal Supreme Court (https://www.bger.ch).
### Personal and Sensitive Information
The dataset contains publicly available court decisions from the Swiss Federal Supreme Court. Personal or sensitive information has been anonymized by the court before publication according to the following guidelines: https://www.bger.ch/home/juridiction/anonymisierungsregeln.html.
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
We release the data under CC-BY-4.0, which complies with the court licensing (https://www.bger.ch/files/live/sites/bger/files/pdf/de/urteilsveroeffentlichung_d.pdf).
© Swiss Federal Supreme Court, 2002-2022
The copyright for the editorial content of this website and the consolidated texts, which is owned by the Swiss Federal Supreme Court, is licensed under the Creative Commons Attribution 4.0 International licence. This means that you can re-use the content provided you acknowledge the source and indicate any changes you have made.
Source: https://www.bger.ch/files/live/sites/bger/files/pdf/de/urteilsveroeffentlichung_d.pdf
### Citation Information
Please cite our [arXiv preprint](https://arxiv.org/abs/2306.09237):
```
@misc{rasiah2023scale,
title={SCALE: Scaling up the Complexity for Advanced Language Model Evaluation},
author={Vishvaksenan Rasiah and Ronja Stern and Veton Matoshi and Matthias Stürmer and Ilias Chalkidis and Daniel E. Ho and Joel Niklaus},
year={2023},
eprint={2306.09237},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Contributions
Thanks to [@Stern5497](https://github.com/stern5497) for adding this dataset.
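Provider:
rcds
Summary of the original information
Dataset overview
Dataset name
- Name: Legal Criticality Prediction (LCP)
- Alias: Criticality Prediction
Dataset attributes
- Languages: German, French, Italian
- License: CC-BY-SA-4.0
- Multilinguality: multilingual
- Size: 100K<n<1M
- Source data: original
- Task category: text classification
Dataset description
- Overview: LCP is a multilingual, diachronic dataset of 139K Swiss Federal Supreme Court cases annotated with two criticality labels. The bge_label is a binary label (critical, non-critical); the citation_label has 5 classes (critical-1, critical-2, critical-3, critical-4, non-critical). The dataset poses a challenging text classification task.
- Language distribution: German (91k), French (33k), Italian (15k)
Dataset structure
- Data instances: each instance contains fields such as the case ID, language, year, chamber, region, origin chamber and law area.
- Data fields: include decision_id, language, year, chamber and others.
- Data splits: training set (2002-2015), validation set (2016-2017), test set (2018-2022).
Dataset creation
- Source data: the data come from the Swiss Federal Supreme Court; the original format is HTML.
- Annotation process: describes how the bge_label and citation_label were produced.
- Personal and sensitive information: personal information in the data was anonymized by the court before publication.
Considerations for use
- Licensing information: the dataset is released under the CC-BY-4.0 license, in line with the court's licensing.
- Citation information: please refer to the arXiv preprint when citing.
Dataset details
Dataset structure
- Data instances: each instance describes the aspects of a case in detail, including the facts, considerations and rulings.
- Data fields: the type and meaning of each field are listed in detail.
- Data splits: the number of documents per language and per split is described in detail.
Dataset creation
- Source data: the original source and format of the data are described in detail.
- Annotation process: the annotation methods for bge_label and citation_label are described in detail.
- Personal and sensitive information: the handling of personal information is described in detail.
Considerations for use
- Licensing information: the terms and conditions of the license are described in detail.
- Citation information: a detailed citation format is provided.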