rcds/swiss_law_area_prediction

Hugging Face | Updated 2023-07-20 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/rcds/swiss_law_area_prediction
Resource description:
---
license: cc-by-sa-4.0
annotations_creators:
- machine-generated
language:
- de
- fr
- it
language_creators:
- expert-generated
multilinguality:
- multilingual
pretty_name: Law Area Prediction
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- text-classification
---

# Dataset Card for Law Area Prediction

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The dataset contains cases to be classified into the four main areas of law: Public, Civil, Criminal and Social. These can be further classified into sub-areas:

```json
{
  "public": ["Tax", "Urban Planning and Environmental", "Expropriation", "Public Administration", "Other Fiscal"],
  "civil": ["Rental and Lease", "Employment Contract", "Bankruptcy", "Family", "Competition and Antitrust", "Intellectual Property"],
  "criminal": ["Substantive Criminal", "Criminal Procedure"]
}
```
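
Each sub-area listed above belongs to exactly one main area. The minimal Python sketch below inverts that mapping so a sub-area label can be rolled up to its main area. It assumes the labels appear in the data exactly as spelled in the card; the names `SUB_AREAS`, `MAIN_AREA_OF` and `main_area_for` are hypothetical helpers, not part of the dataset or of any library.

```python
# Sub-areas per main area, copied from the dataset card.
# Hypothetical helper: label spellings in the actual data may differ.
SUB_AREAS: dict[str, list[str]] = {
    "public": ["Tax", "Urban Planning and Environmental", "Expropriation",
               "Public Administration", "Other Fiscal"],
    "civil": ["Rental and Lease", "Employment Contract", "Bankruptcy",
              "Family", "Competition and Antitrust", "Intellectual Property"],
    "criminal": ["Substantive Criminal", "Criminal Procedure"],
}

# Invert the mapping: each sub-area points back to its main area.
MAIN_AREA_OF: dict[str, str] = {
    sub: main for main, subs in SUB_AREAS.items() for sub in subs
}


def main_area_for(sub_area: str) -> str:
    """Return the main area of law for a sub-area label.

    Raises KeyError for labels not listed in the card; note that the
    card lists no sub-areas for the Social main area.
    """
    return MAIN_AREA_OF[sub_area]


print(main_area_for("Bankruptcy"))  # civil
```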

### Supported Tasks and Leaderboards

Law Area Prediction can be used as a text classification task: given a decision, predict its main area of law (`law_area`).

### Languages

Switzerland has four official languages; three of them, German, French and Italian, are represented in this dataset. The decisions are written by the judges and clerks in the language of the proceedings.

| Language | Subset | Number of Documents |
|----------|--------|---------------------|
| German   | **de** | 127K                |
| French   | **fr** | 156K                |
| Italian  | **it** | 46K                 |

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

- decision_id: unique identifier for the decision
- facts: facts section of the decision
- considerations: considerations section of the decision
- law_area: label of the decision (main area of law)
- law_sub_area: sub-area of law of the decision
- language: language of the decision
- year: year of the decision
- court: court of the decision
- chamber: chamber of the decision
- canton: canton of the decision
- region: region of the decision

### Data Splits

The dataset is split by decision date:

- Train: 2002-2015
- Validation: 2016-2017
- Test: 2018-2022

## Dataset Creation

### Curation Rationale

### Source Data

#### Initial Data Collection and Normalization

The original data are published by the Swiss Federal Supreme Court (https://www.bger.ch) in unprocessed formats (HTML). The documents were downloaded from the Entscheidsuche portal (https://entscheidsuche.ch) in HTML.

#### Who are the source language producers?

The decisions are written by the judges and clerks in the language of the proceedings.

### Annotations

#### Annotation process

#### Who are the annotators?

### Personal and Sensitive Information

The dataset contains publicly available court decisions from the Swiss Federal Supreme Court.
Personal or sensitive information has been anonymized by the court before publication according to the following guidelines: https://www.bger.ch/home/juridiction/anonymisierungsregeln.html.

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

We release the data under CC-BY-4.0, which complies with the court licensing (https://www.bger.ch/files/live/sites/bger/files/pdf/de/urteilsveroeffentlichung_d.pdf).

© Swiss Federal Supreme Court, 2002-2022

The copyright for the editorial content of this website and the consolidated texts, which is owned by the Swiss Federal Supreme Court, is licensed under the Creative Commons Attribution 4.0 International licence. This means that you can re-use the content provided you acknowledge the source and indicate any changes you have made.

Source: https://www.bger.ch/files/live/sites/bger/files/pdf/de/urteilsveroeffentlichung_d.pdf

### Citation Information

Please cite our [ArXiv-Preprint](https://arxiv.org/abs/2306.09237):

```
@misc{rasiah2023scale,
  title={SCALE: Scaling up the Complexity for Advanced Language Model Evaluation},
  author={Vishvaksenan Rasiah and Ronja Stern and Veton Matoshi and Matthias Stürmer and Ilias Chalkidis and Daniel E. Ho and Joel Niklaus},
  year={2023},
  eprint={2306.09237},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

### Contributions
Provided by:
rcds
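Summary of the Original Information

Dataset Overview

Dataset Description

Dataset Summary

  • Name: Law Area Prediction
  • Content: court cases to be classified into four main areas of law: public, civil, criminal and social law.
  • Sub-areas:
    • Public law: tax, urban planning and environmental, expropriation, public administration, other fiscal
    • Civil law: rental and lease, employment contract, bankruptcy, family, competition and antitrust, intellectual property
    • Criminal law: substantive criminal law, criminal procedure

Supported Tasks and Leaderboards

  • Task: text classification
  • Application: law area prediction

Languages

  • Languages: German, French, Italian
  • Number of documents:
    • German: 127K
    • French: 156K
    • Italian: 46K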
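
The document counts listed above are given in thousands (K), so any figure derived from them is approximate. As a quick worked example, this Python sketch computes each language's share of the roughly 329K documents; it shows that Italian accounts for only about 14% of the data, which may be worth keeping in mind when comparing per-language results. The variable names are illustrative only.

```python
# Approximate document counts per language, in thousands, as listed above.
counts_k = {"de": 127, "fr": 156, "it": 46}

total_k = sum(counts_k.values())  # total in thousands of documents

# Share of each language in percent, rounded to one decimal place.
shares = {lang: round(100 * n / total_k, 1) for lang, n in counts_k.items()}

print(total_k, shares)  # 329 {'de': 38.6, 'fr': 47.4, 'it': 14.0}
```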

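Dataset Structure

Data Fields

  • Fields:
    • decision_id: unique identifier of the decision
    • facts: facts section of the decision
    • considerations: considerations section of the decision
    • law_area: main area of law (label)
    • law_sub_area: sub-area of law
    • language: language of the decision
    • year: year of the decision
    • court: court
    • chamber: chamber
    • canton: canton
    • region: region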
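
One way to hold a single decision in code is a typed record with the fields listed above. This is a minimal Python sketch under stated assumptions: the class name `Decision` and the `from_row` helper are hypothetical, and typing `year` as an integer and every other field as a string is an assumption, since the card does not specify field types.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Decision:
    """One court decision; field names follow the dataset card.

    Assumption: `year` is an int and all other fields are strings.
    """
    decision_id: str
    facts: str
    considerations: str
    law_area: str
    law_sub_area: str
    language: str
    year: int
    court: str
    chamber: str
    canton: str
    region: str

    @classmethod
    def from_row(cls, row: dict) -> "Decision":
        """Build a Decision from a dict, ignoring undocumented keys."""
        names = [f.name for f in fields(cls)]
        # Fail early if any documented field is absent from the row.
        missing = [n for n in names if n not in row]
        if missing:
            raise KeyError(f"missing fields: {missing}")
        values = {n: row[n] for n in names}
        # Normalise the year to an int (assumed type).
        values["year"] = int(values["year"])
        return cls(**values)
```

Extra keys in a row are ignored, so the helper keeps working if a row carries columns beyond the ones documented here.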

Data Splits

  • Training set: 2002-2015
  • Validation set: 2016-2017
  • Test set: 2018-2022
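
Because the split is chronological, a decision's split follows from its year alone. A minimal Python sketch of that rule, assuming the year ranges above are inclusive; the function name `split_for_year` is hypothetical:

```python
def split_for_year(year: int) -> str:
    """Return the split a decision from `year` falls into.

    Boundaries come from the dataset card and are assumed inclusive:
    train 2002-2015, validation 2016-2017, test 2018-2022.
    """
    if 2002 <= year <= 2015:
        return "train"
    if 2016 <= year <= 2017:
        return "validation"
    if 2018 <= year <= 2022:
        return "test"
    raise ValueError(f"year {year} is outside the documented range 2002-2022")


print([split_for_year(y) for y in (2010, 2017, 2020)])
# ['train', 'validation', 'test']
```

Since every test year comes after every training year, a model is evaluated on decisions newer than anything it was trained on.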

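Dataset Creation

Source Data

  • Source: Swiss Federal Supreme Court
  • Format: HTML
  • Source language producers: judges and clerks

Annotations

  • Annotation creators: machine-generated
  • Sensitive information: anonymized by the court before publication according to its anonymization guidelines

Licensing Information

  • License: CC-BY-SA-4.0
  • Copyright: Swiss Federal Supreme Court, 2002-2022

Citation Information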
