itacasehold/itacasehold

Name: itacasehold/itacasehold
Creator: itacasehold
Published: 2024-01-19 13:53:53
License: No description available

Hugging Face | Updated: 2024-01-19 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/itacasehold/itacasehold

Resource description:

---
license: apache-2.0
dataset_info:
  features:
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: doc
    dtype: string
  - name: summary
    dtype: string
  - name: materia
    dtype: string
  splits:
  - name: train
    num_bytes: 25541563
    num_examples: 792
  - name: validation
    num_bytes: 2932410
    num_examples: 88
  - name: test
    num_bytes: 6870636
    num_examples: 221
  download_size: 18051772
  dataset_size: 35344609
task_categories:
- summarization
- text-classification
language:
- it
tags:
- legal
pretty_name: ita_casehold
size_categories:
- n<1K
---

# ITA-CASEHOLD

## Dataset Summary

- This dataset contains the data used in the research on the ITA-CASEHOLD model, an extractive summarization model that extracts holdings from Italian legal administrative documents.
- The research paper, titled 'Legal Holding Extraction from Italian Case Documents using Italian-LEGAL-BERT Text Summarization', was accepted at ICAIL '23.
- It consists of 1101 pairs of judgments and their official holdings from 2019 to 2022, taken from the archives of [Italian Administrative Justice](https://www.giustizia-amministrativa.it/it/web/guest/massime).
- The Administrative Justice system in Italy covers a wide range of issues, including public contracts, environmental protection, public services, immigration, taxes, and compensation for damages caused by the State.

### Download the dataset

To download the dataset, use the following lines:

```python
from datasets import load_dataset

# Returns a DatasetDict with the "train", "validation", and "test" splits.
dataset = load_dataset("itacasehold/itacasehold")
```

To load only one split (train, validation, or test), pass the `split` argument:

```python
# Returns a single Dataset containing only the training split.
train_split = load_dataset("itacasehold/itacasehold", split="train")
```

### Supported Tasks and Leaderboards

Summarization, multi-class text classification

### Languages

Italian

### Data Fields

The dataset consists of the following fields:

- **URL** (`url`): link to the document
- **Document** (`doc`): the text of the judgment
- **Summary** (`summary`): the holding of the document
- **Materia** (`materia`): legal subject
- **Title** (`title`): title of the document

### Data Splits

- **Train**: 792
- **Validation**: 88
- **Test**: 221

### Source Data

The data is collected from the [Judicial Administration site](https://www.giustizia-amministrativa.it/it/web/guest/massime).

### Social Impact of Dataset

Legal holdings are considered the most essential part of a legal decision because they summarize it without going into the merits of the specific case, establish a legal principle, and set a legal precedent. Holdings are written by legal experts who, starting from a judgment, set out the applied principle of law in a clear, precise, and concise manner. We approached the problem of extracting legal holdings as an extractive text summarization task. This dataset addresses legal holding extraction and is, so far, the first and only dataset for this task in the Italian language. It contributes to summarization in Italian and to summarization in the legal domain. The dataset can also be used for multi-class text classification, with the legal subjects as labels.

### Dataset Limitation

This dataset focuses specifically on the Italian legal domain and is available only in Italian.
The documents cover only the period 2019-2022.

## Additional Information

### Dataset Curators

The dataset was curated by researchers from Scuola Superiore Sant'Anna as part of the project [Giustizia Agile (Agile Justice)](https://www.unitus.it/it/unitus/mappatura-della-ricerca/articolo/giustizia-agile), funded by the Italian Ministry of Justice.

### Licensing Information

The dataset is distributed under the `Apache 2.0` license. The full license terms are available [here](https://www.apache.org/licenses/LICENSE-2.0).

### Citation Information

If you use this dataset, please cite the following paper:

```bibtex
@inproceedings{10.1145/3594536.3595177,
  author = {Licari, Daniele and Bushipaka, Praveen and Marino, Gabriele and Comand\'{e}, Giovanni and Cucinotta, Tommaso},
  title = {Legal Holding Extraction from Italian Case Documents using Italian-LEGAL-BERT Text Summarization},
  year = {2023},
  isbn = {9798400701979},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3594536.3595177},
  doi = {10.1145/3594536.3595177},
  abstract = {Legal holdings are used in Italy as a critical component of the legal system, serving to establish legal precedents, provide guidance for future legal decisions, and ensure consistency and predictability in the interpretation and application of the law. They are written by domain experts who describe in a clear and concise manner the principle of law applied in the judgments. We introduce a legal holding extraction method based on Italian-LEGAL-BERT to automatically extract legal holdings from Italian cases. In addition, we present ITA-CaseHold, a benchmark dataset for Italian legal summarization. We conducted several experiments using this dataset, as a valuable baseline for future research on this topic.},
  booktitle = {Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law},
  pages = {148--156},
  numpages = {9},
  keywords = {Italian-LEGAL-BERT, Holding Extraction, Extractive Text Summarization, Benchmark Dataset},
  location = {Braga, Portugal},
  series = {ICAIL '23}
}
```

Provided by:

itacasehold

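Summary of Original Information

ITA-CASEHOLD Dataset Overview

Dataset Summary

This dataset was used in the research on the ITA-CASEHOLD model, an extractive summarization model that extracts holdings from Italian legal administrative documents.
It contains 1101 pairs of judgments and their official holdings from 2019 to 2022, collected from the archives of the Italian Administrative Justice.
The Italian Administrative Justice system covers a wide range of issues, including public contracts, environmental protection, public services, immigration, taxes, and compensation for damages caused by the State.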
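
Because holding extraction is framed here as extractive summarization, training an extractive model requires sentence-level targets derived from the doc and summary fields. The sketch below shows one simplified way to build them, assuming a naive regex sentence splitter: score each judgment sentence by clipped unigram overlap with the official holding and mark the top-k sentences as extraction targets. This is a generic oracle-labeling heuristic, not the procedure used in the ITA-CASEHOLD paper; the helper names (split_sentences, tokens, oracle_labels), the default k=2, and the toy Italian text are all hypothetical.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "?" or "!" followed by whitespace.
    # Italian legal text is full of abbreviations ("art.", "c.p.a."), so a real
    # pipeline would need a proper sentence segmenter.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def tokens(text: str) -> list[str]:
    # Lowercased word tokens; in Python 3, \w also matches accented letters such as "è".
    return re.findall(r"\w+", text.lower())


def oracle_labels(doc: str, summary: str, k: int = 2) -> list[int]:
    """Return one 0/1 label per sentence of `doc`; 1 marks the k sentences
    with the highest clipped unigram overlap with the holding `summary`."""
    sentences = split_sentences(doc)
    target = Counter(tokens(summary))
    # Counter & Counter keeps the minimum count per token (clipped matches).
    scores = [sum((Counter(tokens(s)) & target).values()) for s in sentences]
    # Stable sort: ties keep document order, so earlier sentences win.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(sentences))]


if __name__ == "__main__":
    # Toy Italian text, invented for illustration (not taken from the dataset).
    doc = ("Il ricorso è respinto. La gara è stata annullata per difetto di "
           "motivazione. Le spese seguono la soccombenza.")
    summary = "L'annullamento della gara richiede motivazione."
    print(oracle_labels(doc, summary, k=1))
```

With k=1 the toy example marks only the middle sentence, which shares "gara" and "motivazione" with the holding. Taking the top-k sentences is cheaper than the greedy ROUGE-based oracle often used for extractive summarizers, but it can select redundant sentences.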

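Data Fields

URL (url): link to the document
Document (doc): the text of the judgment
Summary (summary): the holding of the document
Materia (materia): legal subject
Title (title): title of the document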
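
The display names above differ from the column names in the card metadata (url, title, doc, summary, materia); for example, the judgment text is stored in doc, not Document. A minimal sketch of a typed record for these columns, assuming each row arrives as a plain dict of strings (indexing a split loaded with load_dataset("itacasehold/itacasehold", split="train") by an integer should return such a dict). CaseHoldRecord and parse_record are hypothetical names, not part of any published API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CaseHoldRecord:
    # Column names follow the `features` list in the card metadata; all are strings.
    url: str      # link to the source document
    title: str    # title of the document
    doc: str      # text of the judgment
    summary: str  # official holding (summarization target)
    materia: str  # legal subject (classification label)


def parse_record(row: dict) -> CaseHoldRecord:
    """Build a CaseHoldRecord from a dict row; extra keys are ignored,
    missing or non-string fields raise an error."""
    values = {}
    for f in fields(CaseHoldRecord):
        if f.name not in row:
            raise KeyError(f"missing field: {f.name}")
        value = row[f.name]
        if not isinstance(value, str):
            raise TypeError(f"field {f.name} must be str, got {type(value).__name__}")
        values[f.name] = value
    return CaseHoldRecord(**values)


if __name__ == "__main__":
    # Hypothetical row with placeholder values, not real dataset content.
    row = {"url": "https://example.org/x", "title": "T", "doc": "Testo della sentenza.",
           "summary": "Massima.", "materia": "Ambiente"}
    print(parse_record(row).materia)
```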

Data Splits

Train: 792 examples
Validation: 88 examples
Test: 221 examples
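
As a quick consistency check, the three splits should add up to the 1101 judgment-holding pairs described in the dataset summary, and the per-split num_bytes values in the card metadata should add up to the reported dataset_size. The snippet below hard-codes the values from the card, verifies both, and computes each split's share of the examples.

```python
# Split sizes and byte counts copied from the dataset card metadata.
SPLITS = {
    "train": {"num_examples": 792, "num_bytes": 25541563},
    "validation": {"num_examples": 88, "num_bytes": 2932410},
    "test": {"num_examples": 221, "num_bytes": 6870636},
}
DATASET_SIZE = 35344609  # dataset_size reported in the card metadata


def split_summary(splits):
    """Return total examples, total bytes, and each split's share of examples in percent."""
    total_examples = sum(s["num_examples"] for s in splits.values())
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    shares = {
        name: round(100 * s["num_examples"] / total_examples, 1)
        for name, s in splits.items()
    }
    return total_examples, total_bytes, shares


if __name__ == "__main__":
    total_examples, total_bytes, shares = split_summary(SPLITS)
    print(total_examples, total_bytes == DATASET_SIZE, shares)
```

Running it prints `1101 True {'train': 71.9, 'validation': 8.0, 'test': 20.1}`, i.e. roughly a 72/8/20 split.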

Supported Tasks and Leaderboards

Summarization
Multi-class text classification

Languages

Italian

Source Data

The data was collected from the Judicial Administration site (https://www.giustizia-amministrativa.it/it/web/guest/massime).

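Social Impact of Dataset

Legal holdings are considered the most essential part of a legal decision: they summarize it without going into the merits of the specific case, establish a legal principle, and set a legal precedent.
The dataset addresses legal holding extraction and is, so far, the first and only dataset for this task in the Italian language.
It can also be used for multi-class text classification, with the legal subject (materia) as the label.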
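
For the classification use, the materia strings must be mapped to integer class ids. A minimal sketch, assuming the label set is built from the training split only, so that a validation or test subject never seen in training surfaces as an error instead of silently receiving a new id. The function names and the example subject strings are hypothetical, not actual values from the dataset.

```python
from collections import Counter


def build_label_map(materia_values):
    """Assign each distinct legal subject an integer id, in sorted order for reproducibility."""
    return {label: idx for idx, label in enumerate(sorted(set(materia_values)))}


def encode(materia_values, label_map):
    """Convert materia strings to class ids; a subject missing from label_map raises KeyError."""
    return [label_map[m] for m in materia_values]


if __name__ == "__main__":
    # Hypothetical subject labels, for illustration only.
    train_materia = ["Appalti pubblici", "Ambiente", "Appalti pubblici", "Immigrazione"]
    label_map = build_label_map(train_materia)
    print(label_map)
    print(encode(train_materia, label_map))
    # Class frequencies help spot imbalance before choosing a loss or sampler.
    print(Counter(train_materia).most_common())
```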

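Dataset Limitations

The dataset focuses specifically on the Italian legal domain and is available only in Italian.
The documents cover only the period 2019-2022.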

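Additional Information

Dataset Curators

The dataset was curated by researchers from Scuola Superiore Sant'Anna as part of the Giustizia Agile (Agile Justice) project, funded by the Italian Ministry of Justice.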

Licensing Information

The dataset is released under the Apache 2.0 license.

Citation Information

If you use this dataset, please cite the following paper:

```plaintext
@inproceedings{10.1145/3594536.3595177,
  author = {Licari, Daniele and Bushipaka, Praveen and Marino, Gabriele and Comand\'{e}, Giovanni and Cucinotta, Tommaso},
  title = {Legal Holding Extraction from Italian Case Documents using Italian-LEGAL-BERT Text Summarization},
  year = {2023},
  isbn = {9798400701979},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3594536.3595177},
  doi = {10.1145/3594536.3595177},
  abstract = {Legal holdings are used in Italy as a critical component of the legal system, serving to establish legal precedents, provide guidance for future legal decisions, and ensure consistency and predictability in the interpretation and application of the law. They are written by domain experts who describe in a clear and concise manner the principle of law applied in the judgments. We introduce a legal holding extraction method based on Italian-LEGAL-BERT to automatically extract legal holdings from Italian cases. In addition, we present ITA-CaseHold, a benchmark dataset for Italian legal summarization. We conducted several experiments using this dataset, as a valuable baseline for future research on this topic.},
  booktitle = {Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law},
  pages = {148--156},
  numpages = {9},
  keywords = {Italian-LEGAL-BERT, Holding Extraction, Extractive Text Summarization, Benchmark Dataset},
  location = {Braga, Portugal},
  series = {ICAIL '23}
}
```
