LennardZuendorf/openlegaldata-processed

Name: LennardZuendorf/openlegaldata-processed
Creator: LennardZuendorf
Published: 2023-10-07 20:13:13
License: MIT

Source: Hugging Face. Updated 2023-10-07; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/LennardZuendorf/openlegaldata-processed

Resource description:

---
license: mit
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: court
    struct:
    - name: id
      dtype: int64
    - name: jurisdiction
      dtype: string
    - name: level_of_appeal
      dtype: string
    - name: name
      dtype: string
    - name: state
      dtype: int64
  - name: file_number
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: type
    dtype: string
  - name: content
    dtype: string
  - name: tenor
    dtype: string
  - name: facts
    dtype: string
  - name: reasoning
    dtype: string
  splits:
  - name: three
    num_bytes: 169494251
    num_examples: 2828
  - name: two
    num_bytes: 183816899
    num_examples: 4954
  download_size: 172182482
  dataset_size: 353311150
task_categories:
- text-classification
language:
- de
tags:
- legal
pretty_name: Edited German Court case decision
size_categories:
- 1K<n<10K
---

# Dataset Card for openlegaldata.io bulk case data

## Dataset Description

This is an edited and cleaned-up version of the bulk data from [openlegaldata.io](https://de.openlegaldata.io/), which I also published on Hugging Face [here](LennardZuendorf/openlegaldata-bulk-data).

#### The Entire Dataset Is In German

- **Github Repository:** [uniArchive-legalis](https://github.com/LennardZuendorf/uniArchive-legalis)
- **Repository:** [Bulk Data](https://static.openlegaldata.io/dumps/de/)

## Edit Summary

I cleaned and split the data and filtered out large parts that were not (easily) usable, cutting the number of cases from 250000 down to a few thousand per split (2,828 in `three` and 4,954 in `two`, per the metadata above). The result is two different splits, because German courts do not all format their case decisions the same way.
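A minimal sketch of how the two splits might be loaded, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The split names and example counts are taken from the card metadata; the helper names `load_split` and `check_size` are illustrative and not part of the dataset.

```python
# Split names and example counts as declared in the card metadata.
DECLARED_SPLITS = {"three": 2828, "two": 4954}
REPO_ID = "LennardZuendorf/openlegaldata-processed"


def load_split(name: str):
    """Load one split from the Hugging Face Hub (needs network access)."""
    if name not in DECLARED_SPLITS:
        raise ValueError(
            f"unknown split {name!r}; expected one of {sorted(DECLARED_SPLITS)}"
        )
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=name)


def check_size(name: str, n_rows: int) -> bool:
    """Return True when a row count matches the count declared in the metadata."""
    return DECLARED_SPLITS[name] == n_rows
```

After `ds = load_split("three")`, calling `check_size("three", len(ds))` is a cheap sanity check that the download matches the declared size.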
### Data Fields

Independent of the split, most fields are the same. They are:

| id | court | file_number | date | type | content |
| - | - | - | - | - | - |
| numeric id | name of the court that made the decision | file number of the case ("Aktenzeichen") | decision date | type of the case decision | entire content (text) of the case decision |

Additionally, I added 3 more fields because of the splitting of the content:

#### Two Split

- Case decisions I could split into two parts: tenor and reasoning.
- This means the three fields tenor, reasoning and facts contain the following:

| tenor | reasoning | facts |
| - | - | - |
| An abstract, legal summary of the case's decision | the entire rest of the decision, explaining in detail why the decision has been made | an empty text field |

#### Three Split

- Case decisions I could split into three parts: tenor, reasoning and facts.
- I used this data to create binary labels with the help of ChatGPT; see [legalis](https://huggingface.co/datasets/LennardZuendorf/legalis) for that.
- The three fields tenor, reasoning and facts contain the following:

| tenor | reasoning | facts |
| - | - | - |
| An abstract, legal summary of the case's decision | the entire rest of the decision, explaining in detail why the decision has been made | the facts and details of a case |

### Languages

- German

## Additional Information

### Licensing/Citation Information

The [openlegaldata platform](https://github.com/openlegaldata/oldp) is licensed under the MIT license. You can access the dataset by citing the original source, [openlegaldata.io](https://de.openlegaldata.io/), and me, [Lennard Zündorf](https://github.com/LennardZuendorf), as the editor of this dataset.
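The field layout described under Data Fields can be sketched as typed records. This is an illustration under stated assumptions, not the author's code: the class names, the `split_kind` helper and every value in the sample record are made up, and only the field names and types come from the card metadata.

```python
from datetime import datetime
from typing import TypedDict


class Court(TypedDict):
    id: int
    jurisdiction: str
    level_of_appeal: str
    name: str
    state: int


class Decision(TypedDict):
    id: int
    court: Court
    file_number: str   # "Aktenzeichen"
    date: datetime     # stored as timestamp[s] in the dataset
    type: str
    content: str       # full text of the decision
    tenor: str
    reasoning: str
    facts: str         # empty string in the "two" split


def split_kind(record: Decision) -> str:
    """Guess the split of a record: "two" records carry an empty facts field."""
    return "three" if record["facts"].strip() else "two"


# Hypothetical record; all values are placeholders, not real case data.
sample: Decision = {
    "id": 1,
    "court": {"id": 10, "jurisdiction": "placeholder",
              "level_of_appeal": "placeholder", "name": "placeholder court",
              "state": 3},
    "file_number": "placeholder file number",
    "date": datetime(2020, 1, 1),
    "type": "placeholder",
    "content": "placeholder full text",
    "tenor": "placeholder tenor",
    "reasoning": "placeholder reasoning",
    "facts": "",
}
```

Checking for an empty `facts` field follows the description above, where the two-part split leaves `facts` as an empty text field.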

Provided by:

LennardZuendorf

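Summary of original information

Dataset Card for openlegaldata.io bulk case data

Dataset Description

Data Fields

Independent of the split, most fields are the same. They are:

id	court	file_number	date	type	content
numeric ID	name of the court that made the decision	file number of the case ("Aktenzeichen")	decision date	type of the case decision	entire content (text) of the case decision

Additionally, because the content was split, I added 3 more fields:

Two Split

Case decisions that could be split into two parts: tenor and reasoning.
This means the tenor, reasoning and facts fields contain the following:

tenor	reasoning	facts
a legal summary of the case decision	the rest of the decision, explaining in detail why the decision was made	an empty text field

Three Split

Case decisions that could be split into three parts: tenor, reasoning and facts.
The three fields tenor, reasoning and facts contain the following:

tenor	reasoning	facts
a legal summary of the case decision	the rest of the decision, explaining in detail why the decision was made	the facts and details of the case

Languages

German

Additional Information

Licensing/Citation Information

The openlegaldata platform is licensed under the MIT license. You can access the dataset by citing the original source, openlegaldata.io, and me, Lennard Zündorf, as the editor of this dataset.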
