LennardZuendorf/openlegaldata-processed
Hugging Face, updated 2023-10-07, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/LennardZuendorf/openlegaldata-processed
Resource description:
---
license: mit
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: court
    struct:
    - name: id
      dtype: int64
    - name: jurisdiction
      dtype: string
    - name: level_of_appeal
      dtype: string
    - name: name
      dtype: string
    - name: state
      dtype: int64
  - name: file_number
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: type
    dtype: string
  - name: content
    dtype: string
  - name: tenor
    dtype: string
  - name: facts
    dtype: string
  - name: reasoning
    dtype: string
  splits:
  - name: three
    num_bytes: 169494251
    num_examples: 2828
  - name: two
    num_bytes: 183816899
    num_examples: 4954
  download_size: 172182482
  dataset_size: 353311150
task_categories:
- text-classification
language:
- de
tags:
- legal
pretty_name: Edited German Court case decision
size_categories:
- 1K<n<10K
---
# Dataset Card for openlegaldata.io bulk case data
## Dataset Description
This is an edited and cleaned-up version of the bulk data from [openlegaldata.io](https://de.openlegaldata.io/), which I also uploaded to Hugging Face [here](LennardZuendorf/openlegaldata-bulk-data).
#### The Entire Dataset Is In German
- **GitHub Repository:** [uniArchive-legalis](https://github.com/LennardZuendorf/uniArchive-legalis)
- **Repository:** [Bulk Data](https://static.openlegaldata.io/dumps/de/)
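A minimal sketch of loading one split with the Hugging Face `datasets` library. The repository id comes from the download link of this page, and the split names and example counts come from the card metadata. The helper name `load_split` is hypothetical and only used for illustration.

```python
# Split names and example counts as listed in the dataset card metadata.
EXPECTED_EXAMPLES = {"three": 2828, "two": 4954}
REPO_ID = "LennardZuendorf/openlegaldata-processed"


def load_split(split: str):
    """Load one split from the Hugging Face Hub (needs network access)."""
    if split not in EXPECTED_EXAMPLES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(EXPECTED_EXAMPLES)}"
        )
    # Imported lazily so that validating a split name does not require `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


# Usage (downloads the data on first call):
# ds = load_split("three")
# print(ds[0]["file_number"], ds[0]["date"])
```

Comparing `len(ds)` with `EXPECTED_EXAMPLES[split]` after loading is a quick way to check that the download matches the card.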
## Edit Summary
I cleaned and split the data and filtered out large parts that were not (easily) usable, cutting the number of cases from 250,000 to at most 4,000. This results in two different splits, because German courts do not format their case decisions the same way.
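The card does not document the exact splitting rules. The following is only an illustrative sketch of how a decision text could be split on section headings. It assumes the headings "Tenor", "Tatbestand" and "Entscheidungsgründe" or "Gründe" each stand on their own line. That is common in German decisions but not guaranteed, and the function name `split_decision` is hypothetical.

```python
import re

# Assumed heading lines; this is not the documented method behind the dataset.
_HEADING = re.compile(
    r"^\s*(Tenor|Tatbestand|Entscheidungsgründe|Gründe)\s*:?\s*$",
    re.MULTILINE,
)

# Map each heading to the dataset field it would fill.
_FIELD_FOR = {
    "Tenor": "tenor",
    "Tatbestand": "facts",
    "Entscheidungsgründe": "reasoning",
    "Gründe": "reasoning",
}


def split_decision(content: str) -> dict:
    """Split decision text into tenor, facts and reasoning on heading lines."""
    parts = {"tenor": "", "facts": "", "reasoning": ""}
    matches = list(_HEADING.finditer(content))
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(content)
        parts[_FIELD_FOR[match.group(1)]] = content[match.end():end].strip()
    return parts
```

Under these assumptions, a decision without a "Tatbestand" heading ends up with an empty `facts` value, which mirrors the difference between the two splits.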
### Data Fields
Regardless of the split, most fields are the same:
| id | court | file_number | date | type | content |
| - | - | - | - | - | - |
| numeric id | court that made the decision (a struct with id, jurisdiction, level_of_appeal, name and state) | file number of the case ("Aktenzeichen") | decision date | type of the case decision | entire content (text) of the case decision |
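As a sketch, one record can be described with Python type hints. The field names and types follow the `features` list in the card metadata, where `court` is a struct and `tenor`, `facts` and `reasoning` are the additional text fields. The class names, the helper `missing_fields` and all values in `example` are hypothetical placeholders, not real data.

```python
from datetime import datetime
from typing import TypedDict


class Court(TypedDict):
    id: int
    jurisdiction: str
    level_of_appeal: str
    name: str
    state: int


class Decision(TypedDict):
    id: int
    court: Court
    file_number: str
    date: datetime  # stored as timestamp[s] in the dataset
    type: str
    content: str
    tenor: str
    facts: str
    reasoning: str


def missing_fields(record: dict) -> list[str]:
    """Return the names of expected fields that are absent from `record`."""
    missing = [key for key in Decision.__annotations__ if key not in record]
    court = record.get("court") or {}
    missing += [f"court.{key}" for key in Court.__annotations__ if key not in court]
    return missing


# Placeholder record for illustration only; none of these values are real.
example: Decision = {
    "id": 1,
    "court": {"id": 1, "jurisdiction": "", "level_of_appeal": "", "name": "", "state": 0},
    "file_number": "",
    "date": datetime(2000, 1, 1),
    "type": "",
    "content": "",
    "tenor": "",
    "facts": "",
    "reasoning": "",
}
```

Running `missing_fields` on a few loaded rows is a cheap sanity check before building a pipeline on top of the data.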
Additionally, I added three more fields that hold the parts of the split content:
#### Two Split
- Case decisions that I could split into two parts: tenor and reasoning.
- This means the three fields tenor, reasoning and facts contain the following:
| tenor | reasoning | facts |
| - | - | - |
| An abstract, legal summary of the case's decision | the entire rest of the decision, explaining in detail why the decision has been made | an empty text field |
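Because `facts` is an empty text field in the two split, a record's layout can be inferred from that field alone. This is a small sketch: the function name `split_kind` is hypothetical, and treating `None` or whitespace-only values as empty is an assumption.

```python
def split_kind(record: dict) -> str:
    """Return "three" if the record has a non-empty facts section, else "two"."""
    facts = record.get("facts") or ""  # assumption: None counts as empty
    return "three" if facts.strip() else "two"
```

For example, `split_kind({"tenor": "...", "reasoning": "...", "facts": ""})` yields `"two"`.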
#### Three Split
- Case decisions that I could split into three parts: tenor, reasoning and facts.
- I used this data to create binary labels with the help of ChatGPT; see [legalis](https://huggingface.co/datasets/LennardZuendorf/legalis) for details.
- The three fields tenor, reasoning and facts contain the following:
| tenor | reasoning | facts |
| - | - | - |
| An abstract, legal summary of the case's decision | the entire rest of the decision, explaining in detail why the decision has been made | the facts and details of a case |
### Languages
- German
## Additional Information
### Licensing/Citation Information
The [openlegaldata platform](https://github.com/openlegaldata/oldp) is licensed under the MIT license. You can use the dataset as long as you cite the original source, [openlegaldata.io](https://de.openlegaldata.io/), and me, [Lennard Zündorf](https://github.com/LennardZuendorf), as the editor of this dataset.
Provider:
LennardZuendorf