
LennardZuendorf/openlegaldata-processed

Hugging Face · updated 2023-10-07 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/LennardZuendorf/openlegaldata-processed
Resource summary:
---
license: mit
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: court
    struct:
    - name: id
      dtype: int64
    - name: jurisdiction
      dtype: string
    - name: level_of_appeal
      dtype: string
    - name: name
      dtype: string
    - name: state
      dtype: int64
  - name: file_number
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: type
    dtype: string
  - name: content
    dtype: string
  - name: tenor
    dtype: string
  - name: facts
    dtype: string
  - name: reasoning
    dtype: string
  splits:
  - name: three
    num_bytes: 169494251
    num_examples: 2828
  - name: two
    num_bytes: 183816899
    num_examples: 4954
  download_size: 172182482
  dataset_size: 353311150
task_categories:
- text-classification
language:
- de
tags:
- legal
pretty_name: Edited German Court case decision
size_categories:
- 1K<n<10K
---

# Dataset Card for openlegaldata.io bulk case data

## Dataset Description

This is an edit/cleanup of the bulk data of [openlegaldata.io](https://de.openlegaldata.io/), which I also brought onto Hugging Face [here](https://huggingface.co/datasets/LennardZuendorf/openlegaldata-bulk-data).

#### The Entire Dataset Is In German

- **Github Repository:** [uniArchive-legalis](https://github.com/LennardZuendorf/uniArchive-legalis)
- **Repository:** [Bulk Data](https://static.openlegaldata.io/dumps/de/)

## Edit Summary

I cleaned and split the data and filtered out large parts that were not (easily) usable. This reduced the number of cases from about 250,000 to fewer than 5,000 per split: 2,828 in `three` and 4,954 in `two`, per the metadata above. The data ends up in two different splits because German courts do not all format their case decisions the same way.
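The card distinguishes its two named splits, `two` and `three`, by whether the `facts` field is filled. The sketch below is a minimal example. It assumes the Hugging Face `datasets` library is installed and the Hub is reachable, and loads one split by name. `infer_split` and `load_split` are hypothetical helpers and are not part of the dataset or any library. `infer_split` only applies the card's statement that the `two` split leaves `facts` as an empty text field.

```python
from typing import Any, Mapping

# Split names as listed in the card's YAML metadata.
SPLIT_NAMES = ("two", "three")
REPO_ID = "LennardZuendorf/openlegaldata-processed"


def infer_split(record: Mapping[str, Any]) -> str:
    """Guess which split a record belongs to (hypothetical helper).

    Assumption drawn from the card: the "two" split has an empty `facts`
    field, while the "three" split contains the facts of the case.
    """
    facts = record.get("facts") or ""  # treat None as empty text
    return "three" if facts.strip() else "two"


def load_split(name: str):
    """Load one split with the Hugging Face `datasets` library.

    Needs `pip install datasets` and network access. The import is done
    lazily so the rest of this sketch runs without the library.
    """
    if name not in SPLIT_NAMES:
        raise ValueError(f"unknown split {name!r}, expected one of {SPLIT_NAMES}")
    from datasets import load_dataset  # lazy import, only needed here

    return load_dataset(REPO_ID, split=name)
```

For example, `load_split("three")` should return a `datasets.Dataset` whose rows are plain dicts. Running `infer_split` over those rows is one way to sanity-check that every row in `three` really has non-empty `facts`.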
### Data Fields

Independent of the split, most fields are the same:

| id | court | file_number | date | type | content |
| - | - | - | - | - | - |
| numeric id | the court that made the decision (a struct with `id`, `jurisdiction`, `level_of_appeal`, `name` and `state`) | file number of the case ("Aktenzeichen") | decision date | type of the case decision | entire content (text) of the case decision |

Splitting the content also added three more fields: `tenor`, `reasoning` and `facts`.

#### Two Split

- Case decisions that I could split into two parts: tenor and reasoning.
- The three fields tenor, reasoning and facts contain the following:

| tenor | reasoning | facts |
| - | - | - |
| An abstract legal summary of the case's decision | the entire rest of the decision, explaining in detail why the decision was made | an empty text field |

#### Three Split

- Case decisions that I could split into three parts: tenor, reasoning and facts.
- I used this data to create binary labels with the help of ChatGPT; see [legalis](https://huggingface.co/datasets/LennardZuendorf/legalis) for that.
- The three fields tenor, reasoning and facts contain the following:

| tenor | reasoning | facts |
| - | - | - |
| An abstract legal summary of the case's decision | the entire rest of the decision, explaining in detail why the decision was made | the facts and details of the case |

### Languages

- German

## Additional Information

### Licensing/Citation Information

The [openlegaldata platform](https://github.com/openlegaldata/oldp) is licensed under the MIT license. If you use this dataset, please cite the original source, [openlegaldata.io](https://de.openlegaldata.io/), and me, [Lennard Zündorf](https://github.com/LennardZuendorf), as the editor of this dataset.
Provided by:
LennardZuendorf
Summary of the original information

Dataset Card for openlegaldata.io bulk case data

Dataset Description

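Data Fields

Independent of the split, most fields are the same:

| id | court | file_number | date | type | content |
| - | - | - | - | - | - |
| numeric ID | name of the court that made the decision | file number of the case ("Aktenzeichen") | decision date | type of the case decision | entire content (text) of the case decision |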

Because the content was split, I also added 3 extra fields:

Two Split

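  • Case decisions that could be split into two parts: tenor and reasoning.
  • This means the fields tenor, reasoning and facts contain the following:

| tenor | reasoning | facts |
| - | - | - |
| legal summary of the case decision | the rest of the decision, explaining in detail the reasons for the decision | empty text field |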

Three Split

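  • Case decisions that could be split into three parts: tenor, reasoning and facts.
  • These three fields, tenor, reasoning and facts, contain the following:

| tenor | reasoning | facts |
| - | - | - |
| legal summary of the case decision | the rest of the decision, explaining in detail the reasons for the decision | the facts and details of the case |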
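A typed container can make a single record easier to work with in Python. The sketch below mirrors the features listed in the card's YAML metadata: `id`, the `court` struct, `file_number`, `date`, `type`, `content`, `tenor`, `reasoning` and `facts`. The class names `Court` and `Decision` and the helpers `from_row` and `sections` are hypothetical; they are not part of the dataset or any library. The sketch assumes rows arrive as plain dicts, which is how the Hugging Face `datasets` library yields them, and it treats missing text fields as empty strings.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Mapping, Optional


def _text(mapping: Mapping[str, Any], key: str) -> str:
    """Return a string field, mapping None or a missing key to ""."""
    return mapping.get(key) or ""


@dataclass
class Court:
    """Mirrors the `court` struct from the card's YAML metadata."""
    id: Optional[int]
    jurisdiction: str
    level_of_appeal: str
    name: str
    state: Optional[int]


@dataclass
class Decision:
    """One case decision; field names follow the dataset's features."""
    id: int
    court: Court
    file_number: str          # "Aktenzeichen"
    date: Optional[datetime]  # timestamp[s] in the metadata
    type: str                 # type of the case decision
    content: str              # entire text of the decision
    tenor: str
    reasoning: str
    facts: str                # empty in the "two" split

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "Decision":
        """Build a Decision from a dict row (hypothetical helper)."""
        court = row.get("court") or {}
        return cls(
            id=int(row["id"]),
            court=Court(
                id=court.get("id"),
                jurisdiction=_text(court, "jurisdiction"),
                level_of_appeal=_text(court, "level_of_appeal"),
                name=_text(court, "name"),
                state=court.get("state"),
            ),
            file_number=_text(row, "file_number"),
            date=row.get("date"),
            type=_text(row, "type"),
            content=_text(row, "content"),
            tenor=_text(row, "tenor"),
            reasoning=_text(row, "reasoning"),
            facts=_text(row, "facts"),
        )

    def sections(self) -> Dict[str, str]:
        """Return the non-empty split fields in tenor, facts, reasoning order."""
        parts = {"tenor": self.tenor, "facts": self.facts, "reasoning": self.reasoning}
        return {name: body for name, body in parts.items() if body.strip()}
```

`sections()` orders the fields as tenor, facts, reasoning. This follows the usual layout of a German judgment (Tenor, Tatbestand, Entscheidungsgründe). That ordering is an assumption on my part; the card does not state it. For a row from the `two` split, `sections()` should return only `tenor` and `reasoning`, because `facts` is empty there.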

Languages

  • German

Additional Information

Licensing/Citation Information

The openlegaldata platform is licensed under the MIT license. If you use this dataset, please cite the original source, openlegaldata.io, and me, Lennard Zündorf, as the editor of this dataset.
