DoctrineAI/legal_document_structuring
Hugging Face · Updated 2024-03-21 · Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/DoctrineAI/legal_document_structuring
Resource description:
---
license_name: ludov.1.0
license_link: LICENSE
task_categories:
- text-classification
language:
- fr
tags:
- legal
---
**Task details**
Document structuring plays a crucial role in many natural language processing (NLP) tasks, such as information retrieval and document understanding.
It also helps readers navigate structured documents that contain large amounts of text.
In the legal domain, document structuring is particularly important for creating inter- and intra-document links.
The dataset provides documents segmented into lines.
Each document was collected in HTML or PDF format; PDFs were then converted to HTML, keeping basic formatting tags such as bold and italics.
Each line includes layout information, raw text (HTML), and a label indicating whether it's a title.
Common tasks using this data include predicting titles and reconstructing the Table of Contents (TOC) for each document.
While information about the hierarchical structure of each line is not currently available, we plan to incorporate it in future releases.
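As a rough illustration of the two tasks above, here is a minimal sketch that parses one line's raw HTML into plain text plus bold/italic flags (plausible features for a title classifier) and assembles a flat, per-document TOC from the lines flagged as titles. The record keys `doc_id`, `html` and `is_title` are hypothetical placeholders, not the dataset's actual column names, so check the loaded features before adapting it. Because hierarchy levels are not yet provided, the TOC is a flat list in reading order.

```python
from html.parser import HTMLParser


class LineTextParser(HTMLParser):
    """Collects the visible text of one HTML line and the tags it uses."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True: "&eacute;" -> "é"
        self.parts = []
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)

    def handle_data(self, data):
        self.parts.append(data)


def parse_line(html):
    """Return (plain_text, is_bold, is_italic) for one line of raw HTML."""
    parser = LineTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())  # collapse whitespace
    is_bold = bool(parser.tags & {"b", "strong"})
    is_italic = bool(parser.tags & {"i", "em"})
    return text, is_bold, is_italic


def flat_toc(lines):
    """Group title lines by document, in reading order (no hierarchy levels).

    HYPOTHETICAL schema: each record is a dict with keys "doc_id", "html"
    and "is_title"; the real dataset columns may be named differently.
    """
    toc = {}
    for line in lines:
        if line["is_title"]:
            text, _, _ = parse_line(line["html"])
            toc.setdefault(line["doc_id"], []).append(text)
    return toc


# Toy records (invented for illustration, not taken from the dataset).
rows = [
    {"doc_id": "d1", "html": "<p><b>Article 1er</b></p>", "is_title": True},
    {"doc_id": "d1", "html": "<p>Le pr&eacute;sent texte...</p>", "is_title": False},
    {"doc_id": "d1", "html": "<p><i>Article 2</i></p>", "is_title": True},
]
print(flat_toc(rows))  # {'d1': ['Article 1er', 'Article 2']}
```

When evaluating TOC reconstruction end to end, the gold `is_title` values would be swapped for a model's predictions.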
**Usage**
Using Hugging Face datasets:
```python
from datasets import load_dataset
dataset = load_dataset("DoctrineAI/legal_document_structuring")
```
**Source data**
The original data comes from public French institutions:
- https://www.assemblee-nationale.fr/
- https://www.senat.fr/
- https://www.impots.gouv.fr/
**License**
License: [Ludo v.1.0](https://datasets.doctrine.fr/Open%20data%20Use%20Licence.pdf)
Provided by:
DoctrineAI