dossier-legal/italian-legal-corpus
Source: Hugging Face. Updated: 2026-03-01. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/dossier-legal/italian-legal-corpus
Resource description:
---
language:
- it
license: cc-by-4.0
task_categories:
- text-generation
- text-classification
tags:
- legal
- italian
- legislation
- court-decisions
- eu-law
pretty_name: Italian Legal Corpus
size_categories:
- 100K<n<1M
---
# Italian Legal Corpus
A comprehensive corpus of Italian legal texts drawn from four open-data sources,
designed for training and evaluating legal NLP models.
## Sources
| Source | Description | Documents |
|--------|-------------|-----------|
| **Normattiva** | All Italian national legislation (1861-2026) | ~300K |
| **Corte Costituzionale** | Constitutional Court decisions (1956-2026) | ~18K |
| **OpenGA** | Administrative justice metadata | ~100K |
| **EUR-Lex** | EU legislation in Italian | ~50K |
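
A minimal sketch of streaming the corpus and tallying records per source. It assumes the Hugging Face `datasets` library is installed and that the dataset exposes a `train` split; the split name is an assumption, since this card does not state it. The helper names `stream_corpus` and `count_by_source` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable

REPO_ID = "dossier-legal/italian-legal-corpus"


def stream_corpus(split: str = "train") -> Iterable[Dict[str, Any]]:
    """Stream records lazily. The default split name is an assumption."""
    # Imported here so count_by_source stays usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=True)


def count_by_source(records: Iterable[Dict[str, Any]]) -> Counter:
    """Tally records by their `source` field."""
    return Counter(rec["source"] for rec in records)


# Example usage (requires network access):
#   from itertools import islice
#   print(count_by_source(islice(stream_corpus(), 1000)))
```

Streaming avoids downloading the whole corpus just to inspect it, which may matter given the roughly 300K Normattiva documents listed above.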
## Schema
Each record contains:
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Globally unique ID (`{source}_{specific_id}`) |
| `source` | string | One of: normattiva, corte_costituzionale, openga, eurlex |
| `doc_type` | string | Document type (legislation, decision, regulation, etc.) |
| `title` | string | Human-readable title |
| `date` | string? | ISO 8601 date (YYYY-MM-DD) |
| `text` | string | Full cleaned text |
| `authority` | string? | Issuing authority |
| `number` | string? | Document number |
| `year` | int? | Publication year |
| `ecli` | string? | ECLI identifier (court decisions) |
| `text_length` | int | Character count of text field |
| `language` | string | Always "it" |
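
The schema above can be checked record by record. The following is a sketch, not an official validator: `validate_record` is a hypothetical helper, and it assumes that the `id` prefix equals the value of the `source` field and that `text_length` matches Python's `len()` of `text` (a code-point count). Neither assumption is confirmed by this card.

```python
import re
from typing import Any, Dict, List

SOURCES = {"normattiva", "corte_costituzionale", "openga", "eurlex"}
# Non-nullable fields and their types, taken from the schema table.
REQUIRED = {"id": str, "source": str, "doc_type": str, "title": str,
            "text": str, "text_length": int, "language": str}
# Nullable fields (marked with `?` in the table).
OPTIONAL = {"date": str, "authority": str, "number": str,
            "year": int, "ecli": str}
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_record(rec: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means none found."""
    problems: List[str] = []
    for name, typ in REQUIRED.items():
        if not isinstance(rec.get(name), typ):
            problems.append(f"{name}: expected {typ.__name__}")
    for name, typ in OPTIONAL.items():
        value = rec.get(name)
        if value is not None and not isinstance(value, typ):
            problems.append(f"{name}: expected {typ.__name__} or None")
    if problems:
        return problems  # skip value checks when types are already wrong

    if rec["source"] not in SOURCES:
        problems.append(f"source: unknown value {rec['source']!r}")
    elif not rec["id"].startswith(rec["source"] + "_"):
        # Assumption: the {source} part of the ID is the `source` value.
        problems.append("id: does not start with '<source>_'")
    if rec["text_length"] != len(rec["text"]):
        # Assumption: the character count equals Python's len().
        problems.append("text_length: does not match len(text)")
    if rec["language"] != "it":
        problems.append("language: expected 'it'")
    date = rec.get("date")
    if date is not None and not ISO_DATE.fullmatch(date):
        problems.append("date: expected YYYY-MM-DD")
    return problems
```

The function collects problems instead of raising on the first one, so a single pass over a sample can show which fields deviate most often.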
## License
The underlying legal texts are public domain (Italian law, EU law, court decisions).
This dataset compilation is released under CC-BY-4.0.
## Citation
```bibtex
@dataset{italian_legal_corpus_2026,
  title={Italian Legal Corpus},
  author={Dossier Legal},
  year={2026},
  url={https://huggingface.co/datasets/dossier-legal/italian-legal-corpus}
}
```
Provider:
dossier-legal



