
dossier-legal/italian-legal-corpus

Source: Hugging Face. Updated 2026-03-01, indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/dossier-legal/italian-legal-corpus
Description:
---
language:
- it
license: cc-by-4.0
task_categories:
- text-generation
- text-classification
tags:
- legal
- italian
- legislation
- court-decisions
- eu-law
pretty_name: Italian Legal Corpus
size_categories:
- 100K<n<1M
---

# Italian Legal Corpus

A comprehensive corpus of Italian legal texts from 4 open-data sources, designed for training and evaluating legal NLP models.

## Sources

| Source | Description | Documents |
|--------|-------------|-----------|
| **Normattiva** | All Italian national legislation (1861-2026) | ~300K |
| **Corte Costituzionale** | Constitutional Court decisions (1956-2026) | ~18K |
| **OpenGA** | Administrative justice metadata | ~100K |
| **EUR-Lex** | EU legislation in Italian | ~50K |

## Schema

Each record contains:

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Globally unique ID (`{source}_{specific_id}`) |
| `source` | string | One of: normattiva, corte_costituzionale, openga, eurlex |
| `doc_type` | string | Document type (legislation, decision, regulation, etc.) |
| `title` | string | Human-readable title |
| `date` | string? | ISO 8601 date (YYYY-MM-DD) |
| `text` | string | Full cleaned text |
| `authority` | string? | Issuing authority |
| `number` | string? | Document number |
| `year` | int? | Publication year |
| `ecli` | string? | ECLI identifier (court decisions) |
| `text_length` | int | Character count of text field |
| `language` | string | Always "it" |

## License

The underlying legal texts are public domain (Italian law, EU law, court decisions). This dataset compilation is released under CC-BY-4.0.

## Citation

```bibtex
@dataset{italian_legal_corpus_2026,
  title={Italian Legal Corpus},
  author={Dossier Legal},
  year={2026},
  url={https://huggingface.co/datasets/dossier-legal/italian-legal-corpus}
}
```
Provider:
dossier-legal