oliverkinch/eur-lex
Source: Hugging Face. Updated 2026-04-17. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/oliverkinch/eur-lex
Resource description:
---
language:
- da
- en
task_categories:
- translation
pretty_name: EUR-Lex EN–DA
---
# EUR-Lex EN–DA (Parallel Legal Text)
A parallel corpus of EU legal documents in English and Danish. It contains only samples for which both the English and the Danish version are present.
## Dataset Structure
### Features
| Field | Type | Description |
|-------|------|-------------|
| `celex` | string | CELEX document identifier |
| `resource_type` | string | Type of legal document (`caselaw`, `decision`, `directive`, `intagr`, `recommendation`, `regulation`) |
| `url` | string | Source URL |
| `title_en` | string | English title |
| `title_da` | string | Danish title |
| `text_en` | string | English text |
| `text_da` | string | Danish text |
| `text_source_en` | string | English text source format (`html` or `pdf`) |
| `text_source_da` | string | Danish text source format (`html` or `pdf`) |
| `chars_en` | int64 | English character count |
| `chars_da` | int64 | Danish character count |
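The fields above lend themselves to simple record-level filtering. The following is a minimal sketch, not part of the dataset itself: the function name `keep_record`, the chosen document types, and the length-ratio threshold of 2.0 are all assumptions made for illustration, and the sample records are placeholders rather than real rows.

```python
from typing import Any, Mapping

def keep_record(
    record: Mapping[str, Any],
    resource_types: frozenset = frozenset({"directive", "regulation"}),
    max_length_ratio: float = 2.0,  # assumed threshold, tune for your use case
) -> bool:
    """Return True if a record passes this illustrative quality filter."""
    # Keep only the requested document types.
    if record["resource_type"] not in resource_types:
        return False
    # Keep only pairs where both sides come from HTML rather than PDF.
    if record["text_source_en"] != "html" or record["text_source_da"] != "html":
        return False
    chars_en, chars_da = record["chars_en"], record["chars_da"]
    # Guard against empty texts before computing the ratio.
    if chars_en == 0 or chars_da == 0:
        return False
    # Reject pairs whose lengths differ too much to be plausible translations.
    ratio = max(chars_en, chars_da) / min(chars_en, chars_da)
    return ratio <= max_length_ratio

# Placeholder records that follow the schema; values are invented.
html_pair = {
    "resource_type": "regulation",
    "text_source_en": "html",
    "text_source_da": "html",
    "chars_en": 1000,
    "chars_da": 1100,
}
pdf_pair = dict(html_pair, text_source_da="pdf")

print(keep_record(html_pair))
print(keep_record(pdf_pair))
```

Because the function takes a single record and returns a boolean, it should also be usable as the callable passed to `Dataset.filter` once the dataset is loaded.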
## Statistics
| Document type | Samples |
|---|---|
| caselaw | 102,298 |
| decision | 44,884 |
| directive | 4,361 |
| intagr | 13,328 |
| regulation | 124,488 |
| recommendation | 3,434 |
| **Total** | **292,793** |
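As a quick sanity check, the per-type counts in the table can be summed and converted to shares. This sketch uses only the numbers listed above; the variable names are illustrative.

```python
# Per-type sample counts copied from the statistics table.
counts = {
    "caselaw": 102_298,
    "decision": 44_884,
    "directive": 4_361,
    "intagr": 13_328,
    "regulation": 124_488,
    "recommendation": 3_434,
}

total = sum(counts.values())
# Fraction of the corpus contributed by each document type.
shares = {name: n / total for name, n in counts.items()}

# Print the types from largest to smallest with their share of the corpus.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15}{n:>9,}  {shares[name]:6.1%}")
print(f"{'total':<15}{total:>9,}")
```

The sum matches the stated total of 292,793.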
## Data Splits
Single `train` split containing all document types.
## Usage
```python
from datasets import load_dataset
ds = load_dataset("oliverkinch/eur-lex", split="train")
```
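For translation work, each record usually needs to be reshaped into a source/target pair. The sketch below shows one possible mapping from the fields of this dataset to a nested `translation` dictionary. The function name `to_translation_pair` and the output layout are assumptions, not something the dataset prescribes, and the sample record is a placeholder.

```python
from typing import Any, Dict, Mapping

def to_translation_pair(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Map one record to an {"id", "translation": {"en", "da"}} layout."""
    return {
        "id": record["celex"],
        "translation": {
            "en": record["text_en"],
            "da": record["text_da"],
        },
    }

# Placeholder record following the schema; not a real row.
sample = {
    "celex": "EXAMPLE-0001",
    "text_en": "This Regulation shall enter into force.",
    "text_da": "Denne forordning træder i kraft.",
}

pair = to_translation_pair(sample)
print(pair["translation"]["en"])
print(pair["translation"]["da"])
```

After loading, the same function could be applied to every row with `ds.map(to_translation_pair, remove_columns=ds.column_names)`, which keeps only the new `id` and `translation` columns.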
Provider:
oliverkinch



