nibzard/narodne-novine-metadata-graph
Source: Hugging Face. Updated 2026-03-29. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/nibzard/narodne-novine-metadata-graph
Resource description:
---
pretty_name: Narodne Novine Metadata Graph
language:
- hr
license: other
annotations_creators:
- machine-generated
source_datasets:
- original
size_categories:
- 100K<n<1M
task_categories:
- other
tags:
- law
- legislation
- government-documents
- croatian
- metadata
---
# Narodne Novine Metadata Graph
Structured metadata snapshot of the Croatian official gazette archive mirrored from `narodne-novine.nn.hr`.
## Coverage
- Years: `1990-2026`
- Issues: `5031`
- Acts: `97012`
- Graph links: `37183`
## Files
- `acts.parquet`
- `issues.parquet`
- `act_links.parquet`
- `year_indexes.parquet`
- `indexes/<year>.csv`
- `indexes/<year>.xlsx`
- `subjects.parquet`
- `act_subjects.parquet`
- `metadata.json`
## Notes
- `2015+` records come from NN API / ELI metadata.
- `1990-2014` records come from legacy search and yearly index exports.
- Yearly index binaries are included under `indexes/` so local imports can restore them exactly.
- This dataset stores metadata, links, and source URLs. It does not provide article-level legal consolidation.
- PDF links are preserved when NN exposes them. Many records are HTML-only by source design.
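
The year split in the notes above can be captured in a small helper, which is useful when deciding how much weight to give a field. This is a minimal sketch: the function name `source_era` and the labels it returns are illustrative assumptions, not fields in the dataset.

```python
# Hypothetical helper: the name and labels are illustrative, not part of the dataset.
FIRST_YEAR = 1990       # first year covered by the snapshot
ELI_START_YEAR = 2015   # first year sourced from the NN API / ELI metadata
LAST_YEAR = 2026        # last year covered by the snapshot


def source_era(year: int) -> str:
    """Return which upstream source a record from `year` came from."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year {year} is outside the covered range")
    return "eli_api" if year >= ELI_START_YEAR else "legacy"


print(source_era(2014))  # legacy
print(source_era(2015))  # eli_api
```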
## Tables
- `acts.parquet`: one row per act with identifiers, titles, dates, source links, issuer/type IRIs, amendment resolution fields, and raw JSON-LD payloads as JSON strings
- `issues.parquet`: one row per issue with ordered act-number lists and crawl status
- `act_links.parquet`: explicit ELI graph links such as `amends`, `changes`, `repeals`, `based_on`
- `subjects.parquet`: unique legal subject IRIs extracted from JSON-LD
- `act_subjects.parquet`: many-to-many join between acts and extracted subjects
- `year_indexes.parquet`: fetched yearly CSV/XLSX index metadata
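
To illustrate how the link table supports amendment-graph exploration, the sketch below indexes links by their target so that all acts amending or repealing a given act can be looked up directly. The column names `source_eli`, `link_type`, and `target_eli` and the ELI values are assumptions made for this example; check the actual schema of `act_links.parquet` before adapting it.

```python
from collections import defaultdict

# Toy rows standing in for act_links.parquet. The column names
# (source_eli, link_type, target_eli) and the ELI values are hypothetical.
links = [
    {"source_eli": "act/B", "link_type": "amends", "target_eli": "act/A"},
    {"source_eli": "act/C", "link_type": "amends", "target_eli": "act/A"},
    {"source_eli": "act/D", "link_type": "repeals", "target_eli": "act/B"},
]


def incoming_by_type(rows):
    """Index links by target: target_eli -> link_type -> [source_eli, ...]."""
    index = defaultdict(lambda: defaultdict(list))
    for row in rows:
        index[row["target_eli"]][row["link_type"]].append(row["source_eli"])
    return index


index = incoming_by_type(links)
print(index["act/A"]["amends"])   # ['act/B', 'act/C']
print(index["act/B"]["repeals"])  # ['act/D']
```

Indexing by target rather than by source answers the usual question ("what changed this act?") in a single lookup. The same function works on rows read from the Parquet file once the real column names are substituted.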
## Caveats
- This dataset is a derived mirror of public official-publication metadata, not the canonical legal source.
- Some `1990-2014` fields are inferred from legacy HTML and yearly indexes rather than ELI-native structured metadata.
- `resolved_target_eli` identifies target documents, not article-level legal diffs.
- `passed_by_iri` is strongest for `2015+`; many legacy issuer IRIs are historically normalized from issuer text.
- `raw_jsonld_json` is empty for most legacy records because the legacy site does not expose equivalent JSON-LD.
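
Because `raw_jsonld_json` is stored as a JSON string and is empty for most legacy records, code that reads it should guard against missing payloads before decoding. The following is a minimal sketch using toy rows. Whether "empty" means an empty string or a null value is not stated here, so the helper (a hypothetical `parse_jsonld`) handles both.

```python
import json


def parse_jsonld(raw):
    """Decode a raw_jsonld_json value; return None when it is missing or empty."""
    if raw is None or not raw.strip():
        return None
    return json.loads(raw)


# Toy rows; the payload content is made up for illustration.
rows = [
    {"title": "ELI-era act", "raw_jsonld_json": '[{"@id": "https://example.org/eli/1"}]'},
    {"title": "Legacy act", "raw_jsonld_json": ""},
    {"title": "Legacy act, null payload", "raw_jsonld_json": None},
]

for row in rows:
    payload = parse_jsonld(row["raw_jsonld_json"])
    status = "no JSON-LD" if payload is None else f"{len(payload)} node(s)"
    print(row["title"], "->", status)
```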
## Top Document Types
- `ODLUKA`: 38011
- `RJESENJE`: 23368
- `PRAVILNIK`: 14634
- `OSTALO`: 6519
- `ZAKON`: 4563
- `UREDBA`: 4247
- `NAREDBA`: 1673
- `PRESUDA`: 1230
- `UPUTA`: 867
- `IZMJENE_I_DOPUNE`: 669
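
As a quick check of how much of the archive these ten types cover, the counts above can be compared with the act total from the Coverage section. The sketch assumes each act carries exactly one document type, which is not stated explicitly here.

```python
TOTAL_ACTS = 97012  # act count from the Coverage section

# Counts copied from the list above.
top_types = {
    "ODLUKA": 38011,
    "RJESENJE": 23368,
    "PRAVILNIK": 14634,
    "OSTALO": 6519,
    "ZAKON": 4563,
    "UREDBA": 4247,
    "NAREDBA": 1673,
    "PRESUDA": 1230,
    "UPUTA": 867,
    "IZMJENE_I_DOPUNE": 669,
}

covered = sum(top_types.values())
print(covered)                        # 95781
print(f"{covered / TOTAL_ACTS:.1%}")  # 98.7%

# Share of all acts per document type.
for name, count in top_types.items():
    print(f"{name}: {count / TOTAL_ACTS:.1%}")
```

Under that assumption, the remaining 1231 acts fall into types outside the top ten.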
## Example
```python
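from datasets import load_dataset

# Load a local copy of acts.parquet; the "parquet" builder puts a single file in the "train" split.
acts = load_dataset("parquet", data_files="acts.parquet")["train"]
print(acts[0]["title"])  # title of the first act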
```
## Provenance
Source website: `https://narodne-novine.nn.hr`
This is a derived metadata/graph mirror built from public NN endpoints, yearly indexes, and legacy article pages. Downstream users should review source-site terms, preserve source attribution, and verify legal-critical interpretations against the official publication.
## Intended Use
- research and corpus analysis
- legal-document discovery
- citation and amendment-graph exploration
- dataset prototyping for Croatian legislation metadata
## Not Intended Use
- treating this dataset as the sole authoritative legal source
- deriving article-level consolidated law without additional verification
- legal advice or compliance decisions without checking the official publication
Provider:
nibzard



