nibzard/narodne-novine-metadata-graph

Name: nibzard/narodne-novine-metadata-graph
Creator: nibzard
Published: 2026-03-29 17:17:46
License: No description available

Source: Hugging Face. Updated: 2026-03-29. Indexed: 2026-04-12.

Download link:

https://hf-mirror.com/datasets/nibzard/narodne-novine-metadata-graph


Description:

---
pretty_name: Narodne Novine Metadata Graph
language:
- hr
license: other
annotations_creators:
- machine-generated
source_datasets:
- original
size_categories:
- 100K<n<1M
task_categories:
- other
tags:
- law
- legislation
- government-documents
- croatian
- metadata
---

# Narodne Novine Metadata Graph

Structured metadata snapshot of the Croatian official gazette archive mirrored from `narodne-novine.nn.hr`.

## Coverage

- Years: `1990-2026`
- Issues: `5031`
- Acts: `97012`
- Graph links: `37183`

## Files

- `acts.parquet`
- `issues.parquet`
- `act_links.parquet`
- `year_indexes.parquet`
- `indexes/<year>.csv`
- `indexes/<year>.xlsx`
- `subjects.parquet`
- `act_subjects.parquet`
- `metadata.json`

## Notes

- `2015+` records come from NN API / ELI metadata.
- `1990-2014` records come from legacy search and yearly index exports.
- Yearly index binaries are included under `indexes/` so local imports can restore them exactly.
- This dataset stores metadata, links, and source URLs. It does not provide article-level legal consolidation.
- PDF links are preserved when NN exposes them. Many records are HTML-only by source design.

## Tables

- `acts.parquet`: one row per act with identifiers, titles, dates, source links, issuer/type IRIs, amendment resolution fields, and raw JSON-LD payloads as JSON strings
- `issues.parquet`: one row per issue with ordered act-number lists and crawl status
- `act_links.parquet`: explicit ELI graph links such as `amends`, `changes`, `repeals`, `based_on`
- `subjects.parquet`: unique legal subject IRIs extracted from JSON-LD
- `act_subjects.parquet`: many-to-many join between acts and extracted subjects
- `year_indexes.parquet`: fetched yearly CSV/XLSX index metadata

## Caveats

- This dataset is a derived mirror of public official-publication metadata, not the canonical legal source.
- Some `1990-2014` fields are inferred from legacy HTML and yearly indexes rather than ELI-native structured metadata.
- `resolved_target_eli` identifies target documents, not article-level legal diffs.
- `passed_by_iri` is strongest for `2015+`; many legacy issuer IRIs are historically normalized from issuer text.
- `raw_jsonld_json` is empty for most legacy records because the legacy site does not expose equivalent JSON-LD.

## Top Document Types

- `ODLUKA`: 38011
- `RJESENJE`: 23368
- `PRAVILNIK`: 14634
- `OSTALO`: 6519
- `ZAKON`: 4563
- `UREDBA`: 4247
- `NAREDBA`: 1673
- `PRESUDA`: 1230
- `UPUTA`: 867
- `IZMJENE_I_DOPUNE`: 669

## Example

```python
from datasets import load_dataset

acts = load_dataset("parquet", data_files="acts.parquet")["train"]
print(acts[0]["title"])
```

## Provenance

Source website: `https://narodne-novine.nn.hr`

This is a derived metadata/graph mirror built from public NN endpoints, yearly indexes, and legacy article pages. Downstream users should review source-site terms, preserve source attribution, and verify legal-critical interpretations against the official publication.

## Intended Use

- research and corpus analysis
- legal-document discovery
- citation and amendment-graph exploration
- dataset prototyping for Croatian legislation metadata

## Not Intended Use

- treating this dataset as the sole authoritative legal source
- deriving article-level consolidated law without additional verification
- legal advice or compliance decisions without checking the official publication
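For the citation and amendment-graph exploration use case, a minimal sketch of indexing link rows by their target act may help. The column names `source_eli`, `link_type`, and `target_eli`, and the sample ELI strings, are assumptions made for illustration; the card does not document the schema of `act_links.parquet`, so inspect the file before adapting this.

```python
from collections import defaultdict

# Hypothetical rows shaped like act_links.parquet. The column names and ELI
# values are illustrative assumptions, not the dataset's documented schema.
sample_links = [
    {"source_eli": "eli/a", "link_type": "amends", "target_eli": "eli/base"},
    {"source_eli": "eli/b", "link_type": "amends", "target_eli": "eli/base"},
    {"source_eli": "eli/c", "link_type": "repeals", "target_eli": "eli/a"},
    {"source_eli": "eli/a", "link_type": "based_on", "target_eli": "eli/const"},
]


def incoming_links(links, link_types):
    """Map each target act to the (link_type, source) pairs pointing at it."""
    wanted = set(link_types)
    index = defaultdict(list)
    for row in links:
        if row["link_type"] in wanted:
            index[row["target_eli"]].append((row["link_type"], row["source_eli"]))
    return dict(index)


# Keep only links that modify or repeal a target; "based_on" is skipped here.
modified_by = incoming_links(sample_links, ["amends", "changes", "repeals"])
for target, sources in sorted(modified_by.items()):
    print(target, sources)
```

Because the links identify target documents rather than article-level diffs, an index like this shows which acts touch a given act, not what they change in it.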

Provided by:

nibzard
