labrat-aiko/popia-compliance-nli
Source: Hugging Face · Updated 2026-04-22 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/labrat-aiko/popia-compliance-nli
Description:
The POPIA Compliance NLI dataset is a natural language inference (NLI) dataset built for South Africa's Protection of Personal Information Act (POPIA). It consists of hand-authored (premise, hypothesis, label) triples covering 7 canonical POPIA clauses and is intended for fine-tuning small cross-encoder NLI models to act as deterministic, auditable compliance judges. The train, validation, and test splits contain 180, 120, and 150 samples respectively, and each example carries five fields: clause, premise, hypothesis, label, and scenario. The dataset suits local, deterministic POPIA validation tasks and can also be used to benchmark existing NLI or LLM-as-judge systems on POPIA compliance tasks. It is English-only. The test set was hand-authored by the same author as the train set, which may introduce stylistic correlations between the splits.
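The record layout and split sizes described above can be checked with a small validation sketch. This is only an illustration under stated assumptions: the five field names and the split sizes come from the description, but the string-valued three-way label set (`entailment`, `neutral`, `contradiction`), the `PopiaExample` class, the helper functions, and the sample record are all hypothetical and not taken from the dataset itself.

```python
from collections import Counter
from dataclasses import dataclass

# Field names and split sizes as stated in the description.
REQUIRED_FIELDS = ("clause", "premise", "hypothesis", "label", "scenario")
EXPECTED_SPLIT_SIZES = {"train": 180, "validation": 120, "test": 150}

# ASSUMPTION: standard three-way NLI labels stored as strings.
# The listing does not say how labels are encoded.
ASSUMED_LABELS = {"entailment", "neutral", "contradiction"}


@dataclass(frozen=True)
class PopiaExample:
    """Hypothetical typed view of one dataset row."""
    clause: str
    premise: str
    hypothesis: str
    label: str
    scenario: str


def parse_example(record: dict) -> PopiaExample:
    """Validate a raw record and convert it to a PopiaExample."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["label"] not in ASSUMED_LABELS:
        raise ValueError(f"unexpected label: {record['label']!r}")
    return PopiaExample(**{f: str(record[f]) for f in REQUIRED_FIELDS})


def label_counts(examples) -> Counter:
    """Count how often each label occurs."""
    return Counter(ex.label for ex in examples)


def check_split_sizes(actual_sizes: dict) -> dict:
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        split: (expected, actual_sizes.get(split))
        for split, expected in EXPECTED_SPLIT_SIZES.items()
        if actual_sizes.get(split) != expected
    }


# Invented record for illustration only; NOT a row from the dataset.
sample = {
    "clause": "example-clause",
    "premise": "The customer ticked a box agreeing to receive marketing emails.",
    "hypothesis": "The customer agreed to receive marketing emails.",
    "label": "entailment",
    "scenario": "example-scenario",
}

example = parse_example(sample)
counts = label_counts([example])
mismatches = check_split_sizes({"train": 180, "validation": 120, "test": 150})
```

In practice the rows would presumably come from the Hugging Face `datasets` library, for instance via `load_dataset("labrat-aiko/popia-compliance-nli")`, though whether the repository loads without extra configuration is not confirmed by this listing. If the labels turn out to be integers or use different names, `ASSUMED_LABELS` would need adjusting.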
Provider:
labrat-aiko



