horelulus/ID_REG_QA_Large
Hugging Face · Updated 2026-03-28 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/horelulus/ID_REG_QA_Large
Resource summary:
---
license: apache-2.0
---
# 🧾 Indonesian Legal QA Dataset
This repository contains a **question-answer (QA) dataset** generated from parsed Indonesian regulations, with a focus on **legal quoting and comprehension**. Designed to support the development of legally aware LLMs, the dataset maps each QA pair directly to an individual article, giving models both context and a precise reference.
---
## 📌 Dataset Highlights
* **Source**: Generated from the [ID\_REG\_Parsed](https://huggingface.co/datasets/Azzindani/ID_REG_Parsed) repository
* **Format**: QA pairs based on individual articles (no chunking)
* **Scale**: Augmented by applying 50 QA templates across suitable regulation entries
* **Filtering**: Programmatic filtering removes redundant or overly broad article explanations
* **Target Use**: Training and evaluating LLMs on **regulation comprehension**, **legal quoting**, and **document-level QA**
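The card does not describe the filtering rules. The sketch below shows one plausible approach and is not the author's code. It assumes each record has a hypothetical `explanation` field and drops two kinds of entries: exact duplicates after whitespace and case normalization (redundant), and placeholder explanations such as "Cukup jelas" ("sufficiently clear"), a phrase that often stands in for a real explanation in Indonesian regulations. The card also does not say how "overly broad" explanations are detected, so that criterion is left out.

```python
import re

# Hypothetical placeholder pattern: "Cukup jelas" ("sufficiently clear")
# carries no real explanation of the article.
PLACEHOLDER = re.compile(r"^cukup jelas\.?$")


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_explanations(records: list[dict]) -> list[dict]:
    """Drop empty or placeholder explanations and exact duplicates."""
    seen: set[str] = set()
    kept: list[dict] = []
    for rec in records:
        norm = normalize(rec.get("explanation", ""))
        if not norm or PLACEHOLDER.match(norm):
            continue  # empty or uninformative placeholder
        if norm in seen:
            continue  # redundant: identical explanation already kept
        seen.add(norm)
        kept.append(rec)
    return kept


# Toy records with made-up text, for illustration only.
records = [
    {"article": "Pasal 1", "explanation": "Cukup jelas."},
    {"article": "Pasal 2", "explanation": "Yang dimaksud dengan  pajak adalah pungutan wajib."},
    {"article": "Pasal 3", "explanation": "yang dimaksud dengan pajak adalah pungutan wajib."},
]
print([r["article"] for r in filter_explanations(records)])  # ['Pasal 2']
```

Keeping the first occurrence of a duplicate preserves the original article order, which matters if later steps cite articles sequentially.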
---
## ⚙️ Pipeline Overview
* **Environment**: Executed in a single Jupyter Notebook on **Kaggle Cloud**
* **Data Flow**:
  1. **Pull** parsed articles from `ID_REG_Parsed`
  2. Filter and refine results for clarity and legal context
  3. Apply **template-driven QA generation** (50 variations)
  4. **Push** QA dataset directly to this repository
* **Performance**:
  * Completed in \~1 hour using Kaggle GPU resources
  * Cloud-to-cloud transfer without local storage dependency
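Step 3 of the data flow can be sketched as follows. The card publishes neither the 50 templates nor the record schema, so the field names (`regulation`, `article`, `text`) and the two Indonesian templates below are hypothetical. The sketch illustrates the idea the card describes: each template is filled from a single, unchunked article. It also assumes that the answer is the full article text, which would keep every pair quotable.

```python
from string import Template

# Hypothetical Indonesian question templates (the card says 50 are used).
TEMPLATES = [
    Template("Apa isi $article $regulation?"),      # "What does <article> of <regulation> say?"
    Template("Kutip bunyi $article $regulation."),  # "Quote the wording of <article> of <regulation>."
]


def generate_qa(articles: list[dict], templates: list[Template]) -> list[dict]:
    """Fill every template from every article: one QA pair per (article, template)."""
    pairs: list[dict] = []
    for art in articles:
        for template_id, tpl in enumerate(templates):
            question = tpl.substitute(regulation=art["regulation"], article=art["article"])
            pairs.append({
                "question": question,
                # Assumption: the answer is the full, unchunked article text.
                "answer": art["text"],
                "template_id": template_id,
            })
    return pairs


# Toy record with made-up values, for illustration only.
articles = [{"regulation": "Peraturan Contoh", "article": "Pasal 1",
             "text": "Setiap orang wajib mendaftar."}]
for pair in generate_qa(articles, TEMPLATES):
    print(pair["question"])
```

With 50 templates, each suitable article yields up to 50 pairs, so the dataset grows linearly with the number of articles that survive filtering.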
---
## 🧠 Use Cases
* Fine-tuning LLMs for **legal question answering**
* Benchmarks for **article referencing and quoting**
* Few-shot prompting for legal search assistants
* Legal text evaluation with grounded answers
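For the quoting and grounded-evaluation use cases, a simple check is whether a model's answer reproduces the reference article. The dataset does not ship this metric; the sketch below is an illustrative scorer. It reports an exact-quote flag, which is true when the article's word sequence appears verbatim in the answer, and token recall, the fraction of article tokens (counted with multiplicity) that also appear in the answer. Case and punctuation are ignored.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def quote_score(answer: str, reference: str) -> dict:
    """Score how faithfully `answer` quotes `reference` (the gold article text)."""
    ans_toks, ref_toks = tokens(answer), tokens(reference)
    # Pad with spaces so "a b" does not match inside "xa b".
    exact = bool(ref_toks) and f" {' '.join(ref_toks)} " in f" {' '.join(ans_toks)} "
    # Multiset intersection: each reference token is credited at most as often as it occurs.
    overlap = sum((Counter(ans_toks) & Counter(ref_toks)).values())
    recall = overlap / len(ref_toks) if ref_toks else 0.0
    return {"exact_quote": exact, "token_recall": recall}


print(quote_score("Pasal 1 berbunyi: setiap orang wajib mendaftar.",
                  "Setiap orang wajib mendaftar."))
```

Exact-quote is strict and suits the legal-quoting benchmark. Token recall gives partial credit and is more forgiving of answers that paraphrase or truncate the article.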
---
## ⚠️ Disclaimer
This dataset is intended for **research and development** only. QA pairs are generated synthetically from publicly available legal text and may not reflect official interpretations.
---
## 🙏 Acknowledgments
* **[Hugging Face](https://huggingface.co/)** for hosting open datasets
* **[Kaggle](https://www.kaggle.com/)** for compute and cloud-to-cloud capabilities
---
Provided by: horelulus



