horelulus/ID_REG_QA_Large

Name: horelulus/ID_REG_QA_Large
Creator: horelulus
Published: 2026-03-28 08:55:15
License: no description available

Hugging Face · Updated 2026-03-28 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/horelulus/ID_REG_QA_Large


Resource description:

---
license: apache-2.0
---

# 🧾 Indonesian Legal QA Dataset

This repository contains a **question-answer (QA) dataset** generated from parsed Indonesian regulations, focusing on **legal quoting and comprehension**. Designed to facilitate legal-aware LLMs, the dataset provides direct QA mappings to individual articles for contextual understanding and reference.

---

## 📌 Dataset Highlights

* **Source**: Generated from the [ID\_REG\_Parsed](https://huggingface.co/datasets/Azzindani/ID_REG_Parsed) repository
* **Format**: QA pairs based on individual articles (no chunking)
* **Scale**: Augmented by applying 50 QA templates across suitable regulation entries
* **Filtering**: Programmatic filtering removes redundant or overly broad article explanations
* **Target Use**: Train/test LLMs for **regulation comprehension**, **legal quoting**, and **document-level QA**

---

## ⚙️ Pipeline Overview

* **Environment**: Executed in a single Jupyter Notebook on **Kaggle Cloud**
* **Data Flow**:

  1. **Pull** parsed articles from `ID_REG_Parsed`
  2. Filter and refine results for clarity and legal context
  3. Apply **template-driven QA generation** (50 variations)
  4. **Push** QA dataset directly to this repository
* **Performance**:

  * Completed in ~1 hour using Kaggle GPU resources
  * Cloud-to-cloud transfer without local storage dependency

---

## 🧠 Use Cases

* Fine-tuning LLMs for **legal question answering**
* Benchmarks for **article referencing and quoting**
* Few-shot prompting for legal search assistants
* Legal text evaluation with grounded answers

---

## ⚠️ Disclaimer

This dataset is intended for **research and development** only. QA pairs are generated synthetically from publicly available legal text and may not reflect official interpretations.

---

## 🙏 Acknowledgments

* **[Hugging Face](https://huggingface.co/)** for hosting open datasets
* **[Kaggle](https://www.kaggle.com/)** for compute and cloud-to-cloud capabilities

---
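The filtering and template-driven generation steps described in the pipeline overview could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Article` fields, the two templates, the length thresholds in `is_suitable`, and the sample records are all hypothetical, and the actual notebook, column names, and 50 templates are not shown in this card.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record layout; the real columns in ID_REG_Parsed may differ."""
    regulation: str  # short name of the regulation
    article_no: str  # article number within the regulation
    text: str        # full article body (one article, no chunking)


# Hypothetical templates. The card mentions 50; two are enough to show the idea.
TEMPLATES = [
    "What does Article {article_no} of {regulation} state?",
    "Quote Article {article_no} of {regulation}.",
]


def is_suitable(article, min_chars=20, max_chars=2000):
    """Illustrative filter: keep articles whose length is within assumed bounds."""
    length = len(article.text.strip())
    return min_chars <= length <= max_chars


def generate_qa(articles, templates=TEMPLATES):
    """Apply every template to every suitable article, yielding one QA pair each."""
    pairs = []
    for art in articles:
        if not is_suitable(art):
            continue
        for tpl in templates:
            question = tpl.format(
                regulation=art.regulation, article_no=art.article_no
            )
            # The answer is the article text itself, so answers stay grounded.
            pairs.append({"question": question, "answer": art.text.strip()})
    return pairs


# Placeholder records, not real legal text.
articles = [
    Article("Regulation A", "1",
            "This placeholder article text is long enough to pass the filter."),
    Article("Regulation A", "2", "Too short."),
]
qa_pairs = generate_qa(articles)
```

With these placeholder inputs, only the first article passes the filter, so `qa_pairs` holds one entry per template. Using the article text verbatim as the answer is one way to keep each pair tied to a single article, in line with the "no chunking" format described above.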


Provider:

horelulus
