
horelulus/ID_REG_QA_Large

Hugging Face, updated 2026-03-28, indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/horelulus/ID_REG_QA_Large
Resource description:
---
license: apache-2.0
---

# 🧾 Indonesian Legal QA Dataset

This repository contains a **question-answer (QA) dataset** generated from parsed Indonesian regulations, focusing on **legal quoting and comprehension**. Designed to facilitate legal-aware LLMs, the dataset provides direct QA mappings to individual articles for contextual understanding and reference.

---

## 📌 Dataset Highlights

* **Source**: Generated from the [ID\_REG\_Parsed](https://huggingface.co/datasets/Azzindani/ID_REG_Parsed) repository
* **Format**: QA pairs based on individual articles (no chunking)
* **Scale**: Augmented by applying 50 QA templates across suitable regulation entries
* **Filtering**: Programmatic filtering removes redundant or overly broad article explanations
* **Target Use**: Train/test LLMs for **regulation comprehension**, **legal quoting**, and **document-level QA**

---

## ⚙️ Pipeline Overview

* **Environment**: Executed in a single Jupyter Notebook on **Kaggle Cloud**
* **Data Flow**:
  1. **Pull** parsed articles from `ID_REG_Parsed`
  2. Filter and refine results for clarity and legal context
  3. Apply **template-driven QA generation** (50 variations)
  4. **Push** QA dataset directly to this repository
* **Performance**:
  * Completed in ~1 hour using Kaggle GPU resources
  * Cloud-to-cloud transfer without local storage dependency

---

## 🧠 Use Cases

* Fine-tuning LLMs for **legal question answering**
* Benchmarks for **article referencing and quoting**
* Few-shot prompting for legal search assistants
* Legal text evaluation with grounded answers

---

## ⚠️ Disclaimer

This dataset is intended for **research and development** only. QA pairs are generated synthetically from publicly available legal text and may not reflect official interpretations.

---

## 🙏 Acknowledgments

* **[Hugging Face](https://huggingface.co/)** for hosting open datasets
* **[Kaggle](https://www.kaggle.com/)** for compute and cloud-to-cloud capabilities

---
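
The filter and template steps of the pipeline overview can be illustrated with a minimal sketch. Nothing here is taken from the actual notebook: the `Article` fields, the two templates, the length thresholds, and the duplicate-text check are all hypothetical stand-ins, since the card does not publish the real schema, the 50 templates, or the filtering rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """One parsed article. Field names are hypothetical, not the real schema."""
    regulation: str
    number: str
    text: str


# Two illustrative templates; the card mentions 50, which are not shown there.
TEMPLATES = [
    "What does Article {number} of {regulation} state?",
    "Quote Article {number} of {regulation}.",
]


def is_suitable(article: Article, min_chars: int = 20, max_chars: int = 2000) -> bool:
    """Stand-in filter (assumed thresholds): drop very short or very long texts."""
    return min_chars <= len(article.text.strip()) <= max_chars


def generate_qa(articles, templates=TEMPLATES):
    """Apply every template to each suitable article, skipping repeated texts."""
    seen = set()
    pairs = []
    for article in articles:
        answer = article.text.strip()
        if not is_suitable(article) or answer in seen:
            continue
        seen.add(answer)
        for template in templates:
            question = template.format(
                number=article.number, regulation=article.regulation
            )
            pairs.append({"question": question, "answer": answer})
    return pairs


# Placeholder records, not real regulation text.
sample = [
    Article("Regulation X", "1", "This is placeholder text standing in for an article body."),
    Article("Regulation X", "2", "Too short."),
    Article("Regulation Y", "1", "This is placeholder text standing in for an article body."),
]
qa_pairs = generate_qa(sample)
```

Because each answer is the full article text, every generated question maps back to exactly one article, which matches the "no chunking" format described in the highlights.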

Provider:
horelulus