olzhasAl/adilet-legal-qa-kz
收藏Hugging Face2026-03-26 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/olzhasAl/adilet-legal-qa-kz
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- ru
- kk
license: apache-2.0
task_categories:
- question-answering
- text-generation
tags:
- legal
- kazakhstan
- doc-to-lora
- hypernetwork
- legislation
- QA
- kazakh
- russian
- adilet
- NPA
pretty_name: "Adilet Legal QA — Kazakhstan Legislation Dataset"
size_categories:
- 10K<n<50K
---
# Adilet Legal QA — Kazakhstan Legislation Dataset
<div align="center">
<h3>🇰🇿 13,159 Question-Answer pairs from Kazakhstan's legal system</h3>
<p>Russian & Kazakh | 28 Laws & Codes | Built for Doc-to-LoRA</p>
</div>
## Overview
**Adilet Legal QA** is a dataset of 13,159 question-answer pairs generated from 28 key normative legal acts (НПА) of the Republic of Kazakhstan. The dataset covers the Constitution, major codes (Civil, Land, Budget), and key laws (Education, Banking, Pensions, etc.) in both Russian and Kazakh languages.
This dataset was created for training and evaluating [Doc-to-LoRA](https://github.com/SakanaAI/doc-to-lora) hypernetworks — neural networks that generate LoRA adapters on the fly, enabling instant document internalization into LLMs.
## Dataset Statistics
| Metric | Value |
|--------|-------|
| Total QA pairs | 13,159 |
| Documents (НПА) | 28 |
| Languages | Russian, Kazakh |
| Total text volume | ~8.9M characters |
| Unique contexts (chunks) | 1,724 |
| Avg. QA per chunk | ~7.6 |
## Documents Covered
### Constitution
- Конституция Республики Казахстан / Қазақстан Республикасының Конституциясы
### Codes (Кодексы)
- Гражданский кодекс РК (Общая часть)
- Земельный кодекс РК
- Бюджетный кодекс РК
### Laws (Законы)
- Закон РК «Об образовании»
- Закон РК «О пенсионном обеспечении»
- Закон РК «О банках и банковской деятельности»
- Закон РК «О воинской службе и статусе военнослужащих»
- Закон РК «О дорожном движении»
- Закон РК «О долевом участии в жилищном строительстве»
- Закон РК «О хозяйственных товариществах»
### Constitutional Laws
- Конституционный закон «О Президенте РК»
- Конституционный закон «О Парламенте РК»
- Конституционный закон «О судебной системе и статусе судей»
## Data Format
Each sample contains:
```json
{
"context": "Статья 10.\n1. Гражданство Республики Казахстан приобретается...",
"prompts": [
"Может ли гражданин Казахстана иметь двойное гражданство?",
"В каких случаях допускается лишение гражданства?"
],
"responses": [
"Нет, согласно пункту 3 статьи 10 Конституции, за гражданином Республики не признается гражданство другого государства.",
"Лишение гражданства допускается лишь по решению суда за совершение террористических преступлений..."
]
}
```
| Field | Type | Description |
|-------|------|-------------|
| `context` | string | Text chunk from a legal document (НПА) |
| `prompts` | list[string] | Questions about the context |
| `responses` | list[string] | Answers with references to specific articles |
## Intended Use
### Primary Use: Doc-to-LoRA Training
This dataset is designed for training hypernetworks that generate document-specific LoRA adapters. The hypernetwork learns to internalize legal documents so an LLM can answer questions without RAG or in-context learning.
### Other Uses
- Fine-tuning LLMs for Kazakhstan legal domain
- Legal QA benchmarking for Russian/Kazakh languages
- Retrieval-Augmented Generation (RAG) evaluation
- Legal NLP research for Central Asian languages
## Data Generation Pipeline
```
adilet.zan.kz (official legal database)
│
▼
Web scraping of 28 key НПА (Russian + Kazakh)
│
▼
Chunking by articles (~6000 chars per chunk)
│
▼
QA generation via LLM API (5-8 QA per chunk)
│
▼
JSON validation & formatting
│
▼
Parquet export (1,724 samples, 13,159 QA pairs)
```
QA pairs were generated using a large language model prompted with specialized legal instructions to ensure:
- Questions cover factual, procedural, and analytical aspects
- Answers are strictly grounded in the source text
- References to specific articles and clauses are included
## Connection to Doc-to-LoRA
This dataset is part of an ongoing project applying [Sakana AI's Doc-to-LoRA](https://arxiv.org/abs/2602.15902) technology to Kazakhstan's legal system. The goal is to build a chatbot that can instantly internalize any legal document and answer questions about it.
**Current status:**
- Doc-to-LoRA pipeline validated (PoC with Qwen3-4B)
- Dataset collected (this dataset)
- Hypernetwork training in progress (20K steps on Qwen3-4B)
- Planned: self-gen data with logprobs for domain-specific training
## Source
All legal texts are sourced from [adilet.zan.kz](https://adilet.zan.kz) — the official legal information system of the Republic of Kazakhstan, maintained by the Ministry of Justice. The system contains the complete collection of normative legal acts of Kazakhstan in both state languages.
## Limitations
- Not all НПА IDs were successfully resolved (some returned 404); coverage will be expanded
- QA quality depends on the generation model; manual verification is ongoing
- Some very long articles may be split across chunks, potentially losing cross-reference context
- Kazakh language QA quality may be lower than Russian due to model capabilities
## Citation
```bibtex
@dataset{alseitov2026adilet_legal_qa,
title={Adilet Legal QA: Kazakhstan Legislation Dataset for Doc-to-LoRA},
author={Alseitov, Olzhas},
year={2026},
url={https://huggingface.co/datasets/olzhasAl/adilet-legal-qa-kz},
note={13,159 QA pairs from 28 key legal acts of the Republic of Kazakhstan}
}
```
## Related Work
- [Doc-to-LoRA (Sakana AI)](https://arxiv.org/abs/2602.15902) — Hypernetworks for instant document internalization
- [Text-to-LoRA (Sakana AI)](https://arxiv.org/abs/2506.06105) — Task adaptation via natural language descriptions
- [adilet.zan.kz](https://adilet.zan.kz) — Official legal database of Kazakhstan
## Author
**Olzhas Alseitov**
- Telegram: [@olzhasAl](https://t.me/olzhasAl)
- LinkedIn: [olzhas-alseitov](https://www.linkedin.com/in/olzhas-alseitov/)
- GitHub: [InfiniteJas](https://github.com/InfiniteJas)
## License
This dataset is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). The source legal texts are publicly available government documents from the Republic of Kazakhstan.
提供机构:
olzhasAl



