lumees/turkish-legislation-corpus
Source: Hugging Face. Updated 2025-11-30; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/lumees/turkish-legislation-corpus
Description:
---
language:
- tr
license: cc-by-4.0
tags:
- legal
- law
- turkey
- legislation
- nlp
task_categories:
- text-classification
- text-generation
pretty_name: Turkish Legislation Corpus
size_categories:
- 1K<n<10K
---
# Turkish Legislation Dataset
This dataset contains Turkish legislative texts (laws) scraped from the official **Mevzuat Bilgi Sistemi (mevzuat.gov.tr)**. The raw data has been cleaned, stripped of unnecessary whitespace, and structured as JSONL.
## Dataset Contents
The dataset includes the full text of laws currently in force in the Republic of Turkey, along with relevant metadata. It is suitable for NLP tasks in the legal domain, such as LLM fine-tuning, RAG (Retrieval-Augmented Generation) systems, and text classification.
### Data Format (JSONL)
Each line in the file is a JSON object containing the following fields:
| Field | Type | Description |
| :--- | :--- | :--- |
| `url` | string | The original source URL of the legislation. |
| `text` | string | The cleaned text of the law. Line breaks and broken whitespace have been fixed. |
| `law_number` | int/null | The law number, extracted via regex; may be `null`. |
| `acceptance_date` | string | The date the law was accepted, as day/month/year. Day and month may not be zero-padded (e.g. `21/7/2025`). |
| `source` | string | Source of the data (mevzuat.gov.tr). |
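A minimal sketch of reading a file in this format with Python's standard library. The helper names `iter_records` and `load_records` are hypothetical, and the file path is left as a parameter because this card does not name the data file.

```python
import json
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


def load_records(path: str) -> List[dict]:
    """Read a whole JSONL file into memory.

    `path` is a placeholder: the card does not state the data file's name.
    """
    with open(path, encoding="utf-8") as fh:
        return list(iter_records(fh))
```

Because `iter_records` accepts any iterable of strings, it can also be fed a streamed response or an in-memory list, which keeps memory use low for long legal texts.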
### Example Data
```json
{
  "url": "https://www.mevzuat.gov.tr/...",
  "text": "LAW AMENDING CERTAIN LAWS REGARDING HEALTH... Law Number : 7557...",
  "law_number": 7557,
  "acceptance_date": "21/7/2025",
  "source": "mevzuat.gov.tr"
}
```
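As a rough illustration of working with these fields, the sketch below parses `acceptance_date` and cross-checks `law_number` against the text of the example record. The regular expression is an assumption that matches only the English-rendered `Law Number : 7557` wording shown in the example; it is not the pattern used to build the dataset, and the actual texts may be worded differently.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed pattern, based only on the example record's wording.
# It is NOT the dataset author's regex; adjust it to the real text.
LAW_NUMBER_RE = re.compile(r"Law Number\s*:\s*(\d+)")


def parse_acceptance_date(value: str) -> date:
    """Parse a day/month/year string; non-zero-padded parts are accepted."""
    return datetime.strptime(value, "%d/%m/%Y").date()


def find_law_number(text: str) -> Optional[int]:
    """Return the first law number found in the text, or None."""
    match = LAW_NUMBER_RE.search(text)
    return int(match.group(1)) if match else None


# Fields copied from the example record in this card.
record = {
    "text": "LAW AMENDING CERTAIN LAWS REGARDING HEALTH... Law Number : 7557...",
    "law_number": 7557,
    "acceptance_date": "21/7/2025",
}

accepted = parse_acceptance_date(record["acceptance_date"])
print(accepted.isoformat())  # 2025-07-21
print(find_law_number(record["text"]) == record["law_number"])  # True
```

Converting the date to a `date` object makes records sortable and filterable by acceptance date, which the raw day/month/year strings are not.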
## Use Cases
* **Legal LLM:** Training or fine-tuning language models on formal Turkish legal terminology.
* **Semantic Search:** Building vector-based search engines for legal research.
* **Summarization:** Automating the summarization of long legislative texts.
## Disclaimer & Licensing
This dataset was compiled from public official sources. According to **Article 31 of the Turkish Law on Intellectual and Artistic Works (FSEK)**, official texts such as laws, decrees, and regulations are not subject to copyright and are in the public domain.
**Legal Disclaimer:** This dataset is for informational and research purposes only. It is **not** an official document. For official legal procedures, please refer to the Official Gazette (Resmi Gazete) or mevzuat.gov.tr.
Provider:
lumees



