boun-tabilab/turkish_parliamentary_data
Source: Hugging Face. Updated 2026-03-30; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/boun-tabilab/turkish_parliamentary_data
---
license: cc-by-nc-sa-4.0
language:
- tr
- ota
pretty_name: Grand National Assembly Corpus of Türkiye (GNACT)
size_categories:
- 1M<n<10M
task_categories:
- text-generation
- fill-mask
- text-classification
- question-answering
tags:
- parliamentary
- turkish
- ottoman-turkish
- historical
- political-discourse
- ocr
configs:
- config_name: full_sessions
  data_files:
  - split: train
    path: full_sessions/*.parquet
- config_name: pages
  data_files:
  - split: train
    path: pages/*.parquet
- config_name: tbmm_only
  data_files:
  - split: train
    path: tbmm_only/*.parquet
---
# Grand National Assembly Corpus of Türkiye (GNACT)
GNACT is a comprehensive collection of Turkish parliamentary transcripts from 10 legislative bodies, covering more than 100 years (1920–present). It includes both Ottoman Turkish (1920–1928) and Modern Turkish (1928–present) texts.
## Loading the dataset
```python
from datasets import load_dataset

# Strategy 1: full session documents, all bodies (default)
ds_full = load_dataset("boun-tabilab/turkish_parliamentary_data", "full_sessions", split="train")

# Strategy 2: page-level rows, all bodies
ds_pages = load_dataset("boun-tabilab/turkish_parliamentary_data", "pages", split="train")

# Strategy 3: TBMM only, all terms
ds_tbmm = load_dataset("boun-tabilab/turkish_parliamentary_data", "tbmm_only", split="train")

# Filter after loading (e.g. TBMM term 22, then year 3 within that term)
ds_t22 = ds_tbmm.filter(lambda x: x["term"] == 22)
ds_t22_y3 = ds_t22.filter(lambda x: x["year"] == 3)
```
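
Because the corpus falls in the 1M–10M row size category, it may be convenient to inspect a few rows before downloading a whole config. The following is a minimal sketch, assuming the `streaming=True` option of `datasets.load_dataset` works for this repository as it does for other Parquet-backed datasets. The helper names `matches` and `stream_rows` are illustrative and not part of the dataset.

```python
from itertools import islice

REPO_ID = "boun-tabilab/turkish_parliamentary_data"


def matches(row, term, year=None):
    """Return True if a row belongs to `term` (and to `year`, when given)."""
    if row["term"] != term:
        return False
    return year is None or row["year"] == year


def stream_rows(config="tbmm_only", term=22, year=None, limit=5):
    """Yield up to `limit` matching rows without downloading the full config."""
    # Imported lazily so the pure-Python helper above works without `datasets`.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, config, split="train", streaming=True)
    yield from islice((row for row in stream if matches(row, term, year)), limit)
```

Calling `list(stream_rows(term=22, year=3, limit=2))` would fetch at most two matching rows. The scan is sequential, so a rare filter may still read many shards before it finds enough matches.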
## Configs
| Config | Granularity | Bodies | Shards |
|---|---|---|---|
| `full_sessions` | Document | All 10 | 27 |
| `pages` | Page | All 10 | 20 |
| `tbmm_only` | Document | TBMM only | 16 |
## Columns
| Column | Type | Description |
|---|---|---|
| `document_id` | string | Unique document identifier |
| `legislative_body` | string | Legislative body name |
| `term` | int32 | Legislative term number |
| `year` | int32 | Year within the term |
| `session` | int32 | Session number |
| `volume` | int32 | Volume number |
| `page_num` | int32 | Page number (0 for full-session rows) |
| `text` | string | OCR-extracted text |
| `is_full_session` | bool | True for document rows, False for page rows |
| `language` | string | `tr` (Modern Turkish) or `ota` (Ottoman Turkish) |
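
The `page_num` and `is_full_session` columns suggest that page rows can be stitched back into documents. The sketch below illustrates this on plain dictionaries. It assumes that all pages of one document share the same `document_id`, which the card does not state explicitly, and the sample IDs and texts are hypothetical.

```python
from collections import defaultdict


def pages_to_documents(rows):
    """Group page rows by document_id and join their text in page order."""
    pages = defaultdict(list)
    for row in rows:
        if row["is_full_session"]:
            continue  # document-level rows carry page_num 0 and are skipped
        pages[row["document_id"]].append((row["page_num"], row["text"]))
    return {
        doc_id: "\n".join(text for _, text in sorted(items, key=lambda p: p[0]))
        for doc_id, items in pages.items()
    }


# Hypothetical rows that follow the column schema above (subset of columns).
sample_rows = [
    {"document_id": "doc-a", "page_num": 2, "text": "second page", "is_full_session": False},
    {"document_id": "doc-a", "page_num": 1, "text": "first page", "is_full_session": False},
    {"document_id": "doc-a", "page_num": 0, "text": "first page\nsecond page", "is_full_session": True},
    {"document_id": "doc-b", "page_num": 1, "text": "only page", "is_full_session": False},
]

documents = pages_to_documents(sample_rows)
```

If the assumption about shared identifiers does not hold for the real data, the grouping key would need to be built from other columns such as `term`, `year`, `session`, and `volume`.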
## Legislative bodies
| Code | Name |
|---|---|
| TBMM | Türkiye Büyük Millet Meclisi (Grand National Assembly) |
| MM | Millet Meclisi (National Assembly) |
| CS | Cumhuriyet Senatosu (Senate of the Republic) |
| BT | Birleşik Toplantı (Joint Sessions) |
| MGK | Milli Güvenlik Konseyi (National Security Council) |
| MBK | Milli Birlik Komitesi (National Unity Committee) |
| KM | Kurucu Meclis (Constituent Assembly) |
| DM | Danışma Meclisi (Advisory Council) |
| TM | Temsilciler Meclisi (House of Representatives) |
| GC | Gizli Celse / Kapalı Oturum (Secret/Closed Sessions) |
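
The table above can be carried into code as a lookup. The card does not say whether the `legislative_body` column stores the short code or the full Turkish name, so this sketch accepts either form. The names `BODY_NAMES` and `body_code` are illustrative.

```python
# Codes and Turkish names copied from the table above.
BODY_NAMES = {
    "TBMM": "Türkiye Büyük Millet Meclisi",
    "MM": "Millet Meclisi",
    "CS": "Cumhuriyet Senatosu",
    "BT": "Birleşik Toplantı",
    "MGK": "Milli Güvenlik Konseyi",
    "MBK": "Milli Birlik Komitesi",
    "KM": "Kurucu Meclis",
    "DM": "Danışma Meclisi",
    "TM": "Temsilciler Meclisi",
    "GC": "Gizli Celse / Kapalı Oturum",
}

# Reverse lookup: full name -> code.
NAME_TO_CODE = {name: code for code, name in BODY_NAMES.items()}


def body_code(value):
    """Map a code or a full Turkish name to its short code; None if unknown."""
    cleaned = value.strip()
    if cleaned in BODY_NAMES:
        return cleaned
    return NAME_TO_CODE.get(cleaned)
```

A filter such as `ds.filter(lambda x: body_code(x["legislative_body"]) == "CS")` would then select Senate rows under either encoding, provided the stored strings match the table exactly.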
## License
[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
## Citation
```bibtex
@dataset{turkish_parliamentary_data_2026,
title = {Turkish Parliamentary Data 2026: Grand National Assembly Corpus of Türkiye (GNACT)},
year = {2026},
url = {https://huggingface.co/datasets/boun-tabilab/turkish_parliamentary_data},
license = {CC BY-NC-SA 4.0}
}
```
Provider: boun-tabilab



