
boun-tabilab/turkish_parliamentary_data

Hugging Face · Updated 2026-03-30 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/boun-tabilab/turkish_parliamentary_data
Description:
---
license: cc-by-nc-sa-4.0
language:
- tr
- ota
pretty_name: Grand National Assembly Corpus of Türkiye (GNACT)
size_categories:
- 1M<n<10M
task_categories:
- text-generation
- fill-mask
- text-classification
- question-answering
tags:
- parliamentary
- turkish
- ottoman-turkish
- historical
- political-discourse
- ocr
configs:
- config_name: full_sessions
  data_files:
  - split: train
    path: full_sessions/*.parquet
- config_name: pages
  data_files:
  - split: train
    path: pages/*.parquet
- config_name: tbmm_only
  data_files:
  - split: train
    path: tbmm_only/*.parquet
---

# Grand National Assembly Corpus of Türkiye (GNACT)

A comprehensive collection of Turkish parliamentary transcripts spanning over 100 years (1920–present), from 10 legislative bodies. Includes both Ottoman Turkish (1920–1928) and Modern Turkish (1928–present) texts.

## Loading the dataset

```python
from datasets import load_dataset

# Strategy 1: full session documents, all bodies (default)
ds = load_dataset("boun-tabilab/turkish_parliamentary_data", "full_sessions", split="train")

# Strategy 2: page-level, all bodies
ds = load_dataset("boun-tabilab/turkish_parliamentary_data", "pages", split="train")

# Strategy 3: TBMM only, all terms
ds = load_dataset("boun-tabilab/turkish_parliamentary_data", "tbmm_only", split="train")

# Filter after loading (e.g. TBMM term 22, year 3)
ds_t22 = ds.filter(lambda x: x["term"] == 22)
ds_t22_y3 = ds.filter(lambda x: x["term"] == 22 and x["year"] == 3)
```

## Configs

| Config | Granularity | Bodies | Shards |
|---|---|---|---|
| `full_sessions` | Document | All 10 | 27 |
| `pages` | Page | All 10 | 20 |
| `tbmm_only` | Document | TBMM only | 16 |

## Columns

| Column | Type | Description |
|---|---|---|
| `document_id` | string | Unique document identifier |
| `legislative_body` | string | Legislative body name |
| `term` | int32 | Legislative term number |
| `year` | int32 | Year within the term |
| `session` | int32 | Session number |
| `volume` | int32 | Volume number |
| `page_num` | int32 | Page number (0 for full-session rows) |
| `text` | string | OCR-extracted text |
| `is_full_session` | bool | True for document rows, False for page rows |
| `language` | string | `tr` (Modern Turkish) or `ota` (Ottoman Turkish) |

## Legislative bodies

| Code | Name |
|---|---|
| TBMM | Türkiye Büyük Millet Meclisi (Grand National Assembly) |
| MM | Millet Meclisi (National Assembly) |
| CS | Cumhuriyet Senatosu (Senate of the Republic) |
| BT | Birleşik Toplantı (Joint Sessions) |
| MGK | Milli Güvenlik Konseyi (National Security Council) |
| MBK | Milli Birlik Komitesi (National Unity Committee) |
| KM | Kurucu Meclis (Constituent Assembly) |
| DM | Danışma Meclisi (Advisory Council) |
| TM | Temsilciler Meclisi (House of Representatives) |
| GC | Gizli Celse / Kapalı Oturum (Secret/Closed Sessions) |

## License

[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)

## Citation

```bibtex
@dataset{turkish_parliamentary_data_2026,
  title   = {Turkish Parliamentary Data 2026: Grand National Assembly Corpus of Türkiye (GNACT)},
  year    = {2026},
  url     = {https://huggingface.co/datasets/boun-tabilab/turkish_parliamentary_data},
  license = {CC BY-NC-SA 4.0}
}
```
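The `pages` config stores one row per page (`is_full_session` is False), while `full_sessions` stores one row per document. As a rough sketch of how the two granularities relate, the hypothetical helper `reassemble_sessions` below rebuilds one text per session from page rows in plain Python, ordering pages by `page_num`. It assumes that all pages of a session share the same `legislative_body`, `term`, `year`, `session`, and `volume` values, and that `legislative_body` holds the short code (e.g. `TBMM`); the card does not state either. The toy rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical session key. ASSUMPTION: every page of one session shares
# these five values; the dataset card does not guarantee this.
SessionKey = Tuple[str, int, int, int, int]


def reassemble_sessions(rows: Iterable[dict]) -> Dict[SessionKey, str]:
    """Join page-level rows into one text per session, ordered by page_num."""
    pages: Dict[SessionKey, List[Tuple[int, str]]] = defaultdict(list)
    for row in rows:
        if row["is_full_session"]:
            continue  # document rows already hold the whole session; skip them
        key = (row["legislative_body"], row["term"], row["year"],
               row["session"], row["volume"])
        pages[key].append((row["page_num"], row["text"]))
    # Sort each session's pages by page number before joining them.
    return {
        key: "\n".join(text for _, text in sorted(items, key=lambda p: p[0]))
        for key, items in pages.items()
    }


# Toy rows with made-up values, shaped like the documented columns.
# ASSUMPTION: legislative_body holds the short code (e.g. "TBMM").
base = {"legislative_body": "TBMM", "term": 22, "year": 3, "session": 5, "volume": 1}
rows = [
    {**base, "page_num": 2, "text": "second page", "is_full_session": False},
    {**base, "page_num": 1, "text": "first page", "is_full_session": False},
    {**base, "page_num": 0, "text": "whole session", "is_full_session": True},
]

sessions = reassemble_sessions(rows)
print(sessions[("TBMM", 22, 3, 5, 1)])  # prints "first page" then "second page"
```

With the real data, `load_dataset("boun-tabilab/turkish_parliamentary_data", "pages", split="train", streaming=True)` yields row dicts that could be passed in directly instead of the toy list, which avoids downloading every shard up front. The helper keeps all reassembled texts in memory, so for the full corpus it may be preferable to filter first (for example by `term`) before reassembling.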
Provider:
boun-tabilab