
JonusNattapong/Ratchakitcha

Source: Hugging Face. Updated 2025-12-28. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/JonusNattapong/Ratchakitcha
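The dataset card for this repository declares decade-level configs (`subset_2020s` down to `subset_pre_1960`) whose `data_files` paths select year directories by glob. As a rough illustration of how those globs partition the years, the sketch below maps a year to the decade config whose pattern matches it. The helper `decade_config` is hypothetical and not part of the dataset or of any library. It assumes the year directories are named with four-digit years, as the per-year config names such as `subset_1885` suggest, and it uses Python's `fnmatch` as a stand-in for the Hub's own glob matching, which may differ in details.

```python
from fnmatch import fnmatchcase

# Year-directory globs copied from the decade-level `configs` entries in the card.
# subset_2025 is left out on purpose: it overlaps subset_2020s.
DECADE_PATTERNS = {
    "subset_2020s": ["202*"],
    "subset_2010s": ["201*"],
    "subset_2000s": ["200*"],
    "subset_1990s": ["199*"],
    "subset_1980s": ["198*"],
    "subset_1970s": ["197*"],
    "subset_1960s": ["196*"],
    "subset_pre_1960": ["18*", "19[0-5]*"],
}


def decade_config(year: int) -> str:
    """Return the decade config whose year glob matches `year` (hypothetical helper)."""
    name = str(year)
    for config, patterns in DECADE_PATTERNS.items():
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return config
    raise ValueError(f"no decade config covers year {year}")


print(decade_config(1959))
print(decade_config(1960))

# Possible use with the `datasets` library (not executed here; needs network
# access, and whether a given config loads cleanly is untested):
#   from datasets import load_dataset
#   ds = load_dataset("JonusNattapong/Ratchakitcha", decade_config(1995), split="train")
```

Note that `19[0-5]*` covers 1900 through 1959, so `subset_pre_1960` and `subset_1960s` do not overlap, while `subset_2025` is a strict subset of `subset_2020s`.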
Resource description:
---
license: cc-by-4.0
task_categories:
- text-retrieval
- text-generation
- question-answering
language:
- th
pretty_name: Royal Gazette Thailand (Ratchakitcha)
size_categories:
- 1M<n<10M
tags:
- law
- thailand
- government
- open-data
dataset_info:
- config_name: ocr
configs:
- config_name: ocr
  data_files:
  - split: train
    path: ocr/**/*.jsonl
- config_name: meta
  data_files:
  - split: train
    path: meta/**/*.jsonl
- config_name: pdf_recent
  data_files:
  - split: train
    path: pdf/**/*.pdf
- config_name: pdf_archive
  data_files:
  - split: train
    path: zip/**/*.zip
- config_name: subset_2020s
  data_files:
  - split: train
    path:
    - ocr/*/202*/*.jsonl
    - meta/202*/*.jsonl
- config_name: subset_2025
  data_files:
  - split: train
    path:
    - ocr/*/2025/*.jsonl
    - meta/2025/*.jsonl
- config_name: subset_2010s
  data_files:
  - split: train
    path:
    - ocr/*/201*/*.jsonl
    - meta/201*/*.jsonl
- config_name: subset_2000s
  data_files:
  - split: train
    path:
    - ocr/*/200*/*.jsonl
    - meta/200*/*.jsonl
- config_name: subset_1990s
  data_files:
  - split: train
    path:
    - ocr/*/199*/*.jsonl
    - meta/199*/*.jsonl
- config_name: subset_1980s
  data_files:
  - split: train
    path:
    - ocr/*/198*/*.jsonl
    - meta/198*/*.jsonl
- config_name: subset_1970s
  data_files:
  - split: train
    path:
    - ocr/*/197*/*.jsonl
    - meta/197*/*.jsonl
- config_name: subset_1960s
  data_files:
  - split: train
    path:
    - ocr/*/196*/*.jsonl
    - meta/196*/*.jsonl
- config_name: subset_pre_1960
  data_files:
  - split: train
    path:
    - ocr/*/18*/*.jsonl
    - ocr/*/19[0-5]*/*.jsonl
    - meta/18*/*.jsonl
    - meta/19[0-5]*/*.jsonl
- config_name: subset_1885 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 18469 num_examples: 127 download_size: 8481 dataset_size: 18469 - config_name: subset_1886 features: - name: 'no' dtype: string - 
name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 57279 num_examples: 381 download_size: 17352 dataset_size: 57279 - config_name: subset_1887 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 108634 num_examples: 460 download_size: 36105 dataset_size: 108634 - config_name: subset_1888 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 113648 num_examples: 482 download_size: 39047 dataset_size: 113648 - config_name: subset_1889 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 103021 num_examples: 503 download_size: 31583 dataset_size: 103021 - config_name: subset_1890 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 86336 num_examples: 531 download_size: 24582 dataset_size: 86336 - config_name: subset_1891 features: - name: 'no' 
dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 80925 num_examples: 509 download_size: 24518 dataset_size: 80925 - config_name: subset_1892 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 102398 num_examples: 523 download_size: 30350 dataset_size: 102398 - config_name: subset_1893 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 90916 num_examples: 488 download_size: 25213 dataset_size: 90916 - config_name: subset_1894 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 109013 num_examples: 535 download_size: 29815 dataset_size: 109013 - config_name: subset_1895 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 133217 num_examples: 612 download_size: 40287 dataset_size: 133217 - config_name: subset_1896 
features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 235176 num_examples: 1262 download_size: 40315 dataset_size: 235176 - config_name: subset_1897 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 143461 num_examples: 752 download_size: 36350 dataset_size: 143461 - config_name: subset_1898 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 72538 num_examples: 411 download_size: 24338 dataset_size: 72538 - config_name: subset_1899 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 200856 num_examples: 856 download_size: 54377 dataset_size: 200856 - config_name: subset_1900 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 183554 num_examples: 812 download_size: 55269 dataset_size: 183554 - 
config_name: subset_1901 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 209093 num_examples: 952 download_size: 50910 dataset_size: 209093 - config_name: subset_1902 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 324905 num_examples: 1029 download_size: 102728 dataset_size: 324905 - config_name: subset_1903 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 410311 num_examples: 1362 download_size: 107352 dataset_size: 410311 - config_name: subset_1904 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 402340 num_examples: 1355 download_size: 100311 dataset_size: 402340 - config_name: subset_1905 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 364887 num_examples: 1345 download_size: 
94329 dataset_size: 364887 - config_name: subset_1906 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 512303 num_examples: 1588 download_size: 121881 dataset_size: 512303 - config_name: subset_1907 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 517162 num_examples: 1524 download_size: 126860 dataset_size: 517162 - config_name: subset_1908 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 552090 num_examples: 1530 download_size: 157456 dataset_size: 552090 - config_name: subset_1909 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 562627 num_examples: 1875 download_size: 143952 dataset_size: 562627 - config_name: subset_1910 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 507162 
num_examples: 1679 download_size: 136302 dataset_size: 507162 - config_name: subset_1911 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 351182 num_examples: 1455 download_size: 91096 dataset_size: 351182 - config_name: subset_1912 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 464765 num_examples: 1757 download_size: 116329 dataset_size: 464765 - config_name: subset_1913 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 473570 num_examples: 1896 download_size: 112656 dataset_size: 473570 - config_name: subset_1914 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 385095 num_examples: 1714 download_size: 90016 dataset_size: 385095 - config_name: subset_1915 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - 
name: train num_bytes: 425966 num_examples: 1895 download_size: 96006 dataset_size: 425966 - config_name: subset_1916 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 426814 num_examples: 1983 download_size: 91024 dataset_size: 426814 - config_name: subset_1917 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 480079 num_examples: 2112 download_size: 105839 dataset_size: 480079 - config_name: subset_1918 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 549481 num_examples: 2109 download_size: 120434 dataset_size: 549481 - config_name: subset_1919 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 496185 num_examples: 2304 download_size: 90112 dataset_size: 496185 - config_name: subset_1920 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: 
pdf_file dtype: string splits: - name: train num_bytes: 454606 num_examples: 2075 download_size: 85682 dataset_size: 454606 - config_name: subset_1921 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 477793 num_examples: 2090 download_size: 96782 dataset_size: 477793 - config_name: subset_1922 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 609690 num_examples: 2445 download_size: 127739 dataset_size: 609690 - config_name: subset_1923 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 788414 num_examples: 2628 download_size: 162534 dataset_size: 788414 - config_name: subset_1924 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 591168 num_examples: 2312 download_size: 126117 dataset_size: 591168 - config_name: subset_1925 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - 
name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 579246 num_examples: 2265 download_size: 119488 dataset_size: 579246 - config_name: subset_1926 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 762313 num_examples: 2721 download_size: 163855 dataset_size: 762313 - config_name: subset_1927 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 726377 num_examples: 2603 download_size: 146041 dataset_size: 726377 - config_name: subset_1928 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 766769 num_examples: 2668 download_size: 154307 dataset_size: 766769 - config_name: subset_1929 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 777029 num_examples: 2713 download_size: 148729 dataset_size: 777029 - config_name: subset_1930 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string 
- name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 687106 num_examples: 2389 download_size: 136124 dataset_size: 687106 - config_name: subset_1931 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 759158 num_examples: 2677 download_size: 140340 dataset_size: 759158 - config_name: subset_1932 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 777098 num_examples: 2613 download_size: 145183 dataset_size: 777098 - config_name: subset_1933 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 689283 num_examples: 2311 download_size: 131107 dataset_size: 689283 - config_name: subset_1934 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 602079 num_examples: 1958 download_size: 124224 dataset_size: 602079 - config_name: subset_1935 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - 
name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 546643 num_examples: 1754 download_size: 115395 dataset_size: 546643 - config_name: subset_1936 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 545410 num_examples: 1695 download_size: 117835 dataset_size: 545410 - config_name: subset_1937 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 540084 num_examples: 1703 download_size: 113180 dataset_size: 540084 - config_name: subset_1938 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 632906 num_examples: 2074 download_size: 139303 dataset_size: 632906 - config_name: subset_1939 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 679568 num_examples: 2168 download_size: 146973 dataset_size: 679568 - config_name: subset_1940 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - 
name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 549770 num_examples: 1723 download_size: 119669 dataset_size: 549770 - config_name: subset_1941 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 793410 num_examples: 2516 download_size: 167757 dataset_size: 793410 - config_name: subset_1942 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 632887 num_examples: 2020 download_size: 137020 dataset_size: 632887 - config_name: subset_1943 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 675841 num_examples: 2150 download_size: 145797 dataset_size: 675841 - config_name: subset_1944 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 604697 num_examples: 1786 download_size: 122802 dataset_size: 604697 - config_name: subset_1945 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - 
name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 516487 num_examples: 1585 download_size: 105871 dataset_size: 516487 - config_name: subset_1946 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 578206 num_examples: 1745 download_size: 118754 dataset_size: 578206 - config_name: subset_1947 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 797086 num_examples: 2405 download_size: 161749 dataset_size: 797086 - config_name: subset_1948 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1651439 num_examples: 5128 download_size: 283370 dataset_size: 1651439 - config_name: subset_1949 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1317543 num_examples: 3971 download_size: 253970 dataset_size: 1317543 - config_name: subset_1950 features: - name: 'no' dtype: string - name: doctitle dtype: 
string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1398930 num_examples: 4364 download_size: 270848 dataset_size: 1398930 - config_name: subset_1951 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1382228 num_examples: 4333 download_size: 264855 dataset_size: 1382228 - config_name: subset_1952 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1208085 num_examples: 3775 download_size: 236305 dataset_size: 1208085 - config_name: subset_1953 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1965820 num_examples: 6612 download_size: 341087 dataset_size: 1965820 - config_name: subset_1954 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1707445 num_examples: 5735 download_size: 290583 dataset_size: 1707445 - config_name: subset_1955 features: - name: 
'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1663640 num_examples: 5502 download_size: 289573 dataset_size: 1663640 - config_name: subset_1956 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2085548 num_examples: 6858 download_size: 357862 dataset_size: 2085548 - config_name: subset_1957 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2032705 num_examples: 6438 download_size: 340481 dataset_size: 2032705 - config_name: subset_1958 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1741448 num_examples: 5411 download_size: 294439 dataset_size: 1741448 - config_name: subset_1959 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2224867 num_examples: 6362 download_size: 324903 dataset_size: 2224867 - 
config_name: subset_1960 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1867181 num_examples: 5612 download_size: 295113 dataset_size: 1867181 - config_name: subset_1961 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2033098 num_examples: 5746 download_size: 307454 dataset_size: 2033098 - config_name: subset_1962 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2267474 num_examples: 6380 download_size: 335049 dataset_size: 2267474 - config_name: subset_1963 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2554831 num_examples: 7296 download_size: 366084 dataset_size: 2554831 - config_name: subset_1964 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1036564 num_examples: 2467 
download_size: 189301 dataset_size: 1036564 - config_name: subset_1965 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1154704 num_examples: 2686 download_size: 208741 dataset_size: 1154704 - config_name: subset_1966 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1271265 num_examples: 2955 download_size: 230876 dataset_size: 1271265 - config_name: subset_1967 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 1476882 num_examples: 3613 download_size: 269897 dataset_size: 1476882 - config_name: subset_1968 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3012592 num_examples: 9399 download_size: 468049 dataset_size: 3012592 - config_name: subset_1969 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: 
train num_bytes: 4070251 num_examples: 13363 download_size: 572374 dataset_size: 4070251 - config_name: subset_1970 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 4202209 num_examples: 13729 download_size: 591266 dataset_size: 4202209 - config_name: subset_1971 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3170103 num_examples: 9846 download_size: 477912 dataset_size: 3170103 - config_name: subset_1972 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2050856 num_examples: 4941 download_size: 371366 dataset_size: 2050856 - config_name: subset_1973 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3541335 num_examples: 10270 download_size: 553169 dataset_size: 3541335 - config_name: subset_1974 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - 
name: pdf_file dtype: string splits: - name: train num_bytes: 2551251 num_examples: 5321 download_size: 444621 dataset_size: 2551251 - config_name: subset_1975 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2231811 num_examples: 4871 download_size: 393133 dataset_size: 2231811 - config_name: subset_1976 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 10572256 num_examples: 34924 download_size: 1344469 dataset_size: 10572256 - config_name: subset_1977 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 7831947 num_examples: 24897 download_size: 999551 dataset_size: 7831947 - config_name: subset_1978 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 9260121 num_examples: 30375 download_size: 1156082 dataset_size: 9260121 - config_name: subset_1979 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: 
pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 8481149 num_examples: 26862 download_size: 1100808 dataset_size: 8481149 - config_name: subset_1980 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2412078 num_examples: 5115 download_size: 423082 dataset_size: 2412078 - config_name: subset_1981 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2724043 num_examples: 5615 download_size: 507198 dataset_size: 2724043 - config_name: subset_1982 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2592421 num_examples: 5541 download_size: 466262 dataset_size: 2592421 - config_name: subset_1983 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2442938 num_examples: 4933 download_size: 433519 dataset_size: 2442938 - config_name: subset_1984 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string 
- name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2548665 num_examples: 5101 download_size: 449883 dataset_size: 2548665 - config_name: subset_1985 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 2968620 num_examples: 6247 download_size: 567367 dataset_size: 2968620 - config_name: subset_1986 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3264866 num_examples: 7076 download_size: 596752 dataset_size: 3264866 - config_name: subset_1987 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3624486 num_examples: 7722 download_size: 677504 dataset_size: 3624486 - config_name: subset_1988 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3757236 num_examples: 8741 download_size: 686791 dataset_size: 3757236 - config_name: subset_1989 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section 
dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3549160 num_examples: 7961 download_size: 641539 dataset_size: 3549160 - config_name: subset_1990 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 4080286 num_examples: 9735 download_size: 705163 dataset_size: 4080286 - config_name: subset_1991 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 5007383 num_examples: 11599 download_size: 897578 dataset_size: 5007383 - config_name: subset_1992 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 4717907 num_examples: 10565 download_size: 1091000 dataset_size: 4717907 - config_name: subset_1993 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 4176604 num_examples: 9274 download_size: 938150 dataset_size: 4176604 - config_name: subset_1994 features: - name: 'no' dtype: string - name: doctitle dtype: string - 
name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 4595367 num_examples: 10177 download_size: 1027817 dataset_size: 4595367 - config_name: subset_1995 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 4233938 num_examples: 9417 download_size: 961467 dataset_size: 4233938 - config_name: subset_1996 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 8295192 num_examples: 18274 download_size: 1360513 dataset_size: 8295192 - config_name: subset_1997 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3502120 num_examples: 7432 download_size: 692666 dataset_size: 3502120 - config_name: subset_1998 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 7461269 num_examples: 16365 download_size: 1420879 dataset_size: 7461269 - config_name: subset_1999 features: - name: 'no' 
dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 8490147 num_examples: 18473 download_size: 1482647 dataset_size: 8490147 - config_name: subset_2000 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 4906475 num_examples: 10537 download_size: 1109040 dataset_size: 4906475 - config_name: subset_2001 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 4728867 num_examples: 10097 download_size: 1064204 dataset_size: 4728867 - config_name: subset_2002 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 3238608 num_examples: 7163 download_size: 723074 dataset_size: 3238608 - config_name: subset_2003 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 8077505 num_examples: 18319 download_size: 1589318 dataset_size: 8077505 - 
config_name: subset_2004 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 9928020 num_examples: 21210 download_size: 1976352 dataset_size: 9928020 - config_name: subset_2005 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 9604924 num_examples: 20182 download_size: 1933897 dataset_size: 9604924 - config_name: subset_2006 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 12119194 num_examples: 25403 download_size: 2350890 dataset_size: 12119194 - config_name: subset_2007 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 13855094 num_examples: 28325 download_size: 2646104 dataset_size: 13855094 - config_name: subset_2008 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 13073542 
num_examples: 25972 download_size: 2488286 dataset_size: 13073542 - config_name: subset_2009 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 21295325 num_examples: 42172 download_size: 3935718 dataset_size: 21295325 - config_name: subset_2010 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 24862822 num_examples: 47593 download_size: 4625114 dataset_size: 24862822 - config_name: subset_2011 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 24543887 num_examples: 46663 download_size: 4541784 dataset_size: 24543887 - config_name: subset_2012 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 22003412 num_examples: 42031 download_size: 4148097 dataset_size: 22003412 - config_name: subset_2013 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: 
pdf_file dtype: string splits: - name: train num_bytes: 19215182 num_examples: 37150 download_size: 3617787 dataset_size: 19215182 - config_name: subset_2014 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 17804686 num_examples: 34392 download_size: 3317788 dataset_size: 17804686 - config_name: subset_2015 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 16031162 num_examples: 31451 download_size: 3079481 dataset_size: 16031162 - config_name: subset_2016 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 16192602 num_examples: 31485 download_size: 3050143 dataset_size: 16192602 - config_name: subset_2017 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 15634436 num_examples: 30502 download_size: 2968209 dataset_size: 15634436 - config_name: subset_2018 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - 
name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 13053086 num_examples: 26039 download_size: 2627728 dataset_size: 13053086 - config_name: subset_2019 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 13326104 num_examples: 27188 download_size: 2753786 dataset_size: 13326104 - config_name: subset_2020 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 15110444 num_examples: 30471 download_size: 3069934 dataset_size: 15110444 - config_name: subset_2021 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 15679744 num_examples: 31356 download_size: 2985298 dataset_size: 15679744 - config_name: subset_2022 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 15808618 num_examples: 31663 download_size: 3129519 dataset_size: 15808618 - config_name: subset_2023 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - 
name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 17383356 num_examples: 35303 download_size: 3405665 dataset_size: 17383356 - config_name: subset_2024 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string splits: - name: train num_bytes: 18636934 num_examples: 38864 download_size: 3760771 dataset_size: 18636934 - config_name: subset_2025 features: - name: 'no' dtype: string - name: doctitle dtype: string - name: bookNo dtype: string - name: section dtype: string - name: category dtype: string - name: publishDate dtype: string - name: pageNo dtype: string - name: id dtype: string - name: pdf_file dtype: string - name: text dtype: string splits: - name: train num_bytes: 73817047 num_examples: 48833 download_size: 15604903 dataset_size: 73817047 configs: - config_name: subset_1885 data_files: - split: train path: subset_1885/train-* - config_name: subset_1886 data_files: - split: train path: subset_1886/train-* - config_name: subset_1887 data_files: - split: train path: subset_1887/train-* - config_name: subset_1888 data_files: - split: train path: subset_1888/train-* - config_name: subset_1889 data_files: - split: train path: subset_1889/train-* - config_name: subset_1890 data_files: - split: train path: subset_1890/train-* - config_name: subset_1891 data_files: - split: train path: subset_1891/train-* - config_name: subset_1892 data_files: - split: train path: subset_1892/train-* - config_name: subset_1893 data_files: - split: train path: subset_1893/train-* - config_name: subset_1894 data_files: - split: train path: subset_1894/train-* - config_name: subset_1895 data_files: - split: train path: subset_1895/train-* - 
config_name: subset_1896 data_files: - split: train path: subset_1896/train-* - config_name: subset_1897 data_files: - split: train path: subset_1897/train-* - config_name: subset_1898 data_files: - split: train path: subset_1898/train-* - config_name: subset_1899 data_files: - split: train path: subset_1899/train-* - config_name: subset_1900 data_files: - split: train path: subset_1900/train-* - config_name: subset_1901 data_files: - split: train path: subset_1901/train-* - config_name: subset_1902 data_files: - split: train path: subset_1902/train-* - config_name: subset_1903 data_files: - split: train path: subset_1903/train-* - config_name: subset_1904 data_files: - split: train path: subset_1904/train-* - config_name: subset_1905 data_files: - split: train path: subset_1905/train-* - config_name: subset_1906 data_files: - split: train path: subset_1906/train-* - config_name: subset_1907 data_files: - split: train path: subset_1907/train-* - config_name: subset_1908 data_files: - split: train path: subset_1908/train-* - config_name: subset_1909 data_files: - split: train path: subset_1909/train-* - config_name: subset_1910 data_files: - split: train path: subset_1910/train-* - config_name: subset_1911 data_files: - split: train path: subset_1911/train-* - config_name: subset_1912 data_files: - split: train path: subset_1912/train-* - config_name: subset_1913 data_files: - split: train path: subset_1913/train-* - config_name: subset_1914 data_files: - split: train path: subset_1914/train-* - config_name: subset_1915 data_files: - split: train path: subset_1915/train-* - config_name: subset_1916 data_files: - split: train path: subset_1916/train-* - config_name: subset_1917 data_files: - split: train path: subset_1917/train-* - config_name: subset_1918 data_files: - split: train path: subset_1918/train-* - config_name: subset_1919 data_files: - split: train path: subset_1919/train-* - config_name: subset_1920 data_files: - split: train path: subset_1920/train-* - 
config_name: subset_1921 data_files: - split: train path: subset_1921/train-* - config_name: subset_1922 data_files: - split: train path: subset_1922/train-* - config_name: subset_1923 data_files: - split: train path: subset_1923/train-* - config_name: subset_1924 data_files: - split: train path: subset_1924/train-* - config_name: subset_1925 data_files: - split: train path: subset_1925/train-* - config_name: subset_1926 data_files: - split: train path: subset_1926/train-* - config_name: subset_1927 data_files: - split: train path: subset_1927/train-* - config_name: subset_1928 data_files: - split: train path: subset_1928/train-* - config_name: subset_1929 data_files: - split: train path: subset_1929/train-* - config_name: subset_1930 data_files: - split: train path: subset_1930/train-* - config_name: subset_1931 data_files: - split: train path: subset_1931/train-* - config_name: subset_1932 data_files: - split: train path: subset_1932/train-* - config_name: subset_1933 data_files: - split: train path: subset_1933/train-* - config_name: subset_1934 data_files: - split: train path: subset_1934/train-* - config_name: subset_1935 data_files: - split: train path: subset_1935/train-* - config_name: subset_1936 data_files: - split: train path: subset_1936/train-* - config_name: subset_1937 data_files: - split: train path: subset_1937/train-* - config_name: subset_1938 data_files: - split: train path: subset_1938/train-* - config_name: subset_1939 data_files: - split: train path: subset_1939/train-* - config_name: subset_1940 data_files: - split: train path: subset_1940/train-* - config_name: subset_1941 data_files: - split: train path: subset_1941/train-* - config_name: subset_1942 data_files: - split: train path: subset_1942/train-* - config_name: subset_1943 data_files: - split: train path: subset_1943/train-* - config_name: subset_1944 data_files: - split: train path: subset_1944/train-* - config_name: subset_1945 data_files: - split: train path: subset_1945/train-* - 
config_name: subset_1946 data_files: - split: train path: subset_1946/train-* - config_name: subset_1947 data_files: - split: train path: subset_1947/train-* - config_name: subset_1948 data_files: - split: train path: subset_1948/train-* - config_name: subset_1949 data_files: - split: train path: subset_1949/train-* - config_name: subset_1950 data_files: - split: train path: subset_1950/train-* - config_name: subset_1951 data_files: - split: train path: subset_1951/train-* - config_name: subset_1952 data_files: - split: train path: subset_1952/train-* - config_name: subset_1953 data_files: - split: train path: subset_1953/train-* - config_name: subset_1954 data_files: - split: train path: subset_1954/train-* - config_name: subset_1955 data_files: - split: train path: subset_1955/train-* - config_name: subset_1956 data_files: - split: train path: subset_1956/train-* - config_name: subset_1957 data_files: - split: train path: subset_1957/train-* - config_name: subset_1958 data_files: - split: train path: subset_1958/train-* - config_name: subset_1959 data_files: - split: train path: subset_1959/train-* - config_name: subset_1960 data_files: - split: train path: subset_1960/train-* - config_name: subset_1961 data_files: - split: train path: subset_1961/train-* - config_name: subset_1962 data_files: - split: train path: subset_1962/train-* - config_name: subset_1963 data_files: - split: train path: subset_1963/train-* - config_name: subset_1964 data_files: - split: train path: subset_1964/train-* - config_name: subset_1965 data_files: - split: train path: subset_1965/train-* - config_name: subset_1966 data_files: - split: train path: subset_1966/train-* - config_name: subset_1967 data_files: - split: train path: subset_1967/train-* - config_name: subset_1968 data_files: - split: train path: subset_1968/train-* - config_name: subset_1969 data_files: - split: train path: subset_1969/train-* - config_name: subset_1970 data_files: - split: train path: subset_1970/train-* - 
config_name: subset_1971 data_files: - split: train path: subset_1971/train-* - config_name: subset_1972 data_files: - split: train path: subset_1972/train-* - config_name: subset_1973 data_files: - split: train path: subset_1973/train-* - config_name: subset_1974 data_files: - split: train path: subset_1974/train-* - config_name: subset_1975 data_files: - split: train path: subset_1975/train-* - config_name: subset_1976 data_files: - split: train path: subset_1976/train-* - config_name: subset_1977 data_files: - split: train path: subset_1977/train-* - config_name: subset_1978 data_files: - split: train path: subset_1978/train-* - config_name: subset_1979 data_files: - split: train path: subset_1979/train-* - config_name: subset_1980 data_files: - split: train path: subset_1980/train-* - config_name: subset_1981 data_files: - split: train path: subset_1981/train-* - config_name: subset_1982 data_files: - split: train path: subset_1982/train-* - config_name: subset_1983 data_files: - split: train path: subset_1983/train-* - config_name: subset_1984 data_files: - split: train path: subset_1984/train-* - config_name: subset_1985 data_files: - split: train path: subset_1985/train-* - config_name: subset_1986 data_files: - split: train path: subset_1986/train-* - config_name: subset_1987 data_files: - split: train path: subset_1987/train-* - config_name: subset_1988 data_files: - split: train path: subset_1988/train-* - config_name: subset_1989 data_files: - split: train path: subset_1989/train-* - config_name: subset_1990 data_files: - split: train path: subset_1990/train-* - config_name: subset_1991 data_files: - split: train path: subset_1991/train-* - config_name: subset_1992 data_files: - split: train path: subset_1992/train-* - config_name: subset_1993 data_files: - split: train path: subset_1993/train-* - config_name: subset_1994 data_files: - split: train path: subset_1994/train-* - config_name: subset_1995 data_files: - split: train path: subset_1995/train-* - 
config_name: subset_1996 data_files: - split: train path: subset_1996/train-* - config_name: subset_1997 data_files: - split: train path: subset_1997/train-* - config_name: subset_1998 data_files: - split: train path: subset_1998/train-* - config_name: subset_1999 data_files: - split: train path: subset_1999/train-* - config_name: subset_2000 data_files: - split: train path: subset_2000/train-* - config_name: subset_2001 data_files: - split: train path: subset_2001/train-* - config_name: subset_2002 data_files: - split: train path: subset_2002/train-* - config_name: subset_2003 data_files: - split: train path: subset_2003/train-* - config_name: subset_2004 data_files: - split: train path: subset_2004/train-* - config_name: subset_2005 data_files: - split: train path: subset_2005/train-* - config_name: subset_2006 data_files: - split: train path: subset_2006/train-* - config_name: subset_2007 data_files: - split: train path: subset_2007/train-* - config_name: subset_2008 data_files: - split: train path: subset_2008/train-* - config_name: subset_2009 data_files: - split: train path: subset_2009/train-* - config_name: subset_2010 data_files: - split: train path: subset_2010/train-* - config_name: subset_2011 data_files: - split: train path: subset_2011/train-* - config_name: subset_2012 data_files: - split: train path: subset_2012/train-* - config_name: subset_2013 data_files: - split: train path: subset_2013/train-* - config_name: subset_2014 data_files: - split: train path: subset_2014/train-* - config_name: subset_2015 data_files: - split: train path: subset_2015/train-* - config_name: subset_2016 data_files: - split: train path: subset_2016/train-* - config_name: subset_2017 data_files: - split: train path: subset_2017/train-* - config_name: subset_2018 data_files: - split: train path: subset_2018/train-* - config_name: subset_2019 data_files: - split: train path: subset_2019/train-* - config_name: subset_2020 data_files: - split: train path: subset_2020/train-* - 
config_name: subset_2021 data_files: - split: train path: subset_2021/train-* - config_name: subset_2022 data_files: - split: train path: subset_2022/train-* - config_name: subset_2023 data_files: - split: train path: subset_2023/train-* - config_name: subset_2024 data_files: - split: train path: subset_2024/train-* - config_name: subset_2025 data_files: - split: train path: subset_2025/train-*
---

# Royal Gazette Thailand (Ratchakitcha) Dataset

**Royal Gazette dataset (machine-readable)**

The **Open Law Data Thailand** project, working with the Senate Committee on Commerce and Industry, received this data courtesy of the **Secretariat of the Cabinet (SOC)**. The goal is to publish Thai legal data in a machine-readable form and to support Legal Tech and AI innovation in Thailand.

## Dataset Description

This dataset compiles the announcements published in the Royal Gazette (Ratchakitcha). Each record includes the title, book (volume), section, publication date, and a link to the original PDF. It is suited to RAG (Retrieval-Augmented Generation), legal search, and analysis of government data.

- **Source:** The Secretariat of the Cabinet (สำนักเลขาธิการคณะรัฐมนตรี)
- **Official Collaboration Reference:** "Most Urgent" letter No. นร ๐๕๐๓/๘๗๓๙ (29 July 2025, B.E. 2568)
- **Homepage:** [Open Law Data Thailand](https://www.openlawdatathailand.org/)
- **Original:** [Open Law Data Thailand](https://huggingface.co/datasets/open-law-data-thailand/soc-ratchakitcha)

## Usage Instruction

You can download the data in several forms through the Hugging Face `datasets` library by passing a config name in the `name` parameter.

### 1. For AI / NLP work (recommended) ⭐

If you need text for training a model or building a search engine, you can load the data by **decade** (Decade Subsets). Each subset provides both the OCR files and the metadata.

```python
from datasets import load_dataset

# Example: load the data for 2025 (B.E. 2568).
# Returns both the text (OCR) and the metadata.
ds = load_dataset("JonusNattapong/Ratchakitcha", name="subset_2025")
print(ds['train'][0])
```

**Supported subsets:**

* `subset_2025`, `subset_2020s` (current)
* `subset_2010s`, `subset_2000s`, `subset_1990s`, ... down to `subset_1960s`
* `subset_pre_1960` (historical data from before 1960 / B.E. 2503)

### 2. For data analysis (metadata only)

Use this config if you want statistics, such as the number of laws per year, or want to search titles without the body text.

```python
from datasets import load_dataset

# Load all of the metadata only (small files, fast to download).
ds_meta = load_dataset("JonusNattapong/Ratchakitcha", name="meta")
```

### 3. Selecting a specific year

To work with a single year, load the decade subset that contains it and filter on `publishDate`.

```python
from datasets import load_dataset

year = "2025"  # the year you want, as a string

# Pick the subset by decade; years before 1960 live in `subset_pre_1960`.
if int(year) < 1960:
    subset_name = "subset_pre_1960"
else:
    subset_name = f"subset_{int(year) // 10 * 10}s"

ds = load_dataset("JonusNattapong/Ratchakitcha", name=subset_name)

# Keep only the records whose `publishDate` starts with the chosen year.
ds_filtered = ds.filter(lambda x: x['publishDate'].startswith(year))
print(ds_filtered['train'][0])
```

**Note:** If the chosen year is not covered by a decade subset (for example, a year before 1960), use `subset_pre_1960` and adjust the filter condition accordingly.

## Data Fields

| Field Name | Description |
| :--- | :--- |
| `no` | Document sequence number |
| `doctitle` | Title or topic of the document |
| `bookNo` | Book (volume) number of the Royal Gazette |
| `section` | Section number of the Royal Gazette |
| `category` | Category (e.g., ก, ข, ง) |
| `publishDate` | Date of publication in the Royal Gazette |
| `pageNo` | Page number |
| `pdf_file` | Filename of the source PDF |

## Legal & License

This data is provided with the support of the **Secretariat of the Cabinet**, per its reply letter "Most Urgent, No. นร ๐๕๐๓/๘๗๓๙" dated 29 July 2025 (B.E. 2568), for public benefit and for the development of artificial intelligence (AI) technology.

**Disclaimer:** This data is provided only to make the information easier to access and analyze. For official legal citation, verify the text against the original PDF on [ratchakitcha.soc.go.th](https://ratchakitcha.soc.go.th/) directly.

## Contact

- **Project:** Open Law Data Thailand
- **Website:** https://www.openlawdatathailand.org/
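
## Example: Metadata Statistics

The usage instructions mention per-year statistics as a use case for the `meta` config but do not show one. The following is a minimal sketch of such a count. It assumes that `publishDate` is a string containing a four-digit year (the year filter in the usage instructions suggests this, but the card does not document the format). The helper `count_by_year` and the sample rows are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Assumption: `publishDate` contains a four-digit year somewhere in the string.
YEAR_RE = re.compile(r"\d{4}")


def count_by_year(records):
    """Count records per year, taking the first four-digit run in `publishDate`."""
    counts = Counter()
    for rec in records:
        match = YEAR_RE.search(rec.get("publishDate") or "")
        # Records with a missing or unparsable date are grouped under "unknown".
        counts[match.group(0) if match else "unknown"] += 1
    return counts


# Hypothetical rows that stand in for the real metadata.
sample = [
    {"doctitle": "Example A", "publishDate": "2025-01-15"},
    {"doctitle": "Example B", "publishDate": "2025-03-02"},
    {"doctitle": "Example C", "publishDate": "2024-12-30"},
    {"doctitle": "Example D", "publishDate": ""},
]

for year, n in sorted(count_by_year(sample).items()):
    print(year, n)
```

To run this on the real data, pass the `train` split of the `meta` config, for example `count_by_year(load_dataset("JonusNattapong/Ratchakitcha", name="meta")["train"])`. Check a few rows first to confirm how `publishDate` is actually formatted.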
Provider: JonusNattapong