
weerayuth/nitibench

Source: Hugging Face. Updated 2026-03-31; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/weerayuth/nitibench
Description:
---
license: mit
task_categories:
- sentence-similarity
- text-generation
tags:
- legal
- RAG
- LCLM
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: ccl
    path: data/ccl-*
  - split: tax
    path: data/tax-*
dataset_info:
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: relevant_laws
    list:
    - name: law_name
      dtype: string
    - name: section_content
      dtype: string
    - name: section_num
      dtype: string
  - name: reference_answer
    dtype: string
  - name: reference_laws
    list:
    - name: law_name
      dtype: string
    - name: section_content
      dtype: string
    - name: section_num
      dtype: string
  splits:
  - name: ccl
    num_bytes: 18873695
    num_examples: 3729
  - name: tax
    num_bytes: 2227708
    num_examples: 50
  download_size: 4728201
  dataset_size: 21101403
---

# 👩🏻‍⚖️ NitiBench: A Thai Legal Benchmark for RAG

**[[📄 Technical Report](https://arxiv.org/pdf/2502.10868)] | [[👨‍💻 Github Repository](https://github.com/vistec-AI/nitibench/)]**

This dataset provides the **test** data for evaluating LLM frameworks, such as RAG or LCLM. The benchmark consists of two datasets:

- [NitiBench-CCL](#️-nitibench-ccl)
- [NitiBench-Tax](#-nitibench-tax)

## 🏛️ NitiBench-CCL

Derived from the [WangchanX-Legal-ThaiCCL-RAG Dataset](https://huggingface.co/datasets/airesearch/WangchanX-Legal-ThaiCCL-RAG), our version includes an additional preprocessing step in which we separate the reasoning process from the final answer. The dataset contains 35 pieces of legislation related to **C**orporate and **C**ommercial **L**aw (CCL). Information about the 35 pieces of legislation is provided in the table below:

| Legislation | Legal Terminology | Training | Test |
|-------------|-------------------|----------|------|
| Organic Act on Counter Corruption, B.E. 2561 | organic law | ✓ | |
| Civil and Commercial Code | code | ✓ | ✓ |
| Revenue Code | code | ✓ | ✓ |
| Accounting Act, B.E. 2543 | act | ✓ | ✓ |
| Accounting Profession Act, B.E. 2547 | act | ✓ | ✓ |
| Act on Disciplinary Offenses of Government Officials Performing Duties in Agencies Other than Government Agencies, B.E. 2534 | act | ✓ | |
| Act on Offences of Officials Working in State Agencies or Organizations, B.E. 2502 | act | ✓ | |
| Act on Offences Relating to Registered Partnerships, Limited Partnerships, Companies Limited, Associations and Foundations, B.E. 2499 | act | ✓ | ✓ |
| Act on the Establishment of Government Organizations, B.E. 2496 | act | ✓ | |
| Act on the Management of Shares and Stocks of Ministers, B.E. 2543 | act | ✓ | |
| Act Repealing the Agricultural Futures Trading Act, B.E. 2542 B.E. 2558 | act | ✓ | |
| Budget Procedure Act, B.E. 2561 | act | ✓ | |
| Business Registration Act, B.E. 2499 | act | ✓ | ✓ |
| Chamber of Commerce Act, B.E. 2509 | act | ✓ | ✓ |
| Derivatives Act, B.E. 2546 | act | ✓ | ✓ |
| Energy Conservation Promotion Act, B.E. 2535 | act | ✓ | ✓ |
| Energy Industry Act, B.E. 2550 | act | ✓ | ✓ |
| Financial Institutions Business Act, B.E. 2551 | act | ✓ | ✓ |
| Fiscal Discipline Act, B.E. 2561 | act | ✓ | |
| Foreign Business Act, B.E. 2542 | act | ✓ | ✓ |
| Government Procurement and Supplies Management Act, B.E. 2560 | act | ✓ | |
| National Economic and Social Development Act, B.E. 2561 | act | ✓ | |
| Petroleum Income Tax Act, B.E. 2514 | act | ✓ | ✓ |
| Provident Fund Act, B.E. 2530 | act | ✓ | ✓ |
| Public Limited Companies Act, B.E. 2535 | act | ✓ | ✓ |
| Secured Transactions Act, B.E. 2558 | act | ✓ | ✓ |
| Securities and Exchange Act, B.E. 2535 | act | ✓ | ✓ |
| State Enterprise Capital Act, B.E. 2542 | act | ✓ | |
| State Enterprise Committee and Personnel Qualifications Standards Act, B.E. 2518 | act | ✓ | |
| State Enterprise Development and Governance Act, B.E. 2562 | act | ✓ | |
| State Enterprise Labor Relations Act, B.E. 2543 | act | ✓ | |
| Trade Association Act, B.E. 2509 | act | ✓ | ✓ |
| Trust for Transactions in Capital Market Act, B.E. 2550 | act | ✓ | ✓ |
| Emergency Decree on Digital Asset Businesses, B.E. 2561 | emergency decree | ✓ | |
| Emergency Decree on Special Purpose Juristic Person for Securitization, B.E. 2540 | emergency decree | ✓ | ✓ |

The training split of `nitibench-ccl` can be found in the [WangchanX-Legal-ThaiCCL-RAG dataset](https://huggingface.co/datasets/airesearch/WangchanX-Legal-ThaiCCL-RAG).

### Data Format

Each data point contains five columns:

- `question: str`: A question relevant to the `relevant_laws`.
- `answer: str`: The original answer generated by an LLM, which has been revised and edited by legal experts to include both the reasoning steps and the final answer.
- `relevant_laws: List[Dict[str, str]]`: A list of relevant laws, each given as a law name, section number, and section content.
- `reference_answer: str`: The answer to the question based on the `relevant_laws`, provided without the reasoning steps.
- `reference_laws: List[Dict[str, str]]`: A list of the laws referenced in the `relevant_laws` column.

Formally, given the data triple \((q, T=\{p_1, p_2, \dots, p_K\}, y)\), \(q\) represents the `question`, \(T\) represents `relevant_laws`, and \(y\) represents the `answer`.

### Data Curation

Using the notation described above, the data was curated as follows:

1. Queries (\(q\)) and answers (\(y\)) were manually crafted by legal experts based on a single section sampled from the legal texts of the 35 pieces of legislation.
2. For each data triple \((q, T, y)\), the manually crafted question was carefully quality-assured by a second legal expert.

Thus, for the test data, there is only one positive per query (\(|T|=1\)). The diagram below shows how the test data was collected.

![ccl-test](./assets/CCL_Test_Set.png)

## 💸 NitiBench-Tax

This subset provides a question, relevant laws, and an answer for each data point. Instead of having legal experts manually craft the questions, we scraped the data from a reliable source: the [Revenue Department Website](https://rd.go.th).
This subset contains Tax Ruling Cases officially provided by the Revenue Department since 2021. As a result, this subset is considerably more challenging, as it requires extensive legal reasoning both for searching for relevant documents and for generating the answer.

The data collection procedure is illustrated in the figure below:

![tax-test](./assets/tax_case_test_set.png)

### Data Format

This split uses the same format as described in the [NitiBench-CCL split](#data-format).

## Contact

For any inquiries or concerns, please reach out to us via email: [Chompakorn Chaksangchaichot](mailto:chompakornc_pro@vistec.ac.th).

## Citation

```bibtex
@inproceedings{akarajaradwong-etal-2025-nitibench,
    title = "{N}iti{B}ench: Benchmarking {LLM} Frameworks on {T}hai Legal Question Answering Capabilities",
    author = "Akarajaradwong, Pawitsapak and
      Pothavorn, Pirat and
      Chaksangchaichot, Chompakorn and
      Tasawong, Panuthep and
      Nopparatbundit, Thitiwat and
      Pratai, Keerakiat and
      Nutanong, Sarana",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1739/",
    doi = "10.18653/v1/2025.emnlp-main.1739",
    pages = "34292--34315",
    ISBN = "979-8-89176-332-6",
    abstract = "Large language models (LLMs) show promise in legal question answering (QA), yet Thai legal QA systems face challenges due to limited data and complex legal structures. We introduce NitiBench, a novel benchmark featuring two datasets: (1) NitiBench-CCL, covering Thai financial laws, and (2) NitiBench-Tax, containing Thailand{'}s official tax rulings. Our benchmark also consists of specialized evaluation metrics suited for Thai legal QA. We evaluate retrieval-augmented generation (RAG) and long-context LLM (LCLM) approaches across three key dimensions: (1) the benefits of domain-specific techniques like hierarchy-aware chunking and cross-referencing, (2) comparative performance of RAG components, e.g., retrievers and LLMs, and (3) the potential of long-context LLMs to replace traditional RAG systems. Our results reveal that domain-specific components slightly improve over naive methods. At the same time, existing retrieval models still struggle with complex legal queries, and long-context LLMs have limitations in consistent legal reasoning. Our study highlights current limitations in Thai legal NLP and lays a foundation for future research in this emerging domain."
}

@misc{akarajaradwong2025nitibenchcomprehensivestudiesllm,
    title={NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering},
    author={Pawitsapak Akarajaradwong and Pirat Pothavorn and Chompakorn Chaksangchaichot and Panuthep Tasawong and Thitiwat Nopparatbundit and Sarana Nutanong},
    year={2025},
    eprint={2502.10868},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2502.10868},
}
```

## License

This dataset is provided under the **MIT License**.

## Acknowledgment

We sincerely appreciate the generous support from the WangchanX program sponsors (PTT, SCB, and SCBX), whose funding made this project possible. We are also grateful for the invaluable collaboration with VISTEC, which was crucial in bringing this project to fruition.

<br>
<div style="display: flex; align-items: center; gap: 20px;">
  Sponsored by
  <img src="./assets/VISAI_Logo_Horizontal.png" style="height:30px;" alt="VISAI Logo">
  <img src="./assets/Logo_vistec.png" style="height:30px;" alt="VISTEC Logo">
</div>
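
As a rough illustration of the Data Format described earlier, the sketch below shows one way the columns might be handled in Python. It assumes the Hugging Face `datasets` library and the repository id and split names (`ccl`, `tax`) shown on this page; the helper names `load_split`, `validate_record`, and `gold_sections`, as well as the sample record, are hypothetical and are not part of the dataset or its official tooling.

```python
from typing import Any, Dict, Set, Tuple

# Column and field names as listed in the dataset card's feature list.
STRING_COLUMNS = ("question", "answer", "reference_answer")
LAW_LIST_COLUMNS = ("relevant_laws", "reference_laws")
LAW_FIELDS = ("law_name", "section_content", "section_num")


def load_split(split: str = "ccl"):
    """Load one split ("ccl" or "tax") via the Hugging Face `datasets` library.

    Assumption: the repository id is the one shown on this page. The import is
    done lazily so the helpers below can be used without `datasets` installed.
    """
    from datasets import load_dataset

    return load_dataset("weerayuth/nitibench", split=split)


def validate_record(record: Dict[str, Any]) -> None:
    """Raise ValueError if a record does not match the documented columns."""
    for column in STRING_COLUMNS:
        if not isinstance(record.get(column), str):
            raise ValueError(f"column {column!r} must be a string")
    for column in LAW_LIST_COLUMNS:
        laws = record.get(column)
        if not isinstance(laws, list):
            raise ValueError(f"column {column!r} must be a list of dicts")
        for law in laws:
            for field in LAW_FIELDS:
                if not isinstance(law.get(field), str):
                    raise ValueError(f"{column}.{field} must be a string")


def gold_sections(record: Dict[str, Any]) -> Set[Tuple[str, str]]:
    """Return the (law_name, section_num) pairs that make up T for a record."""
    return {(law["law_name"], law["section_num"]) for law in record["relevant_laws"]}


# Hypothetical record with placeholder text, for illustration only.
# It is NOT taken from the dataset.
sample = {
    "question": "placeholder question",
    "answer": "placeholder reasoning and final answer",
    "relevant_laws": [
        {
            "law_name": "Example Act",
            "section_num": "1",
            "section_content": "placeholder section text",
        }
    ],
    "reference_answer": "placeholder final answer",
    "reference_laws": [],
}

validate_record(sample)
print(gold_sections(sample))
```

With network access and `datasets` installed, `load_split("tax")` would be the corresponding call for the tax split; keying gold sections on `(law_name, section_num)` is only one possible choice for comparing retrieved sections against `relevant_laws`.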
Provided by:
weerayuth