Analysis of SAI reports adherence to ISSAI3000 standards with LLMs and Chain-of-Thought prompting

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14848783

Description:

This dataset provides replication data for a study that introduces an automated framework, based on a large language model (LLM), for evaluating how Supreme Audit Institutions (SAIs) across 33 countries document their adherence to the ISSAI 3000 performance audit standards.

We obtained the official ISSAI 3000 standards by downloading "ISSAI-3000-Performance-Audit-Standard.pdf" from the INTOSAI website (https://www.issai.org) and converting it to plain text. The document sets out 42 distinct requirements and provides the fundamental framework that SAIs use to conduct performance audits. We numbered each requirement after the paragraph in which it appears in the original text, which gives 42 uniquely numbered standards (the results file labels them from standard21 to standard139).

We visited the SAI website of each of the 33 countries and collected five performance audit reports in PDF format, using the following criteria:
· We prioritized the most recent reports available.
· We avoided routine or annual analyses (e.g., yearly budget reviews) and instead targeted standalone audits focusing on specific public entities.
· If the website featured a distinct performance audit section, we drew our sample from there.
· A human researcher made the selections, using Google Chrome's automatic translation into English to confirm that the content was relevant.

A total of 165 PDFs (33 countries × 5 reports) were retrieved and converted to text. Because 18 files exceeded GPT4o-mini's 128K-token context limit, 147 usable reports remained. A comprehensive list of all countries, their SAIs, and the corresponding audits appears in Appendix 2 of prompts_and_additional_data.pdf.
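The description does not say how report length was measured against the 128K-token limit. As a minimal sketch of that filtering step, the snippet below splits report texts into usable and skipped sets. The function names, the file names, and the four-characters-per-token estimate are all assumptions made for illustration; a real run would substitute the model's own tokenizer and would probably also reserve room for the prompts.

```python
TOKEN_LIMIT = 128_000  # context window cited in the dataset description


def approx_token_count(text):
    """Crude placeholder estimate: about four characters per token (assumption)."""
    return len(text) // 4


def split_by_context_limit(texts, limit=TOKEN_LIMIT, count_tokens=approx_token_count):
    """Partition a {name: text} mapping into (usable, skipped) lists of names."""
    usable, skipped = [], []
    for name, text in texts.items():
        if count_tokens(text) <= limit:
            usable.append(name)
        else:
            skipped.append(name)
    return usable, skipped


# Synthetic stand-ins for converted report texts (not real data).
texts = {
    "short_report.txt": "audit " * 1_000,    # about 1,500 estimated tokens
    "long_report.txt": "audit " * 200_000,   # about 300,000 estimated tokens
}
usable, skipped = split_by_context_limit(texts)
```

Passing a different `count_tokens` callable swaps in an exact tokenizer without changing the partition logic.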
Using the INTOSAI-Donor Cooperation project database (https://intosaidonor.org/project-database/), we recorded whether each of these countries served as a donor in INTOSAI or regional SAI projects, the number of projects funded, and the total financial contributions. These details are provided in Appendix 3 of prompts_and_additional_data.docx.

Finally, we drew on the World Bank's Worldwide Governance Indicators for five governance-related measures: Control of Corruption, Government Effectiveness, Regulatory Quality, Rule of Law, and Voice and Accountability (https://databank.worldbank.org/source/worldwide-governance-indicators). We derived a composite Governance Quality index by averaging these five variables; the final values are reported in Appendix 4 of prompts_and_additional_data.docx. Additionally, a list of all audited institutions included in the analyzed SAI reports is provided in Appendix 5, and English-language summaries of all SAI reports are available in Appendix 6 of prompts_and_additional_data.docx.

Three files are uploaded:
1. SAI_reports_and_ISSAI3000_standards.zip
2. sai_analysis_results.pkl (Python pickle format)
3. prompts_and_additional_data.pdf

Ad 1. This zipped file contains 34 folders: one for each of the 33 countries covered in the study and one containing PDF and text files describing the ISSAI 3000 standards. Each country folder includes ten files: five PDF files with SAI reports and five corresponding text files converted from the PDFs. The original file names, as downloaded from the national SAI websites, have been retained.

Ad 2. This file contains the results of analyzing the adherence of 147 SAI reports to the ISSAI 3000 standards. The data is structured as the following Python dictionary:

    results = {
        "reports": reports,
        "report_summary": report_summary,
        "report_rating": report_rating,
        "skipped_reports": skipped_reports
    }

Where:
- "reports" refers to the list of analyzed reports.
- "report_summary" contains the English-language summaries of the SAI reports (also provided in Appendix 6 of prompts_and_additional_data.docx).
- "report_rating" consists of a list of Python dictionaries containing the report rating results, as described in user_prompt_4 in Appendix 1 of prompts_and_additional_data.docx and replicated below in Python format:

    {
        'country': 'Country name',
        'date': 'Date of the report',
        'language': 'Language of the report',
        'audited_institution': 'Name(s) of the institution(s) audited by SAI',
        'standards': {
            'standard21': {
                'assessment': 'Assessment - Fully, Almost fully, Partially, or Does not meet',
                'comment': "One-sentence comment about meeting this standard in the report (leave empty if 'Fully')"
            },
            ...
            'standard139': {
                'assessment': 'Assessment - Fully, Almost fully, Partially, or Does not meet',
                'comment': "One-sentence comment about meeting this standard in the report (leave empty if 'Fully')"
            }
        }
    }

- "skipped_reports" is a list of reports that were too long to fit the LLM context window.

Ad 3. The prompts_and_additional_data.pdf file contains multiple appendices. Appendix 1 includes the system prompt and the four user prompts used in the Chain-of-Thought (CoT) prompting of GPT4o-mini for analyzing SAI reports. The other appendices provide additional data used in this study.
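The documented structure of sai_analysis_results.pkl can be explored with a few lines of Python. The sketch below is illustrative only: `load_results` and `tally_assessments` are hypothetical helper names, and the small `example_results` dictionary is synthetic data that merely mimics the layout described above (the element types of "reports" and "report_summary", and the date format, are assumptions).

```python
import pickle
from collections import Counter


def load_results(path):
    """Load the results dictionary from a pickle file.

    Only unpickle files from a source you trust: pickle can run code on load.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def tally_assessments(report_rating):
    """Count assessment labels per standard across all rated reports."""
    tallies = {}
    for rating in report_rating:
        for standard, entry in rating["standards"].items():
            tallies.setdefault(standard, Counter())[entry["assessment"]] += 1
    return tallies


def make_rating(country, first, last):
    """Build one synthetic rating entry with two of the numbered standards."""
    return {
        "country": country,
        "date": "2024-01-01",          # placeholder; real date format unknown
        "language": "English",
        "audited_institution": "Example Agency",
        "standards": {
            "standard21": {"assessment": first, "comment": ""},
            "standard139": {"assessment": last, "comment": "Synthetic comment."},
        },
    }


# Synthetic stand-in for the real file contents (not real data).
example_results = {
    "reports": ["report_a.txt", "report_b.txt"],
    "report_summary": ["Summary A", "Summary B"],
    "report_rating": [
        make_rating("Country A", "Fully", "Partially"),
        make_rating("Country B", "Fully", "Does not meet"),
    ],
    "skipped_reports": [],
}

tallies = tally_assessments(example_results["report_rating"])

# With the real file in the working directory, the same tally would be:
# results = load_results("sai_analysis_results.pkl")
# tallies = tally_assessments(results["report_rating"])
```

Keeping one `Counter` per standard makes it easy to compare how often each requirement is rated "Fully" versus the weaker levels.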

Created:

2025-02-11