Congressional Witnesses Matter: Proving Witness Testimony Impact Using Large Language Models
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://zenodo.org/record/14290999
下载链接
链接失效反馈官方服务:
资源简介:
This repository provides the data supporting our study, in which we use a large language model (LLMs) to analyze the impact of congressional witness testimony. The dataset has been curated and structured to facilitate reproducibility and encourage further research in this domain.
The repository includes the results of our study (see `Results.zip`), the fine-tuning corpus (see `Model Training Data.zip`), and the Witness and Legislative History and Impact Corpus (WLHIC), which is can be subdivided into the Witness Corpus (WC) and Legislative History and Impact Corpus (LHIC). For the LHIC and WC, we provide cleaned JSONL files containing the full datasets, individual text files of each document, and accompanying metadata (see `WLHIC data.zip`). To ensure comprehensive accessibility, we also include the original PDF versions of the documents in these corpora (see `WLHIC Raw Files.zip`).
We also provide the sentence transformer model resulting from the extended pretraining process and the model resulting from the fine tuning process. Both are accessible in `Models.zip`.
Researchers can use the provided data to replicate our findings and verify the results of our analysis. The cleaned data can also be regenerated by applying the cleaning scripts provided in the code repository to the LHIC and WC text files. While slight variations in results may occur when replicating the study from scratch due to the stochastic nature of LLM training, these differences are minimal and do not affect the substantive findings.
We encourage the use of this dataset for reproducibility studies and to inspire further exploration of LLM applications in political science research. By publishing this data, we aim to promote transparency, collaboration, and innovation in the field.
创建时间:
2024-12-06



