
Plain-text conversions of U.S. Final Environmental Impact Statements from 2013-2020

NIAID Data Ecosystem, indexed 2026-03-13
Download link:
http://datadryad.org/dataset/doi%253A10.25338%252FB80K9R
Resource description:
Administrative procedures are intended to increase transparency and help agencies make better decisions. However, these requirements also increase agency workload. Understanding how public agencies satisfy procedural requirements is a critical facet of agency performance. This analysis focuses on the language agencies use in Environmental Impact Statements (EISs) required by the U.S. National Environmental Policy Act (NEPA) – specifically, the reuse of similar text within and between assessments. We synthesize theories of institutional isomorphism and bureaucratic coping to understand why and how text is reused, and consider the tradeoffs associated with this behavior. Using a national dataset of 1014 EISs published by 22 U.S. agencies from 2013 to 2020, we explore how boilerplate language varies by agency, authors, project type, location, and consulting firm involvement. We find that text reuse primarily occurs where there is a clear substantive rationale for boilerplate language or where studies share authors or contract consulting firms. This indicates: (1) that agencies largely do not merely engage in pro forma compliance efforts; and (2) that while NEPA procedures are oriented around individual projects and decisions, cross-project learning and the narrowness – or breadth – of agencies’ project portfolios shape analytical routines and the relative tradeoffs of boilerplate text in policy analysis. This paper adds to our theoretical understanding of agencies’ coping strategies in response to institutional pressures and makes a methodological contribution by demonstrating the application of text reuse measurement and information extraction methods in public administration research.

Methods
These data are full-text representations of EIS documents and associated metadata stored in the US EPA’s e-NEPA repository.
The e-NEPA page provides a nearly comprehensive record of final EISs (FEISs) published since October 2012, although a small proportion of files are not posted, and some file links are corrupted or broken. We used web scraping to record available metadata and download available documents. These raw data include all FEISs obtained from the e-NEPA page. The subsequent analysis concerns a subsample of FEISs published between 2013 and 2020 by agencies that completed at least 5 EISs during that time. We exclude Adopted FEISs, which are cases where one agency uses an EIS already prepared by another agency in lieu of a separate analysis, as well as Withdrawn FEISs. There are also several cases where two agencies (often the Bureau of Land Management [BLM] and the Forest Service [FS]) file EISs referring to the same project, using almost identical documentation. In these cases, we keep the EIS published first. After download, the documents were converted to plain text using the pdftools package in R (https://cran.r-project.org/web/packages/pdftools/) and stored by page in data.table format (https://rdatatable.gitlab.io/data.table/). Finally, the aggregated text is stored as an .RDS file for compression (the equivalent CSV is ~10 GB).
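The subsample rules described above can be sketched in a few lines. This is only an illustration in Python, not the authors' R code: the record fields (`eis_id`, `agency`, `project`, `published`, `status`) are hypothetical names rather than the e-NEPA metadata schema, matching duplicates on a shared `project` key is an assumption, and so is the order of the steps (the description does not say whether the 5-EIS threshold is applied before or after duplicates are dropped).

```python
from collections import Counter
from datetime import date


def select_subsample(records, start_year=2013, end_year=2020, min_per_agency=5):
    """Apply the stated subsample rules to a list of metadata dicts.

    Each record is assumed (hypothetically) to have the keys
    'eis_id', 'agency', 'project', 'published' (a datetime.date), 'status'.
    """
    # Rule 1: keep FEISs published in the study window,
    # excluding Adopted and Withdrawn statements.
    kept = [
        r for r in records
        if start_year <= r["published"].year <= end_year
        and r["status"] not in {"Adopted", "Withdrawn"}
    ]

    # Rule 2: when several agencies file EISs for the same project,
    # keep only the one published first (assumed matching key: 'project').
    earliest = {}
    for r in kept:
        current = earliest.get(r["project"])
        if current is None or r["published"] < current["published"]:
            earliest[r["project"]] = r
    deduped = list(earliest.values())

    # Rule 3: keep agencies with at least `min_per_agency` EISs
    # (assumed here to be counted after de-duplication).
    counts = Counter(r["agency"] for r in deduped)
    return [r for r in deduped if counts[r["agency"]] >= min_per_agency]
```

Calling `select_subsample(records)` on scraped metadata would return the filtered records. Changing the order of rules 2 and 3 can change which agencies pass the threshold, so the ordering should be checked against the original analysis code before relying on it.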
Created:
2021-10-27