abigailhaddad/federal-public-lands-spending
Source: Hugging Face. Updated 2026-05-10; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/abigailhaddad/federal-public-lands-spending
---
license: cc0-1.0
task_categories:
- tabular-classification
tags:
- government
- federal-spending
- usaspending
- public-lands
- contracts
- grants
pretty_name: Federal Public Lands Spending
size_categories:
- 100K<n<1M
---
# Federal Public Lands Spending
Contract and grant transaction data from the [USAspending Award Data Archive](https://www.usaspending.gov/download_center/award_data_archive) for federal agencies that manage public lands.
**Interactive demo:** [Open the demo notebook in Colab](https://colab.research.google.com/github/abigailhaddad/federal-public-lands-contracting/blob/main/demo.ipynb)
**Pipeline code:** [github.com/abigailhaddad/federal-public-lands-contracting](https://github.com/abigailhaddad/federal-public-lands-contracting)
## Agencies covered
**Department of the Interior (014):**
Bureau of Land Management, Bureau of Reclamation, Bureau of Safety and Environmental Enforcement, U.S. Geological Survey, National Park Service, Office of Surface Mining, U.S. Fish and Wildlife Service, Office of the Inspector General
**Department of Agriculture (012):**
Forest Service
## Dataset structure
### Transaction files
Each fiscal year has four files, split by department and award type:
| File pattern | Description |
|-------------|-------------|
| `transactions/doi_contracts_fyYYYY.parquet` | Interior Department contract actions |
| `transactions/doi_grants_fyYYYY.parquet` | Interior Department grant awards |
| `transactions/usda_contracts_fyYYYY.parquet` | USDA Forest Service contract actions |
| `transactions/usda_grants_fyYYYY.parquet` | USDA Forest Service grant awards |
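The naming pattern in the table can be expanded programmatically. The following is a minimal sketch: `transaction_url` and `urls_for_year` are hypothetical helpers, not part of the dataset or its pipeline, and the base URL is the `resolve/main` address used in the examples under "How to use".

```python
# Hypothetical helpers that expand the file pattern from the table above.
# They are illustrative only and not part of the dataset's own tooling.
HF = "https://huggingface.co/datasets/abigailhaddad/federal-public-lands-spending/resolve/main"

DEPARTMENTS = ("doi", "usda")
AWARD_TYPES = ("contracts", "grants")


def transaction_url(department: str, award_type: str, fiscal_year: int) -> str:
    """Return the URL of transactions/<department>_<award_type>_fy<YYYY>.parquet."""
    if department not in DEPARTMENTS:
        raise ValueError(f"department must be one of {DEPARTMENTS}")
    if award_type not in AWARD_TYPES:
        raise ValueError(f"award_type must be one of {AWARD_TYPES}")
    return f"{HF}/transactions/{department}_{award_type}_fy{fiscal_year:04d}.parquet"


def urls_for_year(fiscal_year: int) -> list[str]:
    """Return the URLs of all four transaction files for one fiscal year."""
    return [
        transaction_url(dept, award, fiscal_year)
        for dept in DEPARTMENTS
        for award in AWARD_TYPES
    ]
```

The resulting list can be passed to `read_parquet([...])` in DuckDB to query contracts and grants for both departments together. Whether a given fiscal year exists in the repository is not checked here.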
**Columns in each file:**
- `federal_action_obligation` — dollar amount of the transaction
- `fiscal_year` — federal fiscal year (October–September)
- `funding_sub_agency_name` — which agency funded it
- `recipient_parent_name` / `recipient_name` — who received the money
- `product_or_service_code_description` — what was purchased (contracts)
- `cfda_title` — assistance program name (grants)
- `county_fips` — recipient county FIPS code (5-digit, zero-padded)
- `recipient_state_code` — recipient state
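Two of the columns above follow conventions that are easy to get wrong when joining with other data: the federal fiscal year starts in October, and county FIPS codes lose their leading zeros if stored as integers. A small sketch of both conventions follows; the function names are hypothetical and not taken from the pipeline.

```python
from datetime import date


def federal_fiscal_year(d: date) -> int:
    """Federal fiscal year: 1 October through 30 September, named for the year it ends in."""
    return d.year + 1 if d.month >= 10 else d.year


def normalize_county_fips(value) -> str:
    """Render a county FIPS code as a 5-character, zero-padded string.

    Accepts an int (1001) or a digit string ("1001", "01001").
    """
    return str(int(value)).zfill(5)
```

For example, a transaction dated 1 October 2022 belongs to FY2023, and an external table that stores Autauga County, Alabama as the integer `1001` needs to be converted to `"01001"` before it will match `county_fips`.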
### Aggregate files
Pre-computed annual obligation totals:
- `aggregates/annual_by_psc.parquet` — by product/service code (contracts)
- `aggregates/annual_by_cfda.parquet` — by assistance listing (grants)
- `aggregates/annual_by_recipient.parquet` — by recipient organization
- `aggregates/annual_by_county.parquet` — by county FIPS code
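The exact schemas of the aggregate files are not listed here. As a rough illustration of what an annual total by county means, the sketch below sums `federal_action_obligation` per fiscal year and county from a few made-up transaction rows. The output column name `total_obligation` is an assumption, not necessarily the name used in `annual_by_county.parquet`.

```python
import pandas as pd

# Made-up rows for illustration only; real rows come from the transaction files.
transactions = pd.DataFrame(
    {
        "fiscal_year": [2022, 2022, 2023, 2023],
        "county_fips": ["08031", "08031", "08031", "30063"],
        "federal_action_obligation": [1000.0, -250.0, 400.0, 75.0],
    }
)

# Sum obligations per fiscal year and county. Negative amounts
# reduce the total, as in any plain sum.
annual_by_county = (
    transactions
    .groupby(["fiscal_year", "county_fips"], as_index=False)["federal_action_obligation"]
    .sum()
    .rename(columns={"federal_action_obligation": "total_obligation"})  # assumed name
)
print(annual_by_county)
```

Swapping `county_fips` for `recipient_parent_name`, `product_or_service_code_description`, or `cfda_title` gives the analogous totals for the other groupings.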
### Lookup files
Historical statistics for t-test analysis:
- `lookups/historical_lookup.parquet` — wide format: one row per group, columns for each FY, plus hist_mean, hist_sd, hist_se, n_years
- `lookups/historical_annual_detail.parquet` — same data in long format
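How the lookup statistics are computed is not spelled out above. The sketch below shows one conventional reading, under two stated assumptions: `hist_sd` is the sample standard deviation (n − 1 denominator) and `hist_se` is `hist_sd / sqrt(n_years)`. The helper names and the example totals are invented for illustration.

```python
import math
import statistics


def historical_stats(annual_totals: list[float]) -> dict:
    """Summarize one group's annual totals (needs at least two years)."""
    n = len(annual_totals)
    mean = statistics.mean(annual_totals)
    sd = statistics.stdev(annual_totals)  # sample SD, n - 1 denominator (assumption)
    return {
        "hist_mean": mean,
        "hist_sd": sd,
        "hist_se": sd / math.sqrt(n),  # standard error of the mean (assumption)
        "n_years": n,
    }


def t_statistic(stats: dict, reference: float) -> float:
    """One-sample t statistic for H0: the historical mean equals `reference`."""
    return (stats["hist_mean"] - reference) / stats["hist_se"]


# Invented annual totals for a single group.
stats = historical_stats([10.0, 12.0, 14.0, 16.0])
t = t_statistic(stats, reference=10.0)
```

The statistic has `n_years - 1` degrees of freedom. The pipeline may define the test differently, so treat this as a guide to reading the columns rather than a description of the published analysis.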
## How to use
### With DuckDB (no download needed)
```python
import duckdb

# Base URL for files in the dataset repository
HF = "https://huggingface.co/datasets/abigailhaddad/federal-public-lands-spending/resolve/main"

# Top 10 recipients in FY2023
duckdb.sql(f"""
    SELECT recipient_parent_name, SUM(federal_action_obligation) AS total
    FROM read_parquet([
        '{HF}/transactions/doi_contracts_fy2023.parquet',
        '{HF}/transactions/usda_contracts_fy2023.parquet'
    ])
    GROUP BY recipient_parent_name
    ORDER BY total DESC
    LIMIT 10
""").show()
```
### With pandas
```python
from huggingface_hub import hf_hub_download
import pandas as pd

# Download one file into the local Hugging Face cache and get its path
path = hf_hub_download(
    repo_id="abigailhaddad/federal-public-lands-spending",
    filename="transactions/doi_contracts_fy2023.parquet",
    repo_type="dataset",
)
df = pd.read_parquet(path)
```
### With R
```r
library(arrow)

url <- "https://huggingface.co/datasets/abigailhaddad/federal-public-lands-spending/resolve/main/transactions/doi_contracts_fy2023.parquet"

# mode = "wb" writes the file as binary, which parquet requires
download.file(url, "doi_contracts_fy2023.parquet", mode = "wb")
df <- read_parquet("doi_contracts_fy2023.parquet")
```
## Update schedule
A GitHub Action runs daily and checks whether USAspending has published a new monthly snapshot of the Award Data Archive. When it detects a new snapshot, it re-downloads all fiscal years one at a time and uploads any changes. On days when the Archive has not changed, nothing happens.
The Archive is typically updated once per month by USAspending.
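The check-then-refresh logic can be sketched as follows. This is an illustration only: how the actual GitHub Action identifies a snapshot is not documented here, so the snapshot labels, function names, and the idea of comparing against a stored label are all assumptions.

```python
from typing import Iterable, Optional


def needs_refresh(latest_snapshot: str, last_processed: Optional[str]) -> bool:
    """True when the archive's snapshot label differs from the one last processed."""
    return last_processed is None or latest_snapshot != last_processed


def plan_downloads(
    latest_snapshot: str,
    last_processed: Optional[str],
    fiscal_years: Iterable[int],
) -> list[int]:
    """Return the fiscal years to re-download, in order; empty when nothing changed."""
    if not needs_refresh(latest_snapshot, last_processed):
        return []
    return list(fiscal_years)


# Hypothetical snapshot labels: a new snapshot triggers a full refresh.
plan = plan_downloads("20240508", "20240408", range(2021, 2024))
```

Returning the years as an ordered list matches the description of processing one fiscal year at a time, which keeps only one year's raw download in flight at once.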
## Data source
All data comes from the [USAspending Award Data Archive](https://www.usaspending.gov/download_center/award_data_archive) — the same source used in the original manual R-based analysis. Files are downloaded from `https://files.usaspending.gov/award_data_archive/`, filtered to the public-land agencies listed above, deduplicated, and converted to parquet.
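The filter and deduplicate steps can be approximated in pandas. This is a sketch under assumptions, not the pipeline's code: it filters on `funding_sub_agency_name` with an illustrative subset of agency names (the spelling and casing in the raw archive may differ), and it treats fully identical rows as duplicates, which may not be the key the pipeline uses.

```python
import pandas as pd

# Illustrative subset; the full list is under "Agencies covered".
PUBLIC_LAND_AGENCIES = {
    "Bureau of Land Management",
    "National Park Service",
    "Forest Service",
}


def filter_and_dedupe(raw: pd.DataFrame, agencies: set) -> pd.DataFrame:
    """Keep rows funded by the given sub-agencies and drop exact duplicate rows."""
    kept = raw[raw["funding_sub_agency_name"].isin(agencies)]
    return kept.drop_duplicates().reset_index(drop=True)


# Made-up rows standing in for one raw archive file.
raw = pd.DataFrame(
    {
        "funding_sub_agency_name": [
            "Bureau of Land Management",
            "Bureau of Land Management",  # exact duplicate of the row above
            "Some Other Agency",
            "Forest Service",
        ],
        "federal_action_obligation": [100.0, 100.0, 50.0, 20.0],
    }
)
clean = filter_and_dedupe(raw, PUBLIC_LAND_AGENCIES)
# clean.to_parquet("out.parquet") would write the result; it needs pyarrow or fastparquet.
```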
Provider: abigailhaddad



