ggsmith/usa_spending_awards

Name: ggsmith/usa_spending_awards
Creator: ggsmith
Published: 2026-04-10 03:42:13
License: No description available

Hugging Face, updated 2026-04-10, indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/ggsmith/usa_spending_awards

Resource summary:

---
license: cc0-1.0
language:
- en
tags:
- government
size_categories:
- 10M<n<100M
---

# Dataset Card for USA Spending Awards

This dataset contains curated USAspending award records exported from the USAspending bulk download API and published as a Hugging Face dataset. This dataset card is based on the Hugging Face dataset card template and has been adapted to reflect the current dataset structure and known provenance.

## Dataset Details

### Dataset Description

This dataset contains U.S. federal award records sourced from USAspending bulk award exports. The current published schema includes award identifiers, recipient information, period of performance dates, award value, awarding and funding agency fields, industry classification codes, product or service codes, and two lineage fields added during curation: `source_file_hash` and `fiscal_year`.

- **Curated by:** Grant Smith
- **Shared by:** Grant Smith
- **Language(s) (NLP):** English
- **License:** cc0-1.0

### Dataset Sources

- **Repository:** [https://huggingface.co/datasets/ggsmith/usa_spending_awards](https://huggingface.co/datasets/ggsmith/usa_spending_awards)
- **Website:** [https://www.usaspending.gov](https://www.usaspending.gov)

## Uses

### Direct Use

This dataset is suitable for:

- analysis of federal award activity across agencies and sub-agencies
- exploratory analysis of award values, recipients, and procurement categories
- building public finance, procurement, or government spending dashboards
- downstream tabular ML or analytics workflows, after users validate the field semantics against USAspending documentation

### Out-of-Scope Use

This dataset should not be treated as:

- a complete representation of all USAspending data fields
- a substitute for official legal, accounting, compliance, or procurement determinations
- a fully normalized or canonical source for entity resolution across recipients or agencies
- a dataset with independently validated labels or annotations for supervised learning tasks

## Dataset Structure

The dataset is currently exposed on the Hugging Face Hub with a single `train` split.

### Features

- `award_id_piid`: The award or procurement instrument identifier (PIID) reported by USAspending for the transaction.
- `recipient_name`: The name of the entity receiving the award funds.
- `recipient_duns`: The recipient's DUNS identifier, when present in the source data.
- `period_of_performance_start_date`: The reported start date of the award's period of performance.
- `period_of_performance_current_end_date`: The current reported end date of the award's period of performance.
- `transaction_description`: Free-text description of the purchased good, service, or funded activity associated with the award transaction.
- `current_total_value_of_award`: The current total awarded value reported for the award at the time of extraction.
- `awarding_agency_name`: The top-level federal agency that issued the award.
- `awarding_sub_agency_name`: The sub-agency or bureau within the awarding agency that issued the award.
- `award_type`: The award type label from USAspending.
- `award_or_idv_flag`: Indicator showing whether the record is an award or an IDV-related record.
- `funding_agency_name`: The top-level federal agency providing the funding for the award.
- `funding_sub_agency_name`: The sub-agency or bureau within the funding agency that provides the funds.
- `naics_code`: The NAICS industry classification code associated with the award.
- `product_or_service_code`: The federal product or service code associated with the award.
- `source_file_hash`: Internal lineage field identifying which downloaded USAspending source file the row came from.
- `fiscal_year`: Fiscal year label derived during curation.

## Dataset Creation

### Curation Rationale

This dataset was created to make selected USAspending award export data easier to access and analyze from the Hugging Face Hub in a structured tabular format.
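Because the data is published as a structured table, the Features list in Dataset Structure can double as a lightweight validation check, which also helps with two of the Recommendations in this card: validating fields and confirming fiscal year coverage. The Python sketch below assumes you already have the column names and the `fiscal_year` values of a loaded table. `SOURCE_COLUMNS`, `CURATOR_COLUMNS`, `check_columns`, and `fiscal_year_coverage` are hypothetical helper names, not part of the dataset or of any library, and comparing fiscal year labels as strings is an assumption, since the card does not state the column's type.

```python
from collections import Counter

# Source columns selected from the USAspending export, as listed in this card.
SOURCE_COLUMNS = (
    "award_id_piid",
    "recipient_name",
    "recipient_duns",
    "period_of_performance_start_date",
    "period_of_performance_current_end_date",
    "transaction_description",
    "current_total_value_of_award",
    "awarding_agency_name",
    "awarding_sub_agency_name",
    "award_type",
    "award_or_idv_flag",
    "funding_agency_name",
    "funding_sub_agency_name",
    "naics_code",
    "product_or_service_code",
)

# Lineage columns added during curation; these are not official USAspending fields.
CURATOR_COLUMNS = ("source_file_hash", "fiscal_year")

EXPECTED_COLUMNS = SOURCE_COLUMNS + CURATOR_COLUMNS


def check_columns(columns):
    """Return (missing, unexpected) column names relative to EXPECTED_COLUMNS."""
    present = set(columns)
    missing = [c for c in EXPECTED_COLUMNS if c not in present]
    unexpected = sorted(present - set(EXPECTED_COLUMNS))
    return missing, unexpected


def fiscal_year_coverage(fiscal_years, expected_years):
    """Count rows per fiscal_year label and list expected years that have no rows.

    Labels are compared as strings (an assumption), so integer and
    digit-string labels such as 2021 and "2021" count as the same year.
    """
    counts = Counter(str(fy) for fy in fiscal_years)
    missing = [str(fy) for fy in expected_years if str(fy) not in counts]
    return dict(sorted(counts.items())), missing
```

For a local Parquet file, `pyarrow.parquet.read_schema(path).names` can supply the column names and `pyarrow.parquet.read_table(path, columns=["fiscal_year"]).column("fiscal_year").to_pylist()` the labels; which fiscal years you expect to see is your own input, not something the card specifies.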
### Source Data

The source data comes from USAspending bulk award exports retrieved through the USAspending API and processed into Parquet for publication.

#### Data Collection and Processing

Based on the project code in the companion pipeline repository, the curation workflow includes:

- submitting USAspending bulk award export requests for selected fiscal year date ranges
- filtering for `prime_award_types` `A`, `B`, `C`, and `D`
- selecting a subset of award-related columns from the source export
- downloading the resulting ZIP archive
- converting CSV files from the ZIP archive into Parquet files
- adding lineage fields, including `source_file_hash` and `fiscal_year`

The selected source columns in the pipeline are:

- `award_id_piid`
- `recipient_name`
- `recipient_duns`
- `period_of_performance_start_date`
- `period_of_performance_current_end_date`
- `transaction_description`
- `current_total_value_of_award`
- `awarding_agency_name`
- `awarding_sub_agency_name`
- `award_type`
- `award_or_idv_flag`
- `funding_agency_name`
- `funding_sub_agency_name`
- `naics_code`
- `product_or_service_code`

#### Who are the source data producers?

The original source records are produced and published through USAspending.

#### Personal and Sensitive Information

The dataset is considered public information, not sensitive or personal data.
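The Data Collection and Processing steps listed in Source Data can be sketched in Python as follows. This is not the companion pipeline's code, which the card does not reproduce. `fiscal_year_range`, `build_request`, `rows_from_zip`, and `write_parquet` are hypothetical names; the request body shape (including `date_type`) is an assumption to check against the USAspending API documentation for the bulk award download endpoint; and hashing the ZIP archive with SHA-256 is only one plausible way to fill `source_file_hash`. The date range uses the U.S. federal fiscal year convention, in which fiscal year N runs from October 1 of year N-1 through September 30 of year N.

```python
import csv
import hashlib
import io
import zipfile
from datetime import date

# Prime award type codes named in this card.
PRIME_AWARD_TYPES = ["A", "B", "C", "D"]


def fiscal_year_range(fy):
    """Return the (start, end) dates of U.S. federal fiscal year `fy`.

    Fiscal year N runs from October 1 of year N-1 through September 30 of year N.
    """
    return date(fy - 1, 10, 1), date(fy, 9, 30)


def build_request(fy, columns):
    """Build a bulk award download request body for one fiscal year.

    ASSUMPTION: the payload shape is modeled on USAspending's
    /api/v2/bulk_download/awards/ endpoint; verify it against the API contract.
    """
    start, end = fiscal_year_range(fy)
    return {
        "filters": {
            "prime_award_types": PRIME_AWARD_TYPES,
            "date_type": "action_date",  # assumed filter value
            "date_range": {
                "start_date": start.isoformat(),
                "end_date": end.isoformat(),
            },
        },
        "columns": list(columns),
    }


def rows_from_zip(zip_bytes, fiscal_year):
    """Yield rows from every CSV in a downloaded ZIP, adding two lineage fields.

    ASSUMPTION: source_file_hash is the SHA-256 of the ZIP archive bytes;
    the card does not specify the hashing scheme.
    """
    source_file_hash = hashlib.sha256(zip_bytes).hexdigest()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members of the archive
            # utf-8-sig also accepts files that begin with a byte-order mark.
            with io.TextIOWrapper(
                archive.open(name), encoding="utf-8-sig", newline=""
            ) as text:
                for row in csv.DictReader(text):
                    row["source_file_hash"] = source_file_hash
                    row["fiscal_year"] = fiscal_year
                    yield row


def write_parquet(rows, path):
    """Write rows to a Parquet file (requires pyarrow, imported lazily)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.Table.from_pylist(list(rows)), path)
```

For example, `write_parquet(rows_from_zip(zip_bytes, 2021), "fy2021.parquet")` would convert one downloaded archive. Submitting the request and polling for the finished download are left out because the card does not describe how the pipeline handles them.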
## Bias, Risks, and Limitations

Important limitations include:

- the dataset contains only a selected subset of USAspending award fields
- the current curation workflow filters to specific prime award types
- fiscal year coverage depends on which source exports were requested and published
- downstream users may misinterpret agency, recipient, or award identifiers without consulting USAspending field definitions
- source system errors, missing values, and reporting inconsistencies may be present in the original data
- the lineage and fiscal year fields are curator-added and should not be confused with official source fields

### Recommendations

Users should:

- validate field definitions against official USAspending documentation
- confirm the precise fiscal year coverage in the published dataset files
- avoid using the dataset as the sole basis for legal, procurement, audit, or compliance decisions
- document any additional cleaning, normalization, or entity resolution applied downstream

## Glossary

- **USAspending:** U.S. government platform for federal spending and award data.
- **NAICS code:** North American Industry Classification System code associated with an award.
- **Product or service code:** Federal code used to classify purchased products or services.
- **DUNS:** Data Universal Numbering System identifier, when present in the source data.
- **IDV:** Indefinite Delivery Vehicle.

## Dataset Card Authors

Grant Smith

## Dataset Card Contact

[More Information Needed]

Provider:

ggsmith
