
cfahlgren1/medicaid-provider-spending

Hugging Face, updated 2026-02-14, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/cfahlgren1/medicaid-provider-spending
Resource description:
---
tags:
- healthcare
- medicaid
- cms
- npi
- provider-spending
size_categories:
- 100M<n<1B
configs:
- config_name: default
  data_files:
  - split: spending
    path: medicaid-provider-spending.parquet
  - split: billing_providers
    path: billing-providers.parquet
  - split: servicing_providers
    path: servicing-providers.parquet
  - split: hcpcs_codes
    path: hcpcs-codes.parquet
---

# Medicaid Provider Spending

This dataset contains provider-level Medicaid spending data aggregated from outpatient and professional claims with valid HCPCS codes, covering January 2018 through December 2024. It provides insights into how Medicaid dollars are distributed across providers and procedures nationwide.

Provider details (name, address, taxonomy) are sourced from the [NPPES NPI Registry](https://npiregistry.cms.hhs.gov/) (February 2026 dissemination).

## Data Description

| Attribute | Value |
|-----------|-------|
| Time Period | January 2018 - December 2024 |
| Granularity | Provider (NPI) x HCPCS Code x Month |
| Geographic Scope | National (all states and territories) |
| Coverage | Fee-for-service, managed care, and CHIP |

This dataset aggregates individual claims to the provider-procedure-month level, providing counts of beneficiaries served, claims submitted, and total amounts paid by Medicaid.
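
The roll-up to the provider-procedure-month level can be illustrated with a minimal sketch. It assumes a hypothetical claim-level input (the tuple layout, the beneficiary IDs, the NPIs, and the amounts below are invented for illustration and are not the T-MSIS claim format), and it assumes that both the billing and the servicing NPI are part of the grouping key, as the output columns of the `spending` split suggest. It is not the pipeline used to build this dataset.

```python
from collections import defaultdict

# Hypothetical claim-level records, invented for illustration only:
# (billing NPI, servicing NPI, HCPCS code, claim month, beneficiary id, paid amount)
claims = [
    ("1000000001", "1000000009", "T1019", "2024-01", "bene-a", 50.0),
    ("1000000001", "1000000009", "T1019", "2024-01", "bene-a", 25.0),
    ("1000000001", "1000000009", "T1019", "2024-01", "bene-b", 40.0),
    ("1000000001", "1000000009", "T1019", "2024-02", "bene-b", 40.0),
    ("1000000002", "1000000002", "99213", "2024-01", "bene-c", 80.0),
]


def aggregate_claims(claim_rows):
    """Roll claims up to one row per (billing NPI, servicing NPI, HCPCS code, month)."""
    groups = defaultdict(lambda: {"beneficiaries": set(), "claims": 0, "paid": 0.0})
    for billing, servicing, hcpcs, month, beneficiary, paid in claim_rows:
        group = groups[(billing, servicing, hcpcs, month)]
        group["beneficiaries"].add(beneficiary)  # a set keeps each beneficiary once
        group["claims"] += 1
        group["paid"] += paid

    # Emit rows using the column names of the `spending` split.
    return [
        {
            "BILLING_PROVIDER_NPI_NUM": key[0],
            "SERVICING_PROVIDER_NPI_NUM": key[1],
            "HCPCS_CODE": key[2],
            "CLAIM_FROM_MONTH": key[3],
            "TOTAL_UNIQUE_BENEFICIARIES": len(group["beneficiaries"]),
            "TOTAL_CLAIMS": group["claims"],
            "TOTAL_PAID": group["paid"],
        }
        for key, group in sorted(groups.items())
    ]


spending_rows = aggregate_claims(claims)
for row in spending_rows:
    print(row)
```

Because beneficiaries are counted per group, `TOTAL_UNIQUE_BENEFICIARIES` cannot be summed across months or procedures to get a provider-level unique count: the same person may appear in several rows.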

## Splits

| Split | Rows | Description |
|-------|------|-------------|
| `spending` | 227,083,361 | Spending aggregated by billing/servicing NPI, HCPCS code, and month |
| `billing_providers` | 617,503 | Distinct billing provider NPIs with name, address, taxonomy |
| `servicing_providers` | 1,627,362 | Distinct servicing provider NPIs with name, address, taxonomy |
| `hcpcs_codes` | 7,549 | HCPCS Level II code descriptions (A-V prefix codes) |

## Schema

### `spending`

| Column | Type | Description |
|--------|------|-------------|
| `BILLING_PROVIDER_NPI_NUM` | string | NPI of the billing provider |
| `SERVICING_PROVIDER_NPI_NUM` | string | NPI of the servicing provider |
| `HCPCS_CODE` | string | Healthcare Common Procedure Coding System code |
| `CLAIM_FROM_MONTH` | string | Claim month (YYYY-MM) |
| `TOTAL_UNIQUE_BENEFICIARIES` | int64 | Number of unique Medicaid beneficiaries |
| `TOTAL_CLAIMS` | int64 | Total number of claims |
| `TOTAL_PAID` | float64 | Total amount paid ($) |

### `billing_providers` / `servicing_providers`

| Column | Type | Description |
|--------|------|-------------|
| `npi` | string | National Provider Identifier |
| `entity_type` | int64 | 1 = Individual, 2 = Organization |
| `org_name` | string | Organization name (entity_type = 2) |
| `last_name` | string | Provider last name (entity_type = 1) |
| `first_name` | string | Provider first name (entity_type = 1) |
| `middle_name` | string | Provider middle name |
| `credential` | string | Provider credential (MD, DO, etc.) |
| `address_line1` | string | Practice location address |
| `city` | string | Practice location city |
| `state` | string | Practice location state |
| `zip` | string | Practice location ZIP code |
| `phone` | string | Practice location phone |
| `sex` | string | Provider sex (individuals only) |
| `taxonomy_code` | string | Primary healthcare provider taxonomy code |
| `enumeration_date` | date | Date the NPI was assigned |

### `hcpcs_codes`

| Column | Type | Description |
|--------|------|-------------|
| `hcpcs_code` | string | HCPCS Level II code |
| `description` | string | Short description of the procedure/service |

> **Note**: This split contains HCPCS Level II codes only (alpha-prefixed: A-V). CPT codes (5-digit numeric) used in the spending data are not included as they are separately licensed by the AMA.

## Joining

Join provider details onto spending data by NPI:

```sql
SELECT
    s.*,
    b.org_name AS billing_org,
    b.city AS billing_city,
    b.state AS billing_state
FROM spending s
LEFT JOIN billing_providers b
    ON s.BILLING_PROVIDER_NPI_NUM = b.npi;
```

## Loading the Data

```python
from datasets import load_dataset

# Loads every split declared in the dataset config.
ds = load_dataset("cfahlgren1/medicaid-provider-spending")

# Each split is addressed by its name.
spending = ds["spending"]
billing = ds["billing_providers"]
servicing = ds["servicing_providers"]
```

## Use Cases

- **Provider spending analysis**: Identify top Medicaid providers by total spending or volume
- **Procedure utilization trends**: Track how utilization of specific procedures changes over time
- **Geographic comparisons**: Compare provider spending patterns across states
- **Outlier detection**: Identify unusual billing patterns for further investigation
- **Policy research**: Analyze the impact of policy changes on Medicaid spending

## About T-MSIS

The Transformed Medicaid Statistical Information System (T-MSIS) is CMS's comprehensive data system for collecting Medicaid and CHIP data from all 50 states, the District of Columbia, and US territories.
T-MSIS data is submitted monthly by states to CMS and includes information on beneficiary enrollment and eligibility, fee-for-service claims, managed care encounter data, and provider information.

## Cell Suppression Methodology

To protect beneficiary privacy, this dataset applies cell suppression:

- **Threshold**: Rows with fewer than 12 total claims are dropped entirely
- **Purpose**: Prevents re-identification of individuals who received uncommon procedures or visited low-volume providers

This means the dataset represents the majority of Medicaid spending but excludes low-volume provider-procedure combinations.

## Data Accuracy

This data is derived from T-MSIS submissions and is only as accurate as the data submitted by each state. State Medicaid agencies should be considered the authoritative source for all provider and claims data.

T-MSIS has known data quality issues that vary by state and data element. For detailed information on data quality concerns, refer to CMS's [DQ Atlas](https://www.medicaid.gov/dq-atlas/welcome).

## Sources

- **Spending data**: [CMS Medicaid Provider Spending](https://data.cms.gov/)
- **Provider data**: [NPPES NPI Data Dissemination](https://download.cms.gov/nppes/NPI_Files.html) (February 2026)
- **HCPCS codes**: [NLM Clinical Tables API](https://clinicaltables.nlm.nih.gov/apidoc/hcpcs/v3/doc.html)
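
The cell suppression rule described in the Cell Suppression Methodology section can be sketched on toy data. This is a minimal illustration, assuming the threshold is applied per row to `TOTAL_CLAIMS` as that section states; the function names and the sample rows are hypothetical. The published `spending` split is already suppressed, so the share computed here can only be measured on unsuppressed data, which this dataset does not include.

```python
# From the README: rows with fewer than 12 total claims are dropped entirely.
SUPPRESSION_THRESHOLD = 12


def apply_cell_suppression(rows, threshold=SUPPRESSION_THRESHOLD):
    """Keep only rows whose TOTAL_CLAIMS is at or above the threshold."""
    return [row for row in rows if row["TOTAL_CLAIMS"] >= threshold]


def suppressed_paid_share(rows, threshold=SUPPRESSION_THRESHOLD):
    """Fraction of TOTAL_PAID carried by rows that suppression would drop."""
    total = sum(row["TOTAL_PAID"] for row in rows)
    if total == 0:
        return 0.0
    dropped = sum(row["TOTAL_PAID"] for row in rows if row["TOTAL_CLAIMS"] < threshold)
    return dropped / total


# Hypothetical pre-suppression rows, invented for illustration only.
toy_rows = [
    {"HCPCS_CODE": "T1019", "TOTAL_CLAIMS": 11, "TOTAL_PAID": 100.0},  # below threshold
    {"HCPCS_CODE": "99213", "TOTAL_CLAIMS": 12, "TOTAL_PAID": 200.0},  # exactly at threshold
    {"HCPCS_CODE": "A0428", "TOTAL_CLAIMS": 40, "TOTAL_PAID": 700.0},
]

kept_rows = apply_cell_suppression(toy_rows)
share = suppressed_paid_share(toy_rows)
print(len(kept_rows), share)
```

One consequence for analysis: totals computed from the published rows are lower bounds, since every provider-procedure-month combination with fewer than 12 claims is absent.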
Provider:
cfahlgren1