Accelerating Interoperability With Databricks Lakehouse

Name: Accelerating Interoperability With Databricks Lakehouse
Creator: Databricks
License: No description available

Databricks, indexed 2024-05-09

Download link:

https://marketplace.databricks.com/details/aa7c7506-f11a-45a8-8b3d-7b1798c6ef8a/Databricks_Accelerating-Interoperability-With-Databricks-Lakehouse

Resource description:

https://www.databricks.com/solutions/accelerators/fhir

In this solution accelerator, we demonstrate how to use the lakehouse approach for an in-depth analysis of patient outcomes using EHR data. Consider a scenario in which we have a collection of FHIR bundles and want to explore the effect of different factors on COVID outcomes. However, the FHIR standard is designed primarily for exchanging information, not for analytics. To solve this problem, we need to flatten the bundles (stored as nested JSON files) and extract resources such as patients, encounters, and conditions, so that we can create a dataset that is ready for exploratory data analysis.

We can decompose this process into three main steps:

* **Data ingestion**
  - Simplify ingestion from all kinds of sources. As an example, we use the Databricks Labs dbignite library to ingest FHIR bundles, in one line, as tables ready to be queried in SQL.
  - Query and explore the ingested data.
  - Optionally, secure data access.
* **Exploratory Analysis/Data Curation**
  - Create cohorts.
  - Create a patient-level data structure (a patient dashboard) from the bundles.
  - Investigate the rate of hospital admissions among COVID patients and explore correlations between hospital admission and factors such as SDOH (social determinants of health) and disease history.
* **Data Science / Advanced Analytics**
  - Create patient features.
  - Create a training dataset to build a model for predicting outcomes and analyzing our cohort.
  - Use SHAP to explain the effect of different features on the outcome under study.

Click the "Get instant access" button in the top-right corner to clone the solution accelerator repo into your workspace. Once the repo is cloned, execute the **RUNME** notebook in the repo to create the cluster and job you can use to run the notebooks.
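
The flattening step described above can be sketched in plain Python. This is not the dbignite API (in the accelerator, dbignite produces the queryable tables); it is a minimal, library-free illustration, assuming each input is a FHIR `Bundle` JSON document whose `entry[].resource` objects each carry a `resourceType`. The function names `flatten_bundles` and `patient_rows` are hypothetical, and the Patient columns picked out are only examples.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def flatten_bundles(bundle_jsons: List[str]) -> Dict[str, List[Dict[str, Any]]]:
    """Group every resource found in FHIR Bundle JSON strings by its resourceType."""
    tables: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for raw in bundle_jsons:
        bundle = json.loads(raw)
        if bundle.get("resourceType") != "Bundle":
            continue  # skip documents that are not Bundles
        for entry in bundle.get("entry", []):
            resource = entry.get("resource")
            # Each resource lands in the "table" named after its type
            # (Patient, Encounter, Condition, ...).
            if resource and "resourceType" in resource:
                tables[resource["resourceType"]].append(resource)
    return dict(tables)


def patient_rows(patients: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Project a few top-level Patient fields into flat rows (illustrative columns)."""
    return [
        {
            "patient_id": p.get("id"),
            "gender": p.get("gender"),
            "birth_date": p.get("birthDate"),
        }
        for p in patients
    ]
```

Grouping by `resourceType` mirrors the idea of one table per resource; nested fields (codings, references) stay as dictionaries and would be projected into columns as each analysis needs them.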

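The cohort and hospital-admission analysis from the exploratory step can be sketched the same way, on resources already grouped by type. This sketch assumes FHIR R4 resources (where `Encounter.class` is a single `Coding`), uses the SNOMED CT code `840539006` (COVID-19, as used in Synthea-generated data) to define the cohort, and counts any encounter with class `IMP` (inpatient) as a hospital admission without checking whether it followed the diagnosis. These codes and all helper names are illustrative assumptions, not taken from the accelerator.

```python
from typing import Any, Dict, List, Set

COVID_CODE = "840539006"  # assumption: SNOMED CT code for COVID-19
INPATIENT_CLASS = "IMP"   # assumption: v3 ActCode for an inpatient encounter


def subject_id(resource: Dict[str, Any]) -> str:
    """Strip a 'Patient/' or 'urn:uuid:' prefix from resource.subject.reference."""
    ref = resource.get("subject", {}).get("reference", "")
    for prefix in ("Patient/", "urn:uuid:"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref


def covid_cohort(conditions: List[Dict[str, Any]]) -> Set[str]:
    """Return IDs of patients with at least one Condition coded as COVID-19."""
    cohort: Set[str] = set()
    for cond in conditions:
        codings = cond.get("code", {}).get("coding", [])
        if any(c.get("code") == COVID_CODE for c in codings):
            cohort.add(subject_id(cond))
    return cohort


def admission_rate(cohort: Set[str], encounters: List[Dict[str, Any]]) -> float:
    """Fraction of the cohort with any inpatient encounter (timing ignored)."""
    if not cohort:
        return 0.0
    admitted = {
        subject_id(e)
        for e in encounters
        if e.get("class", {}).get("code") == INPATIENT_CLASS
    }
    return len(cohort & admitted) / len(cohort)
```

A fuller analysis would restrict admissions to those after the COVID diagnosis date and join in SDOH and disease-history attributes before looking for correlations.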
Provider:

Databricks
