Data Trust and Stats Intelligence - DTSi
Source: Databricks · Indexed: 2025-10-25
Download link:
https://marketplace.databricks.com/details/f0eb5585-6d21-4dc8-b0b1-04c354d33222/Astreya-Partners-LLC_Data-Trust-and-Stats-Intelligence---DTSi
Resource summary:
**Overview**
Data Trust and Stats Intelligence (DTSi) is an **AI-powered accelerator** that derives statistical insights from raw datasets, enabling faster and more trusted data-driven decisions.
It goes beyond simple data quality checks by applying **15+ advanced statistical and AI techniques**, including anomaly detection, predictive modeling, correlation mapping, and hypothesis testing.
**Goal:** Help organizations ensure **data integrity, trust, and readiness** so their data can power AI models and informed business decisions.
**Key Use Cases**
**IT Operations:** Catch anomalies early, optimize reliability, and reduce downtime with real-time health checks and AI-driven recommendations.
**Features:** **Anomaly Detection**, **AI Recommendation Engine**, **Data Health Overview**
**Business & Customer Analytics:** Uncover hidden patterns, track performance drivers, and improve engagement with clear statistical insights.
**Features:** **Correlation Matrices**, **Distribution Analysis**, **Descriptive Statistics**
**Finance:** Flag fraud, validate data, and forecast performance with automated modeling and anomaly detection.
**Features:** **Anomaly Detection**, **Predictive Modeling**, **Automated Data Validation**
**Education:** Identify learning gaps, measure progress, and predict trends with easy-to-use dashboards.
**Features:** **Confidence Intervals & Testing**, **Dispersion Analysis**, **Executive Dashboard**
**Key Product Features**
* **Data Statistical Report:** Highlights key statistical findings with charts and graphs
* **Data Health Overview:** Continuous integrity checks
* **AI Recommendation Engine:** Next-step data-driven guidance
* **Automated Data Validation:** Instant schema detection and validation
* **Descriptive Statistics:** Mean, median, mode, and frequency distributions
* **Dispersion Analysis:** Variance, standard deviation, and quartiles
* **Confidence Intervals & Testing:** Rigorous hypothesis validation
* **Correlation Matrices:** Maps relationships across data fields
* **Anomaly Detection:** Z-score, outlier, and unusual-pattern detection
* **Distribution Analysis:** Histograms and optimized binning for trend clarity
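Most of the statistics in the feature list above are standard, and for a single numeric column they can be reproduced with Python's built-in `statistics` module. This is a minimal illustrative sketch, not DTSi's implementation: the function names are hypothetical, the confidence interval uses a normal approximation (a t-interval is more accurate for small samples), the z-score cutoff of 3 is a common convention rather than a documented DTSi setting, and the Freedman-Diaconis rule is only one possible choice for "optimized binning".

```python
import math
import statistics
from statistics import NormalDist  # Python 3.8+


def describe(values):
    """Descriptive and dispersion statistics for one numeric column."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # first mode encountered if tied
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": statistics.stdev(values),
        "quartiles": (q1, q2, q3),
    }


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean (assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    m = statistics.mean(values)
    return m - half_width, m + half_width


def zscore_outliers(values, threshold=3.0):
    """Indices whose absolute z-score (population stdev) exceeds `threshold`."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # constant column: no value stands out
    return [i for i, x in enumerate(values) if abs(x - mu) / sigma > threshold]


def pearson(xs, ys):
    """Pearson correlation: one cell of a correlation matrix.

    Raises ZeroDivisionError if either column is constant.
    """
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def freedman_diaconis_bins(values):
    """Histogram bin count from the Freedman-Diaconis width 2 * IQR / n**(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) / len(values) ** (1 / 3)
    if width == 0:
        return 1  # zero IQR: fall back to a single bin
    return max(1, math.ceil((max(values) - min(values)) / width))
```

A full correlation matrix is `pearson` applied to every pair of numeric columns. Note that plain z-scores can under-report outliers in small samples, because an extreme value also inflates the mean and standard deviation it is measured against.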
**Quick Start Guide**
**1. Prepare Your Dataset**
* The dataset must be in tabular **Excel (.xlsx)** format
* The workbook must contain a sheet named **“data”**
* The first row must contain **column headers**
* **No merged cells**
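The layout rules above can be checked before running DTSi, without opening Excel. An `.xlsx` file is a ZIP archive of XML parts, so this hypothetical pre-flight helper (not part of DTSi) uses only the standard library to confirm that a sheet named `data` exists, that it has no merged cells, and that row 1 is populated. It assumes the usual Excel part layout (`xl/workbook.xml`), and it does not verify that every individual header cell is non-empty.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the workbook, relationship, and worksheet XML parts.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def sheet_parts(zf):
    """Map each sheet name to the XML part that stores it inside the archive."""
    workbook = ET.fromstring(zf.read("xl/workbook.xml"))  # assumed standard location
    rels = ET.fromstring(zf.read("xl/_rels/workbook.xml.rels"))
    targets = {r.get("Id"): r.get("Target") for r in rels.iter(f"{{{PKG_REL}}}Relationship")}
    parts = {}
    for sheet in workbook.iter(f"{{{MAIN}}}sheet"):
        target = targets[sheet.get(f"{{{DOC_REL}}}id")]
        # Targets are usually relative to xl/, occasionally absolute ("/xl/...").
        parts[sheet.get("name")] = target[1:] if target.startswith("/") else "xl/" + target
    return parts


def check_dataset(path_or_file):
    """Return a list of layout problems; an empty list means the checks passed."""
    with zipfile.ZipFile(path_or_file) as zf:
        parts = sheet_parts(zf)
        if "data" not in parts:
            return [f"no sheet named 'data' (found: {sorted(parts)})"]
        sheet = ET.fromstring(zf.read(parts["data"]))

    problems = []
    merged = sheet.findall(f".//{{{MAIN}}}mergeCell")
    if merged:
        problems.append("merged cells in 'data': " + ", ".join(m.get("ref") for m in merged))
    # Excel writes an explicit r="1" on the first <row> when row 1 has content.
    first_row = sheet.find(f"{{{MAIN}}}sheetData/{{{MAIN}}}row")
    if first_row is None or first_row.get("r") != "1":
        problems.append("row 1 is empty; expected column headers")
    return problems
```

For example, `check_dataset("Input/sales.xlsx")` (a hypothetical file name) returns `[]` when the file passes, or a list of human-readable problems otherwise.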
**2. Set Up the Working Folder**
* Place **dtsi_Setup.ipynb** and **dtsi_Core.ipynb** in the same folder
* Run **dtsi_Setup.ipynb**; it creates the following folders and files:
  * **Input**
  * **Output**
  * **Resources/HTML Template**
  * **Config.json**
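The resulting layout can be approximated as follows. This is a hypothetical sketch of the folder structure listed above, not the contents of `dtsi_Setup.ipynb`; in particular, the keys written to `Config.json` are placeholders, because the real configuration schema is not documented in this guide.

```python
import json
from pathlib import Path

# Placeholder settings: the real keys in DTSi's Config.json are not documented here.
DEFAULT_CONFIG = {"input_dir": "Input", "output_dir": "Output"}


def set_up_workspace(root):
    """Create Input/, Output/, Resources/HTML Template/ and Config.json under `root`."""
    root = Path(root)
    for sub in ("Input", "Output", "Resources/HTML Template"):
        (root / sub).mkdir(parents=True, exist_ok=True)  # safe to re-run
    config = root / "Config.json"
    if not config.exists():  # keep any edits made to an existing config
        config.write_text(json.dumps(DEFAULT_CONFIG, indent=2))
    # Return everything under `root` as sorted relative paths for a quick check.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Re-running the function is harmless: existing folders are kept and an existing `Config.json` is never overwritten.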
**3. Run Data Analysis**
* Place the file to be analyzed in the **Input folder**
* Open **dtsi_Core.ipynb** and **execute all cells**
* Choose one of:
  * **Full Analysis:** Data Awareness followed by statistical analysis
  * **Statistical Analysis:** Use a pre-validated file from **Input**. The workbook must contain the sheets **data**, **metadata**, and **schema** (the output of Data Awareness).
When you run **Full Analysis**, a file for review is created in the **output/review** folder.
Review this file before continuing with the statistical analysis.
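For the **Statistical Analysis** path, the presence of the required sheets can be confirmed before running `dtsi_Core.ipynb`. The helper below is hypothetical and assumes exact, case-sensitive sheet names; if openpyxl is installed, the list of names can come from `openpyxl.load_workbook(path, read_only=True).sheetnames`.

```python
# Sheets the Statistical Analysis path expects, per this guide.
REQUIRED_SHEETS = ("data", "metadata", "schema")


def missing_sheets(sheet_names, required=REQUIRED_SHEETS):
    """Return the required sheet names that are absent, in REQUIRED_SHEETS order.

    Comparison is exact and case-sensitive (an assumption about DTSi).
    """
    present = set(sheet_names)
    return [name for name in required if name not in present]
```

For example, a workbook that has not yet been through Data Awareness, with only a `data` sheet, yields `["metadata", "schema"]`.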
**4. View Report**
* A **ZIP file** containing the input data, the Data Awareness output, the insights JSON, and the HTML report appears in **Output**; download and extract it
* Open a **terminal** in the extracted folder and run:
```bash
python -m http.server
```
* Go to **localhost:8000** in your browser and open **index.html** to view the report
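If a terminal is inconvenient (for example, when working inside a notebook), the same static file server can be started from Python. This sketch serves a given folder with the standard library's `SimpleHTTPRequestHandler`; the function name and the choice to bind only to `127.0.0.1` are assumptions, whereas `python -m http.server` listens on all interfaces by default.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_report(folder, port=8000):
    """Serve `folder` at http://127.0.0.1:<port>/ from a background thread.

    Returns the server; call .shutdown() then .server_close() to stop it.
    Passing port=0 lets the OS pick a free port (see server.server_address).
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(folder))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `srv = serve_report("<extracted-folder>")`, open `http://127.0.0.1:8000/index.html`. From a shell, `python -m http.server 8000 --directory <extracted-folder> --bind 127.0.0.1` (Python 3.7+) achieves the same without changing directories.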
**FAQs / Troubleshooting Guide**
**1. Metadata Sheet Issues**
**Problem:** Metadata sheet fields (**title, description**) are empty
**Solution:**
* Manually fill in missing titles and descriptions
* Check all metadata entries
* Ensure descriptions are meaningful
* Verify data types in the schema sheet
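Blank titles and descriptions can be located programmatically before filling them in by hand. This sketch assumes each row of the **metadata** sheet has already been read into a dictionary; the field names `title` and `description` follow the problem statement above, while the helper and its input shape are hypothetical.

```python
REQUIRED_FIELDS = ("title", "description")


def incomplete_metadata(rows, fields=REQUIRED_FIELDS):
    """Return (row_number, field) pairs whose value is missing or whitespace-only.

    `rows` is a list of dicts, one per metadata row (hypothetical input shape).
    Row numbers are 1-based positions in `rows`, not spreadsheet row numbers.
    """
    issues = []
    for number, row in enumerate(rows, start=1):
        for field in fields:
            value = row.get(field)
            if value is None or not str(value).strip():
                issues.append((number, field))
    return issues
```

Treating whitespace-only text as empty catches cells that look filled in the spreadsheet but carry no meaning.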
**2. DTSi HTML Report Not Loading**
**Problem:** The DTSi HTML report opens but shows no data, charts are blank, or JavaScript errors appear
**Possible Causes:** Incorrect HTML indentation or broken JSON data
**Steps to troubleshoot:**
**Quick Fix**
* Check browser console for errors
* Try a different browser (**Firefox**, **Safari**)
**Data & Structure Fix**
* Validate **HTML indentation**
* Check **JSON formatting**, especially `const data`
**Additional Checks**
* Ensure **JavaScript libraries** are loaded
* Test with a smaller dataset
* Clear browser cache and reload
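To check the `const data` JSON directly, the embedded literal can be extracted and parsed with Python's `json` module, which reports the first syntax error. This sketch assumes the report embeds strict JSON in a statement of the form `const data = {...};`. JavaScript-only syntax such as unquoted keys or trailing commas would be flagged here even though a browser may accept it.

```python
import json
import re

# Assumed shape of the embedded data: `const data = <JSON value>;`
DATA_DECL = re.compile(r"\bconst\s+data\s*=\s*")


def extract_report_data(html):
    """Parse the JSON literal assigned to `const data` in the report HTML.

    Raises ValueError naming the HTML line and column of the first JSON error.
    """
    match = DATA_DECL.search(html)
    if match is None:
        raise ValueError("no `const data =` assignment found")
    start = match.end()
    try:
        # raw_decode parses one JSON value and ignores what follows (e.g. ";</script>").
        value, _ = json.JSONDecoder().raw_decode(html[start:])
    except json.JSONDecodeError as exc:
        pos = start + exc.pos  # translate back to an offset in the full HTML
        line = html.count("\n", 0, pos) + 1
        column = pos - (html.rfind("\n", 0, pos) + 1) + 1
        raise ValueError(
            f"invalid JSON in `const data` at line {line}, column {column}: {exc.msg}"
        ) from exc
    return value
```

Running it on the report's `index.html` text points to the exact line to fix instead of leaving a blank chart.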
**3. Large File Processing Timeouts**
**Problem:** Processing stuck or system unresponsive
**Troubleshooting Steps:**
* Split dataset into **smaller chunks**
* Remove unnecessary columns or use **sampled data**
* Process during **off-peak hours**
* Use **incremental processing** or **aggregate data**
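Incremental processing works for many summary statistics because they can be updated chunk by chunk without holding the full dataset in memory. This is an illustration of the idea, not DTSi's internal approach: it splits an iterable into fixed-size chunks and maintains the mean and sample variance with Welford's online algorithm.

```python
import math
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items, reading the source lazily."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):  # Python 3.8+
        yield chunk


class RunningStats:
    """Welford's online algorithm for count, mean, and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, values):
        for x in values:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self._m2 += delta * (x - self.mean)  # uses the updated mean

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan

    @property
    def stdev(self):
        return math.sqrt(self.variance)
```

Each chunk (for example, the rows of one split file) is passed to `update`; only three numbers are carried between chunks, so memory use does not grow with the dataset.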
**4. Memory/Performance Issues**
**Problem:** Browser crashes or calculations incomplete during analysis
**Troubleshooting Steps:**
* Close other tabs and clear cache
* Use a browser with **higher memory limits**
* Run only the required statistical tests, or process columns in batches
* Use **sampling** for exploratory analysis
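Column batching and sampling can be as simple as the helpers below. They are hypothetical and operate on plain Python lists; the fixed seed is an assumption chosen so that repeated exploratory runs see the same subset.

```python
import random


def column_batches(columns, batch_size):
    """Split a list of column names into consecutive batches of `batch_size`."""
    return [columns[i:i + batch_size] for i in range(0, len(columns), batch_size)]


def sample_rows(rows, k, seed=42):
    """Uniform random sample of k rows without replacement, reproducible via `seed`."""
    if k >= len(rows):
        return list(rows)  # nothing to sample away
    return random.Random(seed).sample(rows, k)
```

For example, looping over `column_batches(numeric_columns, 10)` (with `numeric_columns` a hypothetical list of names) runs the statistical tests ten columns at a time, and `sample_rows(rows, 10_000)` gives a reproducible subset for a first look.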
**System Requirements**
**Browsers:** Chrome, Firefox (Recommended)
**Code Execution Options:**
**Local Environment:** Use Visual Studio Code (Recommended) or Jupyter Notebook.
**Databricks Workspace:** Import the notebook directly into your Databricks Workspace to run and explore interactively.
Provided by:
Astreya Partners LLC



