Delta Sharing Egress Pipeline

Databricks · Indexed 2024-05-09
Download link:
https://marketplace.databricks.com/details/a6f2e062-3084-4976-9eb0-47b2c8244d43/Databricks_Delta-Sharing-Egress-Pipeline
Resource summary:
**Overview**

This listing contains all the notebooks needed to monitor egress usage patterns associated with Delta Sharing. The notebooks examine cloud provider storage access logs to automatically compute bytes transferred, remote IP addresses, requested file paths, and recipient identifiers. They are attached to this listing's share and are also included in the preview for your convenience.

**Use cases**

- Compute egress costs for a specific share.
- Break usage down further to understand individual customers' usage patterns.
- Identify which files are requested most often.

**Additional Details**

The logs are joined with cloud provider IP range tables and with the Delta Sharing system tables for tables, shares, and recipients, producing egress bytes transferred attributed by share and recipient. The notebooks are meant to be run as a Delta Live Tables (DLT) pipeline template, which automatically generates a detailed cost report.

**How to Use**

1. [Enable S3 Server Access Logging](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerLogs.html) on your Delta Sharing bucket.
2. Import the attached notebooks into your workspace.
3. Set up and run a pipeline with the `IpRanges` notebook to generate a denormalized IP address -> cloud region mapping table.
4. Set up and run the pipeline that analyzes the cloud provider storage access logs, following the instructions in the `S3AccessLogDlt` notebook. **Please note** that you must specify the output table of the `IpRanges` pipeline as the input for this pipeline.

**Requirements**

- DLT enabled.
- You are responsible for DLT pipeline costs.
- Available on: `AWS`.
- (AWS) S3 server access logging enabled on the shared bucket.
- Metastore admin or the `CREATE EXTERNAL LOCATION` permission, required to mount the storage access logs into Unity Catalog.

**Limitations**

1. Costs are attributed only to managed tables, not to other data assets such as external tables, volumes, or models.
2. In some edge cases, the pipeline may not automatically associate recipient names, region, and/or cloud.

**Output**

Below is a preview of the egress report output table schema:

- recipient_id
- recipient_name
- table_catalog
- table_schema
- table_name
- url_pattern
- shares_list
- request_date
- service
- region
- sum_bytes_sent
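The steps under **Additional Details** and **How to Use** boil down to three operations: parse each access-log record, map its remote IP to a cloud region, and sum the bytes sent. The block below is a minimal plain-Python sketch of that core computation. It is not the listing's DLT notebooks. Assumptions: field positions follow AWS's documented S3 server access log format; only `REST.GET.OBJECT` requests count as egress; the CIDR rows in `IP_RANGES` are hypothetical documentation addresses standing in for the `IpRanges` output table; recipient and share attribution through the Delta Sharing system tables is omitted. All function and variable names are hypothetical.

```python
import ipaddress
import re
from collections import defaultdict
from datetime import datetime

# One token per S3 server access log field: bracketed timestamps and quoted
# fields (Request-URI, Referer, User-Agent) stay whole. Assumes no escaped
# quotes inside quoted fields.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# HYPOTHETICAL IP-range rows (documentation-only CIDRs) standing in for the
# IP address -> cloud region mapping table produced by the IpRanges step.
IP_RANGES = [
    ("192.0.2.0/24", "AMAZON", "us-west-2"),
    ("198.51.100.0/24", "AMAZON", "eu-central-1"),
]
_NETWORKS = [(ipaddress.ip_network(cidr), service, region)
             for cidr, service, region in IP_RANGES]


def lookup_region(ip):
    """Return (service, region) for an IP, or ("unknown", "unknown")."""
    addr = ipaddress.ip_address(ip)
    for net, service, region in _NETWORKS:  # linear scan; fine for a sketch
        if addr in net:
            return service, region
    return "unknown", "unknown"


def parse_record(line):
    """Pull the fields the egress report needs out of one log record."""
    t = _TOKEN.findall(line)
    # Assumed field order: 2=Time, 3=Remote IP, 6=Operation, 7=Key,
    # 11=Bytes Sent ("-" when nothing was sent).
    ts = datetime.strptime(t[2][1:-1], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "request_date": ts.date().isoformat(),
        "remote_ip": t[3],
        "operation": t[6],
        "key": t[7],
        "bytes_sent": 0 if t[11] == "-" else int(t[11]),
    }


def egress_by_day_and_region(lines):
    """Sum bytes_sent per (request_date, service, region) for object GETs."""
    totals = defaultdict(int)
    for line in lines:
        rec = parse_record(line)
        if rec["operation"] != "REST.GET.OBJECT":  # assumption: GETs only
            continue
        service, region = lookup_region(rec["remote_ip"])
        totals[(rec["request_date"], service, region)] += rec["bytes_sent"]
    return dict(totals)


def _sample_line(ip, op, key, bytes_sent):
    """Build a HYPOTHETICAL log record in the assumed field order."""
    method = op.split(".")[1]  # "REST.GET.OBJECT" -> "GET"
    return (f'owner bucket [09/May/2024:10:00:00 +0000] {ip} requester REQID '
            f'{op} {key} "{method} /{key} HTTP/1.1" 200 - {bytes_sent} 4096 '
            f'10 5 "-" "example-agent" -')


SAMPLE_LINES = [
    _sample_line("192.0.2.10", "REST.GET.OBJECT", "t1/part-0.parquet", 1024),
    _sample_line("192.0.2.11", "REST.GET.OBJECT", "t1/part-1.parquet", 512),
    _sample_line("198.51.100.7", "REST.GET.OBJECT", "t1/part-0.parquet", 2048),
    _sample_line("192.0.2.10", "REST.HEAD.OBJECT", "t1/part-0.parquet", "-"),
    _sample_line("203.0.113.5", "REST.GET.OBJECT", "t2/part-0.parquet", 100),
]

if __name__ == "__main__":
    for key, total in sorted(egress_by_day_and_region(SAMPLE_LINES).items()):
        print(key, total)
```

The linear scan in `lookup_region` keeps the sketch short. With thousands of published prefixes, a join inside the pipeline (or a sorted interval lookup) would be the more practical choice.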
Provider: Databricks