Labrea
Labrea Dataset Framework
Overview
Labrea is a framework for declarative, functional dataset definitions.
Installation
Labrea can be installed via pip:

    pip install labrea

Alternatively, install the latest development version from GitHub:

    pip install git+https://github.com/8451/labrea@develop

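Usage
Labrea provides a dataset decorator that lets you define datasets and their dependencies declaratively. A dependency can be either another dataset or an Option, which is a value supplied at runtime through a dictionary.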
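
To make the idea of declarative dependencies concrete, here is a minimal pure-Python sketch of the pattern. It is not Labrea's implementation: `lookup`, `ToyOption`, and `toy_dataset` are hypothetical names invented for this illustration. It assumes that a dotted key such as `USER.NAME` addresses a value in a nested options dictionary, and that every parameter default is either an option or another dataset.

```python
import inspect
from typing import Any, Callable, Mapping


def lookup(options: Mapping[str, Any], dotted_key: str) -> Any:
    """Walk a nested dict along a dotted key such as 'USER.NAME'."""
    value: Any = options
    for part in dotted_key.split("."):
        value = value[part]
    return value


class ToyOption:
    """Hypothetical stand-in for an option: a value read from the options dict."""

    def __init__(self, key: str) -> None:
        self.key = key

    def __call__(self, options: Mapping[str, Any]) -> Any:
        return lookup(options, self.key)


def toy_dataset(func: Callable[..., Any]) -> Callable[[Mapping[str, Any]], Any]:
    """Hypothetical decorator: resolve each default against the options, then call func."""
    defaults = {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
    }

    def evaluate(options: Mapping[str, Any]) -> Any:
        # Each default is callable on the options dict (an option or another dataset).
        kwargs = {name: dep(options) for name, dep in defaults.items()}
        return func(**kwargs)

    return evaluate


@toy_dataset
def greeting(name: str = ToyOption("USER.NAME")) -> str:
    return f"Hello, {name}"


@toy_dataset
def shout(text: str = greeting) -> str:
    # Depends on another dataset rather than directly on an option.
    return text.upper() + "!"


options = {"USER": {"NAME": "Ada"}}
result = shout(options)
print(result)
```

The point of the sketch is that a dataset's body never touches the options dictionary directly. It only declares what it needs, and the wrapper resolves those needs when the dataset is called with a concrete configuration.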
Example code

    from labrea import dataset, Option
    import pandas as pd


    @dataset
    def stores(path: str = Option('PATHS.STORES')) -> pd.DataFrame:
        # Read the store table from the path configured under PATHS.STORES.
        return pd.read_csv(path)


    @dataset
    def transactions(path: str = Option('PATHS.SALES')) -> pd.DataFrame:
        # Read the transaction table from the path configured under PATHS.SALES.
        return pd.read_csv(path)


    @dataset
    def sales_by_region(
        stores_: pd.DataFrame = stores,
        transactions_: pd.DataFrame = transactions
    ) -> pd.DataFrame:
        """Merge stores to transactions, sum sales by region"""
        merged = pd.merge(transactions_, stores_, on='store_id')
        return merged.groupby('region')['sales'].sum().reset_index()


    # Options are supplied at runtime as a nested dictionary.
    options = {
        'PATHS': {
            'STORES': 'path/to/stores.csv',
            'SALES': 'path/to/sales.csv'
        }
    }

    stores(options)
    # +----------+--------+
    # | store_id | region |
    # |----------+--------|
    # |        1 | North  |
    # |        2 | North  |
    # |        3 | South  |
    # |        4 | South  |
    # +----------+--------+

    transactions(options)
    # +----------+-------+----------------+
    # | store_id | sales | transaction_id |
    # |----------+-------+----------------|
    # |        1 |   100 |              1 |
    # |        2 |   200 |              2 |
    # |        3 |   300 |              3 |
    # |        4 |   400 |              4 |
    # +----------+-------+----------------+

    sales_by_region(options)
    # +--------+-------+
    # | region | sales |
    # |--------+-------|
    # | North  |   300 |
    # | South  |   700 |
    # +--------+-------+
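
The aggregation in the example above can be checked without any CSV files. The following sketch uses plain pandas only (no Labrea) and rebuilds the two input tables in memory from the values shown above. The variable names `stores_df`, `transactions_df`, `merged`, and `totals` are chosen here for illustration.

```python
import pandas as pd

# In-memory copies of the two input tables shown above.
stores_df = pd.DataFrame({
    "store_id": [1, 2, 3, 4],
    "region": ["North", "North", "South", "South"],
})
transactions_df = pd.DataFrame({
    "store_id": [1, 2, 3, 4],
    "sales": [100, 200, 300, 400],
    "transaction_id": [1, 2, 3, 4],
})

# Attach each transaction's region, then total sales per region.
merged = pd.merge(transactions_df, stores_df, on="store_id")
totals = merged.groupby("region")["sales"].sum().reset_index()

print(totals)
```

The totals are 300 for North (stores 1 and 2) and 700 for South (stores 3 and 4), which matches the sales_by_region output above.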




