ONE DATA Data Science Workflows

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4633703

Description:
The ONE DATA data science workflow dataset ODDS-full comprises 815 unique workflows, each stored as a temporally ordered sequence of versions. The versions of a workflow record its evolution over time: whenever a workflow is altered meaningfully, a new version of it is persisted. In total, 16,035 versions are available.

The ODDS-full workflows are machine learning workflows expressed as node-heterogeneous directed acyclic graphs (DAGs) with 156 different node types. These node types represent the kinds of processing steps found in a general machine learning workflow and are grouped into the following 5 categories:

Load: processors for loading or generating data (e.g. via a random number generator).
Save: processors for persisting data (in various data formats, via external connections, or as a contained result within the ONE DATA platform) or for providing data to other places as a service.
Transformation: processors for altering and adapting data. This includes database-like operations such as renaming columns or joining tables, as well as fully fledged dataset queries.
Quantitative Methods: various aggregation or correlation analyses, bucketing, and simple forecasting.
Advanced Methods: advanced machine learning algorithms such as BNN or Linear Regression. This category also includes special meta processors that, for example, allow external workflows to be executed within the original workflow.

All metadata beyond the structure and node types of a workflow has been removed for anonymization purposes.

ODDS, a filtered variant that enforces weak connectedness and contains only workflows with at least 5 different versions and at least 5 nodes, is available as the default version for supervised and unsupervised learning.

Workflows are served as JSON node-link graphs via networkx. They can be loaded into Python as follows:

```python
import json

import networkx as nx
import pandas as pd

# Every entry under the top-level "graphs" key is one graph in networkx node-link format.
with open('ODDS.json', 'r') as f:
    graphs = pd.Series(list(map(nx.node_link_graph, json.load(f)['graphs'])))
```
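
The structural part of the ODDS filtering rule (weak connectedness and a minimum of 5 nodes), together with the DAG property stated for all workflows, can be checked on graphs loaded this way. The following is a minimal sketch, not the dataset authors' filtering code. The function name `passes_structural_filter` and the two toy graphs are hypothetical. The "at least 5 different versions" criterion is left out because the description does not say how versions are encoded in the JSON. The toy graphs are round-tripped through `nx.node_link_data`, `json`, and `nx.node_link_graph` to mimic the `"graphs"` layout of the file.

```python
import json

import networkx as nx


def passes_structural_filter(g: nx.Graph, min_nodes: int = 5) -> bool:
    """Hypothetical check mirroring two ODDS criteria (weak connectedness,
    minimum node count) plus the DAG property stated for the workflows."""
    # Reject undirected graphs and graphs that are too small (also avoids
    # calling is_weakly_connected on an empty graph, which raises).
    if not g.is_directed() or g.number_of_nodes() < max(min_nodes, 1):
        return False
    return nx.is_weakly_connected(g) and nx.is_directed_acyclic_graph(g)


# Two made-up workflows standing in for entries of ODDS.json.
chain = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 4)])  # 5 nodes, one component
split = nx.DiGraph([(0, 1), (2, 3), (3, 4)])           # 5 nodes, two components

# Mimic the file layout: a top-level "graphs" list of node-link dicts.
text = json.dumps({"graphs": [nx.node_link_data(g) for g in (chain, split)]})
graphs = [nx.node_link_graph(d) for d in json.loads(text)["graphs"]]

kept = [g for g in graphs if passes_structural_filter(g)]
print(len(kept))  # 1: only the weakly connected chain survives
```

Weak connectedness is the natural notion here because workflow edges are directed: a pipeline whose branches only meet at a shared sink is still one workflow, even though no single node can reach all others along edge directions.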

Created: 2021-09-17