ONE DATA Data Science Workflows

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://zenodo.org/record/4633703

Description:

The ONE DATA data science workflow dataset ODDS-full comprises 815 unique workflows in temporally ordered versions. The versions of a workflow describe its evolution over time: whenever a workflow is altered meaningfully, a new version of that workflow is persisted. Overall, 16035 versions are available.

The ODDS-full workflows represent machine learning workflows expressed as node-heterogeneous DAGs with 156 different node types. These node types represent various kinds of processing steps of a general machine learning workflow and are grouped into 5 categories:

Load Processors: for loading or generating data (e.g. via a random number generator).
Save Processors: for persisting data (possible in various data formats, via external connections or as a contained result within the ONE DATA platform) or for providing data to other places as a service.
Transformation Processors: for altering and adapting data. This includes, e.g., database-like operations such as renaming columns or joining tables, as well as fully fledged dataset queries.
Quantitative Methods: various aggregation or correlation analyses, bucketing, and simple forecasting.
Advanced Methods: advanced machine learning algorithms such as BNN or Linear Regression. Also includes special meta processors that, for example, allow the execution of external workflows within the original workflow.

Any metadata beyond the structure and node types of a workflow has been removed for anonymization purposes.

ODDS, a filtered variant that enforces weak connectedness and only contains workflows with at least 5 different versions and 5 nodes, is available as the default version for supervised and unsupervised learning.

Workflows are served as JSON node-link graphs via networkx. They can be loaded into Python as follows:

    import json

    import networkx as nx
    import pandas as pd

    # Parse each node-link dict under the "graphs" key into a networkx graph.
    with open('ODDS.json', 'r') as f:
        graphs = pd.Series(list(map(nx.node_link_graph, json.load(f)['graphs'])))
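The filter that distinguishes ODDS from ODDS-full (weak connectedness, at least 5 versions, at least 5 nodes) could be approximated as in the sketch below. This is not the authors' filtering code. It is a dependency-free illustration on toy data that makes several assumptions: graphs are plain node-link dicts with the keys "nodes", "links", "id", "source" and "target" (the historical networkx defaults, which the actual files may not follow), the node threshold is applied to every version, and the grouping of versions by workflow id as well as the helper names `chain` and `two_chains` are hypothetical.

```python
from collections import defaultdict


def is_weakly_connected(graph):
    """Check weak connectedness of a node-link dict, ignoring edge direction.

    Assumption: "nodes" entries carry an "id" and "links" entries carry
    "source" and "target"; the real ODDS files may use other keys.
    """
    nodes = {n["id"] for n in graph["nodes"]}
    if not nodes:
        return False
    neighbours = defaultdict(set)
    for link in graph["links"]:
        neighbours[link["source"]].add(link["target"])
        neighbours[link["target"]].add(link["source"])
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        # Visit every not-yet-seen neighbour of the current node.
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return nodes <= seen


def passes_filter(versions, min_versions=5, min_nodes=5):
    """Assumed reading of the criteria: enough versions, and every version
    is weakly connected with at least `min_nodes` nodes."""
    if len(versions) < min_versions:
        return False
    return all(
        len(g["nodes"]) >= min_nodes and is_weakly_connected(g) for g in versions
    )


def chain(n, offset=0):
    """Hypothetical toy graph: a directed path of n nodes as a node-link dict."""
    ids = list(range(offset, offset + n))
    return {
        "directed": True,
        "nodes": [{"id": i} for i in ids],
        "links": [{"source": a, "target": b} for a, b in zip(ids, ids[1:])],
    }


def two_chains():
    """Hypothetical toy graph: two separate 3-node paths (not weakly connected)."""
    a, b = chain(3), chain(3, offset=3)
    return {
        "directed": True,
        "nodes": a["nodes"] + b["nodes"],
        "links": a["links"] + b["links"],
    }


# Toy data only: workflow id -> temporally ordered list of version graphs.
workflows = {
    "kept": [chain(5 + i) for i in range(5)],
    "too_few_versions": [chain(6) for _ in range(3)],
    "disconnected": [two_chains() for _ in range(5)],
}

kept = sorted(wid for wid, vs in workflows.items() if passes_filter(vs))
print(kept)
```

With graphs already loaded through networkx, `nx.is_weakly_connected` and `Graph.number_of_nodes` would replace the hand-written helpers.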

Created:

2021-09-17
