Benchmark dataset of experimental results for single-pass stream-based active learning query algorithms
Source: DataCite Commons. Updated: 2026-04-14. Indexed: 2026-05-07.
Download link:
https://redu.unicamp.br/citation?persistentId=doi:10.25824/redu/RGPRFD
Description:
1. Overview

This dataset contains the aggregated and structured results of a large-scale benchmark evaluating twelve single-pass stream-based active learning query strategies. This is the experimental results dataset for the master's dissertation: "A Quantitative and Comparative Analysis of Single-Pass Stream-Based Active Learning Query Algorithms".

The experiments span:
- 82 datasets
- 5 machine learning models
- 12 stream-based query strategies
- 5 labeling budgets: 5%, 10%, 20%, 50%, and 100%
- 20,000+ experimental runs

Each row represents a single experimental configuration, defined by: (dataset, model, hyperparameters, query strategy, labeling budget)

This file is designed for statistical analysis, ranking, and comparative evaluation of strategies under constrained labeling scenarios.

2. File Structure

- Granularity: One row per experimental run
- Primary metric: Final model accuracy
- Evaluation setting: Single-pass stream-based active learning

3. Column Dictionary

Below is the semantic definition of each column in the dataset.

dataset
  Type: String
  Description: Dataset used in the experiment.
  Scope: 82 unique datasets.
  Purpose: Enables cross-dataset robustness analysis.

model_name
  Type: String
  Description: Machine learning algorithm used.
  Scope: 5 model families.
  Purpose: Allows studying model–strategy interaction.

model_params
  Type: String (serialized dictionary)
  Description: Hyperparameters used for the model.
  Example: {'C': 0.01}
  Recommendation: Parse into dictionary for reproducibility or hyperparameter grouping.

query_strategy
  Type: String
  Description: Active learning strategy used in the stream.
  Scope: 12 strategies.
  Purpose: Main variable of interest for comparative evaluation.

budget
  Type: Float
  Values: 0.05, 0.10, 0.20, 0.50, 1.00
  Description: Fraction of instances allowed to be labeled.
  Interpretation: Controls labeling cost.

initial_score
  Type: Float
  Description: Baseline performance before applying active learning.
  Purpose: Reference point for measuring improvement.

percentage_queried
  Type: Float
  Description: Actual fraction of instances labeled.
  Note: May slightly differ from the defined budget due to stream dynamics. Reflects real labeling consumption.

final_accuracy
  Type: Float
  Description: Final model performance after active learning.
  Metric: Classification accuracy. Primary evaluation metric.

4. Summary

experiment_results.csv is a large-scale benchmark dataset for evaluating stream-based active learning strategies under varying labeling budgets. It supports:
- Cross-dataset comparisons
- Strategy ranking
- Budget sensitivity analysis
- Model–strategy interaction studies
- Efficiency and robustness evaluation

The structure is analysis-ready and designed for statistical benchmarking and research publication purposes.
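
The column dictionary above suggests a simple analysis workflow: parse model_params into a dictionary, then compare strategies at a fixed budget. The following is a minimal sketch of that idea using only the Python standard library. The rows in SAMPLE_CSV are synthetic placeholders (the dataset, model, and strategy names are hypothetical, not values from experiment_results.csv), and the "gain" measure (final_accuracy minus initial_score) is an assumption that both columns are on the same accuracy scale. It also assumes the file is a comma-separated CSV whose header matches the column names listed above.

```python
import ast
import csv
import io
from collections import defaultdict
from statistics import mean

# Synthetic rows for illustration only; NOT taken from experiment_results.csv.
SAMPLE_CSV = """dataset,model_name,model_params,query_strategy,budget,initial_score,percentage_queried,final_accuracy
toy_a,model_x,{'C': 0.01},strategy_1,0.10,0.50,0.10,0.70
toy_a,model_x,{'C': 0.01},strategy_2,0.10,0.50,0.09,0.60
toy_b,model_x,{'C': 0.01},strategy_1,0.10,0.40,0.10,0.55
toy_b,model_x,{'C': 0.01},strategy_2,0.10,0.40,0.11,0.60
"""

FLOAT_COLUMNS = ("budget", "initial_score", "percentage_queried", "final_accuracy")


def load_rows(handle):
    """Read rows, convert numeric columns, and parse model_params safely."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for col in FLOAT_COLUMNS:
            row[col] = float(row[col])
        # literal_eval only accepts Python literals, so it is safer than eval.
        row["model_params"] = ast.literal_eval(row["model_params"])
        # Assumed improvement measure: final accuracy minus the baseline score.
        row["gain"] = row["final_accuracy"] - row["initial_score"]
        rows.append(row)
    return rows


def mean_gain_by_strategy(rows, budget):
    """Return (strategy, mean gain) pairs for one budget, best strategy first."""
    gains = defaultdict(list)
    for row in rows:
        if abs(row["budget"] - budget) < 1e-9:
            gains[row["query_strategy"]].append(row["gain"])
    return sorted(
        ((strategy, mean(values)) for strategy, values in gains.items()),
        key=lambda item: item[1],
        reverse=True,
    )


rows = load_rows(io.StringIO(SAMPLE_CSV))
ranking = mean_gain_by_strategy(rows, budget=0.10)
```

To run this against the real file, replace io.StringIO(SAMPLE_CSV) with an opened file handle, for example open("experiment_results.csv", newline=""). Averaging gains across datasets is only one possible ranking rule; rank-based comparisons per dataset may be more appropriate when accuracy ranges differ widely between datasets.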
Provider:
Repositório de Dados de Pesquisa da Unicamp
Created:
2026-02-24



