Deep learning based Missing Data Imputation

Mendeley Data | Updated 2024-03-05 | Indexed 2024-06-27

Download link:

https://www.doi.org/10.57760/sciencedb.16599

Description:

The code provided is related to training an autoencoder, evaluating its performance, and using it for imputing missing values in a dataset. Let's break down each part:

Training the Autoencoder (train_autoencoder function):
This function takes an autoencoder model and the input features as input. It trains the autoencoder using the input features as both input and target output (hence features, features). The autoencoder is trained for a specified number of epochs (epochs) with a given batch size (batch_size). The shuffle=True argument ensures that the data is shuffled before each epoch to prevent the model from memorizing the input order. After training, it returns the trained autoencoder model and the training history.

Evaluating the Autoencoder (evaluate_autoencoder function):
This function takes a trained autoencoder model and the input features as input. It uses the trained autoencoder to predict the reconstructed features from the input features. It calculates Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) scores between the original and reconstructed features. These metrics provide insights into how well the autoencoder is able to reconstruct the input features.

Imputing with the Autoencoder (impute_with_autoencoder function):
This function takes a trained autoencoder model and the input features as input. It identifies missing values (e.g., -9999) in the input features. For each row with missing values, it predicts the missing values using the trained autoencoder. It replaces the missing values with the predicted values. The imputed features are returned as output.

To reuse this code:
1. Load your dataset and preprocess it as necessary.
2. Build an autoencoder model using the build_autoencoder function.
3. Train the autoencoder using the train_autoencoder function with your input features.
4. Evaluate the performance of the autoencoder using the evaluate_autoencoder function.
5. If your dataset contains missing values, use the impute_with_autoencoder function to impute them with the trained autoencoder.
6. Use the trained autoencoder for any other relevant tasks, such as feature extraction or anomaly detection.
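The training and evaluation steps described above can be illustrated with a small, framework-free sketch. This is not the dataset's code: the function names are reused from the description, but the signatures, the learning rate (lr), the seed, the toy data, and the model itself (a tied-weight linear autoencoder with a single hidden unit) are all assumptions chosen so the example runs with the Python standard library alone. The R2 score here is averaged over columns, which is one common convention; the original may compute it differently.

```python
import random


def reconstruct(w, rows):
    """Encode each row to h = w . x, then decode to x_hat = h * w."""
    out = []
    for x in rows:
        h = sum(wi * xi for wi, xi in zip(w, x))
        out.append([h * wi for wi in w])
    return out


def train_autoencoder(w, features, epochs=50, batch_size=4, lr=0.02, seed=0):
    """Mini-batch gradient descent; the target of each row is the row itself."""
    rng = random.Random(seed)
    w = list(w)
    rows = [list(r) for r in features]
    history = []
    for _ in range(epochs):
        rng.shuffle(rows)  # reshuffle before every epoch (the shuffle=True idea)
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            grad = [0.0] * len(w)
            for x in batch:
                h = sum(wi * xi for wi, xi in zip(w, x))
                r = [xi - h * wi for xi, wi in zip(x, w)]  # input minus reconstruction
                rw = sum(ri * wi for ri, wi in zip(r, w))
                for j in range(len(w)):
                    # gradient of the squared reconstruction error w.r.t. w[j]
                    grad[j] -= 2.0 * (h * r[j] + rw * x[j])
            w = [wi - lr * g / len(batch) for wi, g in zip(w, grad)]
        history.append(evaluate_autoencoder(w, rows)["mse"])  # loss after this epoch
    return w, history


def evaluate_autoencoder(w, features):
    """Return MSE, MAE and R2 between the features and their reconstruction."""
    rows = [list(r) for r in features]
    recon = reconstruct(w, rows)
    n, d = len(rows), len(rows[0])
    errors = [rows[i][j] - recon[i][j] for i in range(n) for j in range(d)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    r2_per_column = []
    for j in range(d):
        col = [rows[i][j] for i in range(n)]
        mean = sum(col) / n
        ss_res = sum((rows[i][j] - recon[i][j]) ** 2 for i in range(n))
        ss_tot = sum((v - mean) ** 2 for v in col)  # assumes a non-constant column
        r2_per_column.append(1.0 - ss_res / ss_tot)
    return {"mse": mse, "mae": mae, "r2": sum(r2_per_column) / d}


# Toy data lying exactly on the line y = 2x, so one hidden unit can reconstruct it.
features = [[k / 10.0, 2.0 * k / 10.0] for k in range(-10, 11)]
w, history = train_autoencoder([0.5, 0.5], features)
scores = evaluate_autoencoder(w, features)
print(history[0], history[-1], scores)
```

Because the toy data is noise-free and one-dimensional, the reconstruction error should shrink towards zero over the epochs; on real data the history would instead level off at a non-zero value.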

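The imputation step described above (find the -9999 markers, predict, replace) might look roughly like the sketch below. Only the function name and the -9999 sentinel come from the description. Everything else is an assumption for illustration: the predict callable standing in for the trained autoencoder, the toy_predict projection, and in particular the temporary column-mean fill applied before reconstruction, which the description does not specify.

```python
import math

MISSING = -9999.0  # missing-value marker named in the description


def impute_with_autoencoder(predict, features, missing=MISSING):
    """Replace sentinel entries with reconstructed values; keep observed entries."""
    rows = [list(r) for r in features]
    n_cols = len(rows[0])

    # Column means over observed entries, used only as a temporary fill so the
    # sentinel itself is never fed to the model (an assumption of this sketch).
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] != missing]
        means.append(sum(observed) / len(observed) if observed else 0.0)

    imputed = []
    for row in rows:
        mask = [v == missing for v in row]
        if not any(mask):
            imputed.append(row)  # nothing to impute in this row
            continue
        filled = [means[j] if mask[j] else row[j] for j in range(n_cols)]
        recon = predict(filled)
        # Take the prediction only where the value was missing.
        imputed.append([recon[j] if mask[j] else row[j] for j in range(n_cols)])
    return imputed


def toy_predict(x):
    """Hypothetical stand-in for a trained model: project onto the line y = 2x."""
    w = (1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0))
    h = sum(wi * xi for wi, xi in zip(w, x))
    return [h * wi for wi in w]


data = [[1.0, 2.0], [2.0, MISSING], [4.0, 8.0]]
result = impute_with_autoencoder(toy_predict, data)
print(result[1])  # second entry is roughly 4.8; the observed 2.0 is unchanged
```

In this toy case the imputed value (about 4.8) sits between the column mean (5.0) and the value the line y = 2x would give (4.0), which shows that a single reconstruction pass is still influenced by whatever temporary fill is used.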

Created:

2024-03-05
