
manuscript_clock.R

DataCite Commons | Updated 2025-06-01 | Indexed 2025-09-08
Download link:
https://figshare.com/articles/dataset/manuscript_clock_R/29107931/1
Description:
EPIGENETIC CLOCK MODELING FOR DIAPAUSE METHYLATION DATA

SUMMARY
This script builds an elastic net regression model to predict biological age (in days) based on DNA methylation profiles from a diapause experiment in Nasonia vitripennis. It uses univariate feature pre-selection, elastic net via glmnet, and includes performance evaluation, coefficient extraction, and visualization.

INPUT DATA
- Erin's methylation data: erindata_v2.txt
- Time-related differentially methylated loci (DMLs): un.dmls.timepoint.txt

MAIN STEPS
1. Load and preprocess methylation data.
2. Filter CpGs based on external list of DMLs.
3. Optionally pre-select features using univariate Pearson correlation with age.
4. Transpose data to a modeling-friendly format (samples as rows).
5. Scale CpG features and remove highly correlated ones.
6. Create dummy variables for treatment group.
7. Fit an elastic net model with repeated cross-validation (caret + glmnet).
8. Generate predictions, performance metrics, and plots.
9. Extract and save non-zero coefficients from the final model.

MODEL DETAILS
- Model: Elastic Net (alpha = 0.5)
- Tuning: Lambda optimized over exponential grid (log scale)
- Cross-validation: 10-fold CV repeated 3 times
- Target: Age in days ("day" column)

OUTPUT FILES
- glmnet_erin_only_uniCorrCpGs_figure.pdf: Predicted vs actual age plot
- glmnet_erin_only_uniCorrCpGs_final_model.rds: Trained glmnet model object
- glmnet_erin_only_uniCorrCpGs_predictions_data.csv: Prediction + metadata
- glmnet_erin_only_uniCorrCpGs_coefficients.csv: Non-zero CpG features

EXCEL EXPORTS (commented out in script)
- Table_S1_Age_DMPs: CpGs from timepoint DML list
- Table_S2_Age_Correlated: Age-correlated CpGs used in modeling
- Table_S3_Clock_Coefficients: Non-zero coefficients from final model

DEPENDENCIES
- R packages: data.table, dplyr, caret, glmnet, ggplot2, openxlsx2, emmeans, vip, Metrics
- Parallel backend: doParallel + foreach

NOTES
- Pre-selection is based on |correlation| > 0.3 and p-value <= 0.05
- Age prediction is visualized separately for Control and Diapause samples
- Post-hoc emmeans analysis compares predicted age by treatment group across days

HOW TO RUN
Ensure input files are present and paths are correctly set. Then run the script from R.
Parallel processing will use all available cores minus one.

Contact: Eamonn Mallon, ebm3@le.ac.uk
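
ILLUSTRATIVE SKETCH (not part of the deposited script)
The pre-selection rule described in the notes (|correlation| > 0.3 and p-value <= 0.05, Pearson correlation with age) and the transpose in step 4 can be sketched in base R roughly as follows. This is a minimal sketch under stated assumptions, not the author's code: the toy matrix `meth`, the vector `age_days`, and the helper `cor_with_age` are hypothetical names, and the data are simulated rather than read from erindata_v2.txt.

```r
# Hypothetical toy data: 50 CpGs (rows) x 24 samples (columns).
set.seed(1)
age_days <- rep(c(5, 10, 20, 30), each = 6)   # stand-in for the "day" column
meth <- matrix(runif(50 * 24), nrow = 50,
               dimnames = list(paste0("cpg", 1:50), paste0("s", 1:24)))

# Make the first five CpGs track age so that something passes the filter.
meth[1:5, ] <- meth[1:5, ] * 0.1 + outer(rep(1, 5), age_days / 40)

# Pearson correlation with age for one CpG; returns r and its p-value.
cor_with_age <- function(beta, age) {
  ct <- cor.test(beta, age, method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
}

# One row of statistics per CpG.
stats <- t(apply(meth, 1, cor_with_age, age = age_days))

# Thresholds as stated in the notes: |r| > 0.3 and p <= 0.05.
keep <- abs(stats[, "r"]) > 0.3 & stats[, "p"] <= 0.05
selected <- rownames(meth)[keep]

# Transpose so that samples are rows and the selected CpGs are columns.
x <- t(meth[selected, , drop = FALSE])
```

With real data, `x` would then be scaled and passed, together with the age vector, to the elastic net fit described under MODEL DETAILS.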
Provider:
figshare
Created:
2025-05-20