
Large Language Model Predicting the Corrosion Inhibition Efficiency Based on Text Embedding for Small Dataset

Source: Zenodo. Updated 2026-05-15. Indexed 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20193233
Description:
Update v1.1: This version is a major update to the experimental framework that ensures full reproducibility of the manuscript's results. Key updates include the integration of pre-computed LLM embeddings, hybrid GNN model evaluation, and comprehensive performance comparison scripts.

LLM-Corrosion-Inhibition

Official repository for the paper "Large Language Model Predicting the Corrosion Inhibition Efficiency Based on Text Embedding for Small Dataset".

This project provides a framework for predicting corrosion inhibition efficiency using text embeddings from Large Language Models (LLMs) combined with gradient boosting and graph-based models.

Performance Comparison

The proposed LLM-based approach (Group E) outperforms the traditional machine learning and deep learning models on all four reported metrics:

| Model | R² | MAE | RMSE | Pearson Rho |
|---|---|---|---|---|
| Proposed (LLM-based E) | 0.5341 | 49.0238 | 62.0296 | 0.7625 |
| GNN-based (Hybrid) | 0.41 | 55.0438 | 66.0307 | 0.6999 |
| XGBoost (Baseline A) | 0.3229 | 65.4353 | 74.7848 | 0.5725 |
| RF (Optimized) | 0.3153 | 64.0658 | 75.1997 | 0.5642 |
| SVM (Optimized) | 0.2027 | 68.5637 | 81.1471 | 0.4940 |
| MLP (Optimized) | 0.2703 | 68.9599 | 77.6349 | 0.5425 |

Getting Started

1. Environment Setup

We recommend using a virtual environment (Python >= 3.8.2):

```bash
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```

2. Usage

To run the proposed LLM-based model (Group E):

```bash
python E.py
```

3. File Structure

Data Files:
- ze41_combined.csv: Raw experimental dataset.
- E-embedding.csv: Text embeddings (1024-dim) generated via an LLM API.

Code Experiments:
- A.py, B.py, C.py, D.py, E.py: Progressive experimental scripts.
- gnn.py: Hybrid GNN (GCN-like + MLP) evaluation.
- RFE.py: Feature importance analysis using 10-fold cross-validation.
- tml.py: Comparison with traditional ML models (SVM, RF, MLP).
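The four metric columns in the Performance Comparison table (R², MAE, RMSE, Pearson correlation) have standard definitions. The sketch below is not taken from the repository's scripts. It assumes those standard definitions are what the scripts report, and the function name `regression_metrics` and the toy values are hypothetical, chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, RMSE and Pearson correlation for two sequences.

    Assumes equal-length, non-empty inputs with non-constant values
    (a constant sequence would make R^2 or the correlation undefined).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
    }


# Toy values for illustration only (not from ze41_combined.csv).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Note that R² compares predictions against the mean of the observed values, while the Pearson correlation only measures linear association, so a model can show a fairly high correlation alongside a modest R², as several rows of the table do.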
Documentation

For detailed logs and parameters, refer to the included files:
- save.md: Full terminal outputs and logs for all scripts.
- Xgboost.md: Detailed record of all XGBoost hyperparameters.
- readme.md: This file.

Important Notes

- API Access: In E.py, API_KEY is set to the placeholder "YOUR_API_KEY". The script automatically loads pre-computed vectors from E-embedding.csv, so the results can be reproduced without an active API key.
- Hardware: gnn.py uses CUDA by default and falls back to CPU if no compatible GPU is detected.

Citations and Acknowledgements

Data Source

If you use the raw dataset (ze41_combined.csv), please cite:

[1] https://doi.org/10.1038/s41529-023-00391-0

This Research

If you find this code helpful, please cite our publication:

(Insert your publication citation here)
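The API Access note describes loading pre-computed vectors instead of calling the embedding API. A minimal sketch of that cache-first pattern follows. It is not the code in E.py: the helper names `load_embeddings` and `get_embeddings` are hypothetical, and the sketch assumes a headerless CSV with one numeric vector per row, which may differ from the actual layout of E-embedding.csv (described above only as 1024-dimensional).

```python
import csv
import os
import tempfile

PLACEHOLDER_KEY = "YOUR_API_KEY"  # placeholder value mentioned in the README


def load_embeddings(path):
    """Read one embedding vector per CSV row as a list of floats.

    Assumption: no header row and purely numeric cells.
    """
    with open(path, newline="") as handle:
        return [[float(value) for value in row] for row in csv.reader(handle) if row]


def get_embeddings(api_key, cache_path):
    """Return cached vectors when the cache file exists.

    The real API call is deliberately omitted from this sketch.
    """
    if os.path.exists(cache_path):
        return load_embeddings(cache_path)
    if api_key == PLACEHOLDER_KEY:
        raise RuntimeError("no cached embeddings and no API key configured")
    raise NotImplementedError("embedding API call omitted in this sketch")


# Demonstration with a tiny 3-dimensional stand-in file (the real file is 1024-dim).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo-embedding.csv")
    with open(demo_path, "w", newline="") as handle:
        csv.writer(handle).writerows([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
    vectors = get_embeddings(PLACEHOLDER_KEY, demo_path)

print(len(vectors), len(vectors[0]))
```

Checking the cache before inspecting the key means a run with the placeholder key still succeeds as long as the CSV is present, which matches the reproducibility goal stated in the note.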
Provider:
Zenodo
Created:
2026-05-15