
Leveraging Machine Learning for Thermoelectric Material Design: Addressing Composition–Property Relations and Data Imbalance Challenges

Indexed from Figshare on 2026-04-28
Download link:
https://figshare.com/articles/dataset/Leveraging_Machine_Learning_for_Thermoelectric_Material_Design_Addressing_Composition_Property_Relations_and_Data_Imbalance_Challenges/30519047
Resource description:
Thermoelectric (TE) technology has emerged as a promising and sustainable way to help meet growing global energy demand. Machine learning accelerates the discovery of high-performance thermoelectric materials, but its effectiveness is frequently hampered by data imbalance and data quality issues. This study addresses these challenges using a highly imbalanced data set of germanium telluride (GeTe) materials, including both pure GeTe and its doped or alloyed variants. A classification model was developed on four key descriptors (temperature, Seebeck coefficient, electronegativity, and electron affinity) to categorize samples into low, medium, and high figure of merit (ZT) classes. To mitigate the effects of class imbalance, an ensemble learning approach was combined with the Adaptive Synthetic Sampling (ADASYN) oversampling technique. Among the models evaluated, the XGBoost classifier performed best, achieving a macro-average precision of 0.94, recall of 0.95, F1-score of 0.94, and an overall accuracy of 94%, making it the most effective model for identifying high-performance TE materials under imbalanced conditions. The XGBoost regression model achieved an R² of 0.97 and an RMSE of 0.07, allowing effective screening of materials with high ZT values. To improve model interpretability, SHAP (SHapley Additive exPlanations) analysis was conducted; it revealed that temperature is the most significant factor for predicting the figure of merit. This work provides a solid and interpretable framework for accelerating the discovery of thermoelectric materials for next-generation energy conversion technologies.
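The description reports macro-averaged precision, recall, and F1 alongside accuracy, and R² and RMSE for the regression model. As a reading aid, the sketch below shows how these metrics are conventionally defined, using only the Python standard library. It is not the study's code: the function names `macro_metrics` and `r2_and_rmse` are hypothetical, and the labels and values are toy data invented for illustration, not drawn from the GeTe data set.

```python
from math import sqrt


def macro_metrics(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1, accuracy).

    Macro averaging computes each metric per class and then takes the
    unweighted mean, so a rare class counts as much as a common one.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, accuracy


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for a set of regression predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / len(y_true))


# Toy, deliberately imbalanced ZT classes (illustrative only).
toy_true = ["low"] * 6 + ["medium"] * 3 + ["high"] * 1
toy_pred = ["low"] * 6 + ["medium", "medium", "low"] + ["high"]
macro_p, macro_r, macro_f1, acc = macro_metrics(toy_true, toy_pred)

# Toy ZT values for the regression metrics (illustrative only).
r2, rmse = r2_and_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 2.0])
```

In this toy case a single misclassified "medium" sample lowers macro recall more than it lowers accuracy, which is why macro-averaged scores are a common choice for reporting results on imbalanced classes.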