Perovskite
Alibaba Cloud Tianchi | Updated: 2026-05-27 | Indexed: 2024-03-07
Download link:
https://tianchi.aliyun.com/dataset/139768
Resource description:
A simple tutorial showing how to perform machine learning on the K Supercomputer double perovskite dataset, whose band gaps were calculated with an accurate but computationally expensive method: hybrid functionals with spin-orbit interaction. This tutorial follows a lecture given at California State University, Los Angeles, between May 20 and May 24, 2019. Starting from the chemical formulas, we use regular expressions to extract the atoms that make up each compound and generate atomic features from them. We then feed these atomic features into a Random Forest, train the model, and demonstrate how to carry out hyperparameter tuning.
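The formula-to-features step can be sketched in a few lines of Python. This is not the notebooks' code: it assumes formulas are flat strings such as `Cs2AgBiCl6` (no parentheses), and the function names `parse_formula` and `atomic_features`, as well as the use of Pauling electronegativity as the single illustrative atomic property, are hypothetical choices.

```python
from __future__ import annotations

import re
from collections import Counter

# An element symbol (capital letter plus optional lowercase letter) followed by an
# optional integer or decimal count, e.g. "Cs2", "Ag", "Cl6", "O0.5".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")

# Pauling electronegativities for a small illustrative subset of elements only.
# A real featurizer needs a property table covering every element in the dataset.
ELECTRONEGATIVITY = {"Cs": 0.79, "Ag": 1.93, "Bi": 2.02, "Cl": 3.16}


def parse_formula(formula: str) -> dict[str, float]:
    """Map each element to its count; assumes a flat formula with no parentheses."""
    counts: Counter[str] = Counter()
    for element, amount in TOKEN.findall(formula):
        counts[element] += float(amount) if amount else 1.0
    return dict(counts)


def atomic_features(formula: str, table: dict[str, float] = ELECTRONEGATIVITY) -> dict[str, float]:
    """Composition-weighted mean plus min/max/range of one atomic property."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    values = [table[element] for element in counts]  # KeyError if an element is missing
    mean = sum(table[element] * n for element, n in counts.items()) / total
    return {
        "en_mean": mean,
        "en_min": min(values),
        "en_max": max(values),
        "en_range": max(values) - min(values),
    }


if __name__ == "__main__":
    print(parse_formula("Cs2AgBiCl6"))
    print(atomic_features("Cs2AgBiCl6"))
```

Each additional atomic property (atomic radius, ionization energy, and so on) would add another group of columns in the same way, and the per-compound dictionaries can then be stacked into the feature matrix that the Random Forest is trained on.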
The main code to follow is provided as Jupyter notebooks (.ipynb). Every file except .gitignore is used and explained in the notebooks; .gitignore only tells Git to ignore the autosave files that Jupyter generates, so you can safely disregard it. If you want to follow along, complete part_1_process_data.ipynb before part_2_random_forest.ipynb.
Important! The code in these notebooks was run live during the lecture. The hyperparameter tuning shown is quite minimal; better performance can certainly be achieved. Additionally, there are other machine learning algorithms that outperform Random Forest, such as XGBoost. However, the tutorial keeps things simple to familiarize the audience with scikit-learn and enable fast execution during the live demo.
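A minimal scikit-learn sketch of training a Random Forest with a small grid search is shown below. It is not taken from the notebooks: it uses synthetic data in place of the real atomic features and band gaps, and the grid values, 5-fold cross-validation, and mean-absolute-error metric are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real data: one row of features per compound (X)
# and a band-gap-like target (y). Replace with the features built from the CSVs.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 3.5, size=(200, 4))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] * X[:, 3] + rng.normal(0, 0.1, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A deliberately small grid, in the spirit of the minimal tuning in the tutorial.
param_grid = {
    "n_estimators": [50, 150],
    "max_depth": [None, 10],
    "max_features": [1.0, "sqrt"],
}

# Every combination is scored by 5-fold cross-validation on the training set;
# the best one is then refit on the full training set (refit=True by default).
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# Evaluate the refit best model once on the held-out split.
best_model = search.best_estimator_
test_mae = mean_absolute_error(y_test, best_model.predict(X_test))
print("best params:", search.best_params_)
print(f"test MAE: {test_mae:.3f}")
```

Because GridSearchCV accepts any scikit-learn-compatible estimator, a stronger model such as XGBoost's scikit-learn wrapper could be swapped in for RandomForestRegressor, with a parameter grid of its own.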
Provider:
Alibaba Cloud Tianchi
Created:
2022-10-25
Compiled Summary
Dataset Introduction

Background and Challenges
Background Overview
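This is a double perovskite dataset prepared for a machine learning tutorial; it consists mainly of band gaps obtained with an accurate calculation method. The dataset is small, comprising only two CSV files (test.csv and train1.csv), and is meant to demonstrate the complete workflow, from extracting atomic features from chemical formulas to training a Random Forest and tuning its hyperparameters. It is intended for teaching and learning purposes.
The above content was collected and summarized by 遇见数据集.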
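Reading the two CSV files needs only the Python standard library. The column names `formula` and `gap` below are hypothetical placeholders, since the actual headers of train1.csv and test.csv are not described here, and the function also assumes that test.csv might lack the target column.

```python
import csv

# Hypothetical column names: inspect the header row of train1.csv / test.csv
# and change these to match before use.
FORMULA_COL = "formula"
TARGET_COL = "gap"


def load_split(path, formula_col=FORMULA_COL, target_col=TARGET_COL):
    """Return (formulas, targets) for one CSV split.

    targets is None when the file has no target column (e.g. an unlabeled test split).
    """
    # "utf-8-sig" reads plain UTF-8 and also strips a leading byte-order mark if present.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    formulas = [row[formula_col] for row in rows]
    if target_col not in fieldnames:
        return formulas, None
    targets = [float(row[target_col]) for row in rows]
    return formulas, targets
```

After adjusting the column names, the splits would be loaded with `train_formulas, train_gaps = load_split("train1.csv")` and `test_formulas, test_gaps = load_split("test.csv")`.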



