
Classifying Crystal Structures of Binary Compounds AB through Cluster Resolution Feature Selection and Support Vector Machine Analysis

Source: Figshare. Updated 2016-09-22; indexed 2026-04-29.
Download link:
https://figshare.com/articles/dataset/Classifying_Crystal_Structures_of_Binary_Compounds_AB_through_Cluster_Resolution_Feature_Selection_and_Support_Vector_Machine_Analysis/3792804
Description:
Partial least-squares discriminant analysis (PLS-DA) and support vector machine (SVM) techniques were applied to develop a crystal structure predictor for binary AB compounds. Models were trained and validated on the basis of the classification of 706 AB compounds adopting the seven most common structure types (CsCl, NaCl, ZnS, CuAu, TlI, β-FeB, and NiAs), through data extracted from Pearson’s Crystal Data and ASM Alloy Phase Diagram Database. Out of 56 initial variables (descriptors based on elemental properties only), 31 were selected in as unbiased a manner as possible through a procedure of forward selection and backward elimination, with the quality of the model evaluated by measuring the cluster resolution at each step. PLS-DA gave sensitivity of 96.5%, specificity of 66.0%, and accuracy of 77.1% for the validation set data, whereas SVM gave sensitivity of 94.2%, specificity of 92.7%, and accuracy of 93.2%, a significant improvement. Radii, electronegativity, and valence electrons, previously chosen intuitively in structure maps, were confirmed as important variables. PLS-DA and SVM could also make quantitative predictions of hypothetical compounds, unlike semiclassical approaches. The new compound RhCd was predicted to have the CsCl-type structure by PLS-DA (0.669 probability) and, at an even stronger confidence level, by SVM (0.918 probability). RhCd was synthesized by reaction of the elements at 800 °C and confirmed by X-ray diffraction to adopt the CsCl-type structure. SVM is thus a superior classification method in crystallography that is fast and makes correct, quantitative predictions; it may be more broadly applicable to help identify the structure of unknown compounds with any arbitrary composition.
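
For readers unfamiliar with the three validation metrics quoted above, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be computed for one structure type treated as the positive class (one-vs-rest). The function name `one_vs_rest_metrics` and the toy labels are hypothetical and are not taken from the dataset. How the study aggregated these metrics across the seven structure types is not stated in this description, so the one-vs-rest convention here is an assumption.

```python
from collections import Counter


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Return (sensitivity, specificity, accuracy) with `positive` as the target class.

    Hypothetical helper for illustration only; not the authors' code.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        is_positive = truth == positive
        called_positive = pred == positive
        if is_positive and called_positive:
            counts["tp"] += 1  # true positive
        elif is_positive:
            counts["fn"] += 1  # false negative
        elif called_positive:
            counts["fp"] += 1  # false positive
        else:
            counts["tn"] += 1  # true negative

    tp, fn, fp, tn = counts["tp"], counts["fn"], counts["fp"], counts["tn"]
    total = tp + fn + fp + tn
    nan = float("nan")
    # Sensitivity: fraction of true members of the class that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    # Specificity: fraction of non-members that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else nan
    # Accuracy: fraction of all compounds assigned correctly for this split.
    accuracy = (tp + tn) / total if total else nan
    return sensitivity, specificity, accuracy


# Toy labels, invented for illustration (NOT data from the study).
y_true = ["CsCl", "CsCl", "CsCl", "NaCl", "NaCl", "ZnS", "NiAs", "CsCl"]
y_pred = ["CsCl", "CsCl", "NaCl", "NaCl", "CsCl", "ZnS", "NiAs", "CsCl"]

sens, spec, acc = one_vs_rest_metrics(y_true, y_pred, positive="CsCl")
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

A model with high sensitivity but low specificity, as reported for PLS-DA, recovers most true members of a class but also assigns many non-members to it, which is why the three numbers are best read together.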

Created:
2016-09-22