Classifying Crystal Structures of Binary Compounds AB through Cluster Resolution Feature Selection and Support Vector Machine Analysis

Source: NIAID Data Ecosystem (indexed 2026-03-09)
Download link:
https://figshare.com/articles/dataset/Classifying_Crystal_Structures_of_Binary_Compounds_AB_through_Cluster_Resolution_Feature_Selection_and_Support_Vector_Machine_Analysis/3792804
Description:
Partial least-squares discriminant analysis (PLS-DA) and support vector machine (SVM) techniques were applied to develop a crystal structure predictor for binary AB compounds. Models were trained and validated on the basis of the classification of 706 AB compounds adopting the seven most common structure types (CsCl, NaCl, ZnS, CuAu, TlI, β-FeB, and NiAs), through data extracted from Pearson’s Crystal Data and ASM Alloy Phase Diagram Database. Out of 56 initial variables (descriptors based on elemental properties only), 31 were selected in as unbiased a manner as possible through a procedure of forward selection and backward elimination, with the quality of the model evaluated by measuring the cluster resolution at each step. PLS-DA gave sensitivity of 96.5%, specificity of 66.0%, and accuracy of 77.1% for the validation set data, whereas SVM gave sensitivity of 94.2%, specificity of 92.7%, and accuracy of 93.2%, a significant improvement. Radii, electronegativity, and valence electrons, previously chosen intuitively in structure maps, were confirmed as important variables. PLS-DA and SVM could also make quantitative predictions of hypothetical compounds, unlike semiclassical approaches. The new compound RhCd was predicted to have the CsCl-type structure by PLS-DA (0.669 probability) and, at an even stronger confidence level, by SVM (0.918 probability). RhCd was synthesized by reaction of the elements at 800 °C and confirmed by X-ray diffraction to adopt the CsCl-type structure. SVM is thus a superior classification method in crystallography that is fast and makes correct, quantitative predictions; it may be more broadly applicable to help identify the structure of unknown compounds with any arbitrary composition.
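The sensitivity, specificity, and accuracy figures quoted in the description are standard confusion-matrix ratios. The sketch below shows how such figures can be computed for one structure type treated one-vs-rest. The description does not say how the study aggregated the metrics across the seven classes, so the one-vs-rest treatment is an assumption, the helper names are hypothetical, and the toy labels are illustrative only and not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # target class predicted as target
    fn: int  # target class predicted as something else
    tn: int  # other class predicted as non-target
    fp: int  # other class predicted as target


def one_vs_rest_counts(y_true, y_pred, target):
    """Count confusion-matrix cells for `target` against all other classes."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == target:
            if pred == target:
                tp += 1
            else:
                fn += 1
        elif pred == target:
            fp += 1
        else:
            tn += 1
    return Confusion(tp, fn, tn, fp)


def sensitivity(c):
    # Fraction of true target-class compounds that were recovered.
    return c.tp / (c.tp + c.fn)


def specificity(c):
    # Fraction of non-target compounds correctly rejected.
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    # Fraction of all compounds assigned to the correct side.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Toy labels (illustrative only, not from the study).
y_true = ["CsCl", "CsCl", "NaCl", "NaCl", "ZnS", "NiAs"]
y_pred = ["CsCl", "NaCl", "NaCl", "CsCl", "ZnS", "NiAs"]

c = one_vs_rest_counts(y_true, y_pred, "CsCl")
print(sensitivity(c), specificity(c), accuracy(c))
```

A high sensitivity paired with a low specificity, as reported for PLS-DA, means that most members of a class are found but many non-members are also assigned to it. The functions assume each class has at least one member and one non-member; otherwise the ratios are undefined.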
Created:
2016-09-22