
EZClust: A Robust Machine Learning-Based Powder X‑Ray Diffraction and Raman Cluster Analysis Model for Efficient High-Throughput Crystallization Polymorph Screening

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/EZClust_A_Robust_Machine_Learning-Based_Powder_X_Ray_Diffraction_and_Raman_Cluster_Analysis_Model_for_Efficient_High-Throughput_Crystallization_Polymorph_Screening/31077237
Description:
High-throughput crystallization (HTC) polymorph screening is pivotal for exploring the crystal polymorph landscape, but the sheer volume and complexity of powder X-ray diffraction (PXRD) and Raman spectroscopy data present significant data-processing challenges. Traditional approaches, which rely on human interpretation aided by software, are often constrained by limited clustering accuracy. To address these limitations, we developed EZClust, a lightweight machine-learning model designed for rapid batch analysis of PXRD and Raman data. A key algorithm in the model is shape-based distance (SBD), which performs robustly on distorted data and requires minimal parameter tuning. In this work, we compare EZClust’s performance to existing mainstream commercial software (Jade Pro) and the open-source AutoFIDEL implementation, demonstrating its robustness through cluster analysis of HTC datasets for the model compounds ROY and carbamazepine. Herein, we disclose the core algorithms of EZClust, robust preprocessing coupled with an SBD metric, to streamline cluster analysis for PXRD and Raman datasets in HTC workflows.
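To make the SBD idea concrete, here is a minimal sketch of shape-based distance as it is commonly defined in the time-series clustering literature: one minus the maximum normalized cross-correlation over all relative shifts. The description above does not give EZClust's exact formulation or preprocessing, so the function names `sbd` and `sbd_matrix`, the zero-pattern convention, and the absence of any normalization step are assumptions made for illustration only.

```python
import numpy as np


def sbd(x, y):
    """Shape-based distance between two 1-D patterns (illustrative sketch).

    Computes 1 - max_s NCC(x, y; s), where NCC is the cross-correlation at
    shift s divided by the product of the two Euclidean norms. This is the
    common textbook form, not necessarily what EZClust implements.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        # Assumed convention: an all-zero pattern has no shape to match.
        return 1.0
    # Cross-correlation at every possible relative shift (zero-padded).
    cc = np.correlate(x, y, mode="full")
    return 1.0 - cc.max() / denom


def sbd_matrix(patterns):
    """Symmetric pairwise SBD matrix for a list of equal-purpose patterns."""
    n = len(patterns)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = sbd(patterns[i], patterns[j])
    return dist
```

Because the maximum is taken over shifts, two patterns that differ only by a small peak displacement score close to zero, which is one plausible reading of the robustness to distortion mentioned above. The resulting matrix could then be passed to any clustering routine that accepts precomputed distances. For long patterns, an FFT-based cross-correlation would be faster than `np.correlate`.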

Created: 2026-01-15