CSMILES: A Compact, Human-Readable SMILES Extension for Conformations
Source: NIAID Data Ecosystem. Indexed: 2026-05-10
Download link:
https://figshare.com/articles/dataset/CSMILES_A_Compact_Human-Readable_SMILES_Extension_for_Conformations/30220325
Resource description:
While line notation schemes for molecular structure are well developed,
they are generally unable to distinguish different conformations of
the same molecule. CSMILES, an extension to the ubiquitous line notation
scheme, SMILES, has been developed to address this issue. CSMILES
are short strings of text that encode information characterizing the
conformer structure in a maximally compact form. A conformer is
defined by the dihedral angles associated with a structure that has
a specified connectivity between atoms. The extension is straightforward:
in the simplest case, the dihedral angle about each bond is
determined from the atomic coordinates and inserted into the SMILES
string at the location of the bond. For example, the canonical SMILES
string of pentanol-1 is OCCCCC, and the CSMILES
of one of its conformers is O{299}C{180}C{178}C{70}C{56}C. Evidently, the
CSMILES strings remain readable, especially for smaller molecules.
More difficult cases involving branching, rings, symmetry, and other
complications have also been covered by our definitions. Further,
CSMILES strings are canonicalized at the conformer level beyond simple
connectivity. As such, canonical CSMILES strings are invariant to
atom reordering, rigid translation, and rigid rotation. Two-way
conversion between three-dimensional (3D) structure and CSMILES has been
implemented, and the article is accompanied by Python code that
performs these conversions. Possible applications for CSMILES strings
are discussed and include efficient storage of 3D structure information
as well as development of machine learning models for conformation-dependent
properties.
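
The simplest case described above (one dihedral angle per bond, written in braces at the bond's position) can be illustrated with a minimal Python sketch. This is not the code that accompanies the article. The function names `dihedral_degrees` and `split_csmiles` are hypothetical, the sign convention and the mapping of angles to the range [0, 360) are assumptions inferred from the pentanol-1 example, and the parser handles only unbranched strings of the form shown in that example.

```python
import math
import re


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, mapped to [0, 360).

    Assumption: the range and sign convention are chosen to resemble
    the example string; the article may define them differently.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    u = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, u) * x for x in u))
    w = _sub(b2, tuple(_dot(b2, u) * x for x in u))
    x = _dot(v, w)
    y = _dot(_cross(u, v), w)
    return math.degrees(math.atan2(y, x)) % 360.0


def split_csmiles(csmiles):
    """Split a simple CSMILES string into (plain SMILES, list of angles).

    Assumption: angles appear as numbers inside braces, as in the
    pentanol-1 example; branching, rings, and symmetry are not handled.
    """
    angles = [float(a) for a in re.findall(r"\{(\d+(?:\.\d+)?)\}", csmiles)]
    smiles = re.sub(r"\{[^{}]*\}", "", csmiles)
    return smiles, angles


if __name__ == "__main__":
    print(split_csmiles("O{299}C{180}C{178}C{70}C{56}C"))
    # Four points in a planar trans (anti) arrangement.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Because a dihedral angle depends only on the relative positions of four atoms, shifting or rigidly rotating all coordinates leaves the value unchanged, which is consistent with the invariance property stated in the abstract.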
Created:
2025-09-26



