Taxonomic classifications for all structures in the QM9 dataset
Source: NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/6498856
Description:
The classification of molecules according to ClassyFire [1] for the QM9 dataset [2].
The QM9 dataset is a set of about 134k small organic molecules, each containing no more than nine heavy atoms (C, N, O, and F) and optimized to a stable structure with DFT.
ClassyFire is a tool and taxonomic library for the labeling of molecules.
1. Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform. 8, 1–20 (2016).
2. Ramakrishnan, R., Dral, P. O., Rupp, M. & Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 1–7 (2014).
The data archive (`QM9_jsons_classified.tar.gz`) contains one `json` file for each structure in the QM9 dataset. Each file is named after the structure's QM9 identifier. Data fields include:
- `cf_alternative_parents` : classes that describe the compound but do not fall in the given ancestry
- `cf_ancestors` : classes along the taxonomic branch for the structure
- `cf_class` : class assigned by ClassyFire
- `cf_superclass` : superclass assigned by ClassyFire
- `cf_subclass` : subclass assigned by ClassyFire
- `cf_direct_parent` : class one level above the structure on the taxonomic branch
- `cf_description` : prose description of the class the compound belongs to
- `cf_identifier` : identifier for the structure in the ClassyFire database
- `cf_intermediate_nodes` : classes connecting branches on the taxonomic tree
- `cf_kingdom` : kingdom assigned by ClassyFire
- `cf_molecular_framework` : description of aromaticity and number of cycles
- `cf_predicted_chebi_terms` : terms describing the molecule in the ChEBI framework
- `cf_predicted_lipidmaps_terms` : terms describing the molecule in the LIPID MAPS framework
- `cf_smiles` : SMILES string given by ClassyFire
- `cf_substituents` : substituent groups in the structure
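A minimal sketch of reading the archive without unpacking it to disk is shown here. It assumes that every member whose name ends in `.json` is one record and that the file stem is the QM9 identifier; the internal directory layout of the tarball is not documented here, so the code does not rely on it. The function name `iter_records` is hypothetical.

```python
import json
import tarfile
from pathlib import PurePosixPath


def iter_records(archive_path):
    """Yield (qm9_id, record) pairs from a gzipped tar archive of JSON files."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            # Skip directories and anything that is not a JSON file.
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            # Assumption: the file stem is the QM9 identifier.
            qm9_id = PurePosixPath(member.name).stem
            yield qm9_id, json.load(handle)
```

For example, `for qm9_id, record in iter_records("QM9_jsons_classified.tar.gz"): print(qm9_id, record.get("cf_superclass"))` would stream through the archive one structure at a time.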
Many fields contain subfields, as seen in the example below for the molecule with QM9 ID 000123:
{"cf_alternative_parents":[{"name":"Dialkylamines","description":"Organic compounds containing a dialkylamine group, characterized by two alkyl groups bonded to the amino nitrogen.","chemont_id":"CHEMONTID:0002228","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0002228"},{"name":"Organopnictogen compounds","description":"Compounds containing a bond between carbon a pnictogen atom. Pnictogens are p-block element atoms that are in the group 15 of the periodic table.","chemont_id":"CHEMONTID:0004557","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0004557"},{"name":"Hydrocarbon derivatives","description":"Derivatives of hydrocarbons obtained by substituting one or more carbon atoms by an heteroatom. They contain at least one carbon atom and heteroatom.","chemont_id":"CHEMONTID:0004150","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0004150"}],"cf_ancestors":["Alpha-aminonitriles","Amines","Chemical entities","Dialkylamines","Hydrocarbon derivatives","Nitriles","Organic compounds","Organic cyanides","Organic nitrogen compounds","Organonitrogen compounds","Organopnictogen compounds","Secondary amines"],"cf_class":"Organonitrogen compounds","cf_classification_version":"2.1","cf_description":"This compound belongs to the class of organic compounds known as alpha-aminonitriles. These are organonitrogen compounds that contain an amino group located on the carbon at the position alpha to a carbonitrile group. They have the general formula RC(NH2)C#N, where the amine group can be substituted.","cf_direct_parent":{"name":"Alpha-aminonitriles","description":"Organonitrogen compounds that contain an amino group located on the carbon at the position alpha to a carbonitrile group. They have the general formula RC(NH2)C#N, where the amine group can be substituted.","chemont_id":"CHEMONTID:0004453","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0004453"},"cf_external_descriptors":[],"cf_identifier":"Q5198051-1","cf_inchikey":"InChIKey=PVVRRUUMHFWFQV-UHFFFAOYSA-N","cf_intermediate_nodes":[{"name":"Nitriles","description":"Compounds having the structure RC#N; thus C-substituted derivatives of hydrocyanic acid, HC#N.","chemont_id":"CHEMONTID:0000362","url":"http:\/\/classyfire.wishartlab.com\/tax_nodes\/C0000362"}],"cf_kingdom":"Organic compounds","cf_molecular_framework":"Aliphatic acyclic compounds","cf_predicted_chebi_terms":["chemical entity (CHEBI:24431)","organic molecular entity (CHEBI:50860)","organonitrogen compound (CHEBI:35352)","secondary amino compound (CHEBI:50995)","nitrile (CHEBI:18379)","amine (CHEBI:32952)","secondary amine (CHEBI:32863)","cyanides (CHEBI:23424)","organic molecule (CHEBI:72695)","pnictogen molecular entity (CHEBI:33302)","nitrogen molecular entity (CHEBI:51143)"],"cf_predicted_lipidmaps_terms":[],"cf_smiles":"CNCC#N","cf_subclass":"Organic cyanides","cf_substituents":["Alpha-aminonitrile","Secondary amine","Secondary aliphatic amine","Organopnictogen compound","Hydrocarbon derivative","Amine","Aliphatic acyclic compound"],"cf_superclass":"Organic nitrogen compounds"}
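Because several fields hold nested objects rather than plain strings, a small helper can flatten one record into its taxonomic lineage. The sketch below uses a trimmed copy of the 000123 record, keeping only the `name` subfield of nested entries. The function names `lineage` and `node_names` are hypothetical, and the handling of missing or null fields is an assumption, since the description does not say whether every molecule has every level filled in.

```python
def lineage(record):
    """Return [kingdom, superclass, class, subclass] for one record."""
    levels = ("cf_kingdom", "cf_superclass", "cf_class", "cf_subclass")
    # Assumption: a level may be absent or null, so .get() is used.
    return [record.get(level) for level in levels]


def node_names(nodes):
    """Extract the 'name' subfield from a list of taxonomy-node objects."""
    return [node["name"] for node in (nodes or [])]


# Trimmed copy of the record for QM9 ID 000123 (most subfields omitted).
record = {
    "cf_kingdom": "Organic compounds",
    "cf_superclass": "Organic nitrogen compounds",
    "cf_class": "Organonitrogen compounds",
    "cf_subclass": "Organic cyanides",
    "cf_direct_parent": {"name": "Alpha-aminonitriles"},
    "cf_alternative_parents": [
        {"name": "Dialkylamines"},
        {"name": "Organopnictogen compounds"},
        {"name": "Hydrocarbon derivatives"},
    ],
}

print(" > ".join(lineage(record)))
print((record.get("cf_direct_parent") or {}).get("name"))
print(node_names(record["cf_alternative_parents"]))
```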
The visualization `qm9_pie_labeled.png` shows the breakdown of QM9 by superclass, subdivided down to the subclass level.
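The counts behind such a breakdown can be tallied from the records with a `Counter`. This is only a sketch of the aggregation step, not the code used to produce the published figure. The function name `superclass_breakdown` and the placeholder labels `"Unclassified"` and `"Unspecified"` are hypothetical, introduced to handle records where a level might be missing.

```python
from collections import Counter


def superclass_breakdown(records):
    """Count records per (superclass, subclass) pair."""
    counts = Counter()
    for record in records:
        # Placeholder labels are assumptions for missing or null levels.
        superclass = record.get("cf_superclass") or "Unclassified"
        subclass = record.get("cf_subclass") or "Unspecified"
        counts[(superclass, subclass)] += 1
    return counts
```

Passing any iterable of record dictionaries returns a mapping whose keys can be grouped by superclass for an outer ring and by subclass for an inner ring of a nested pie chart.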
Created:
2024-07-16



