blux-food/compounds
Source: Hugging Face. Updated 2023-05-22; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/blux-food/compounds
Resource description:
# Field definitions
1. **uci_id**: UniChem identifier.
2. **chembl_id**: ChEMBL identifier.
3. **molecule_type**: Type of molecule (Small molecule, Protein, Antibody, Oligosaccharide, Oligonucleotide, Cell, Unknown).⁶
4. **alogp**: Calculated ALogP. Ghose-Crippen-Viswanadhan octanol-water partition coefficient (ALogP).¹ ²
5. **aromatic_rings**: Number of aromatic rings. Aromatic rings are common structural components of polymers.
6. **cx_logd**: The calculated octanol/water distribution coefficient at pH 7.4 using ChemAxon v17.29.0.³
7. **cx_logp**: The calculated octanol/water partition coefficient using ChemAxon v17.29.0.³
8. **cx_most_apka**: The most acidic pKa calculated using ChemAxon v17.29.0.³
9. **cx_most_bpka**: The most basic pKa calculated using ChemAxon v17.29.0.³
10. **full_molformula**: Molecular formula for the full compound (including any salt).⁴
11. **full_mwt**: Molecular weight of the full compound including any salts.⁴
12. **hba**: Number of hydrogen bond acceptors.⁴
13. **hba_lipinski**: Number of hydrogen bond acceptors calculated according to Lipinski's original rules (i.e., N + O count).⁴
14. **hbd**: Number of hydrogen bond donors.⁴
15. **hbd_lipinski**: Number of hydrogen bond donors calculated according to Lipinski's original rules (i.e., NH + OH count).⁴
16. **heavy_atoms**: Number of heavy (non-hydrogen) atoms.⁴
17. **molecular_species**: Indicates whether the compound is an acid/base/neutral.⁵
18. **mw_freebase**: Molecular weight of parent compound.⁴
19. **mw_monoisotopic**: Monoisotopic parent molecular weight.⁴
20. **num_lipinski_ro5_violations**: Number of violations of Lipinski's rule of five using HBA_LIPINSKI and HBD_LIPINSKI counts.⁵
21. **num_ro5_violations**: Number of violations of Lipinski's rule-of-five, using HBA and HBD definitions.⁵
22. **psa**: Polar surface area.⁴
23. **qed_weighted**: Weighted quantitative estimate of drug likeness (as defined by Bickerton et al., Nature Chem 2012).⁴
24. **ro3_pass**: Indicates whether the compound passes the rule-of-three (mw < 300, logP < 3 etc).⁵
25. **rtb**: Number of rotatable bonds.⁴
26. **canonical_smiles**: Canonical SMILES, with no stereochemistry information. Generated using Pipeline Pilot.⁵
27. **standard_inchi**: IUPAC standard InChI for the compound.⁵
28. **standard_inchi_key**: IUPAC standard InChI key for the compound.⁵
29. **natural_product**: Indicates whether the compound is natural product-derived (currently curated only for drugs).⁶
30. **inorganic_flag**: Indicates whether the molecule is inorganic (i.e., containing only metal atoms and <2 carbon atoms).⁶
31. **therapeutic_flag**: Indicates that a drug has a therapeutic application (as opposed to e.g., an imaging agent, additive etc).⁶
32. **biotherapeutic**: A single related resource. Can be either a URI or set of nested resource data.⁶
33. **polymer_flag**: Indicates whether a molecule is a small molecule polymer (e.g., polistyrex).⁶
34. **prodrug**: Indicates that the molecule is a pro-drug (see molecule hierarchy for active component, where known).⁶
35. **kegg_id**: KEGG identifier.
36. **formula**: Molecular formula for the full compound.
37. **exact_mass**: Mass of the compound (from KEGG).
38. **mol_weight**: Mass of a molecule of a substance, based on 12 as the atomic weight of carbon-12.⁸
39. **atom**: An ATOM entry represents a KEGG Atom Type.¹⁰
40. **bond**: A BOND entry is defined as a pair of ATOM entries that form a chemical bond in a molecule, corresponding to many named bonds in organic chemistry and biochemistry.¹⁰
41. **chebi_id**: ChEBI identifier.
42. **definition**: A simple definition of this compound.
43. **mass**: Returns the average mass. The relative masses are calculated from tables of relative atomic masses (atomic weights) published by IUPAC (from ChEBI).⁷
44. **mol**: ChEBI stores the two-dimensional or three-dimensional structural diagrams as connection tables in MDL molfile format.⁷
45. **smiles**: The simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings.
46. **inchi**: The International Chemical Identifier (InChI) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web.
47. **inchi_key**: The InChIKey, sometimes referred to as a hashed InChI, is a fixed length (27 character) condensed digital representation of the InChI that is not human-understandable.
48. **cas_id**: CAS Registry Number. A CAS Registry Number is a unique and unambiguous identifier for a specific substance that allows clear communication and, with the help of CAS scientists, links together all available data and research about that substance.
49. **substance**: Full substance name as recognized by CFSAN (FDA).⁹
50. **regs**: Code of Federal Regulations numbers associated with this compound (FDA).⁹
51. **syns**: Synonyms of the compound (FDA).
52. **used_for**: The physical or technical effect(s) the substance has in or on food; see 21 CFR 170.3(o) for definitions (FDA).⁹
¹ http://chemgps.bmc.uu.se/help/dragonx/GhoseCrippenViswanadhanAlogP.html
² http://www.talete.mi.it/help/dproperties_help/index.html?molecular_properties.htm
³ http://chembl.blogspot.com/2020/03/chembl-26-released.html
⁴ https://micha-protocol.org/glossary/index
⁵ https://www.ebi.ac.uk/chembl/api/data/drug/schema
⁶ https://www.ebi.ac.uk/chembl/api/data/molecule/schema
⁷ http://libchebi.github.io/libChEBI%20API.pdf
⁸ https://www.britannica.com/science/molecular-weight
⁹ https://www.cfsanappsexternal.fda.gov/scripts/fdcc/?set=FoodSubstances&sort=Used_for_Technical_Effect
¹⁰ https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-7-S6-S2
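
The `num_ro5_violations` and `num_lipinski_ro5_violations` fields differ only in which hydrogen bond counts they use. The sketch below illustrates that difference with the commonly cited rule-of-five thresholds (molecular weight above 500, logP above 5, more than 5 donors, more than 10 acceptors). It is not the ChEMBL implementation: the function name `ro5_violations`, the choice of `mw_freebase` and `alogp` as inputs, the handling of missing values, and the example record are all assumptions made for illustration.

```python
from typing import Any, Mapping


def ro5_violations(record: Mapping[str, Any], lipinski_counts: bool = False) -> int:
    """Count rule-of-five violations for one compound record.

    Assumption: molecular weight is read from `mw_freebase` and logP from
    `alogp`; the dataset card does not state which fields were used.
    Missing (None) values are skipped rather than counted as violations.
    """
    # Pick the hydrogen bond definitions: HBA/HBD or the Lipinski N+O / NH+OH counts.
    hba_key = "hba_lipinski" if lipinski_counts else "hba"
    hbd_key = "hbd_lipinski" if lipinski_counts else "hbd"

    # (field name, upper limit); a value strictly above the limit is a violation.
    limits = [
        ("mw_freebase", 500.0),
        ("alogp", 5.0),
        (hbd_key, 5),
        (hba_key, 10),
    ]
    return sum(
        1
        for key, limit in limits
        if record.get(key) is not None and record[key] > limit
    )


# Hypothetical record, not taken from the dataset.
example = {
    "mw_freebase": 620.7,
    "alogp": 5.6,
    "hbd": 2,
    "hba": 9,
    "hbd_lipinski": 3,
    "hba_lipinski": 11,
}

print(ro5_violations(example))                        # uses hba / hbd
print(ro5_violations(example, lipinski_counts=True))  # uses hba_lipinski / hbd_lipinski
```

For this hypothetical record the two counts differ because `hba_lipinski` exceeds 10 while `hba` does not, which is the situation the two separate fields are meant to capture.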
Provider:
blux-food
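Summary of original information
Dataset field definitions
- uci_id: UniChem identifier.
- chembl_id: ChEMBL identifier.
- molecule_type: Type of molecule (small molecule, protein, antibody, oligosaccharide, oligonucleotide, cell, unknown).
- alogp: Calculated ALogP, the Ghose-Crippen-Viswanadhan octanol-water partition coefficient.
- aromatic_rings: Number of aromatic rings.
- cx_logd: Octanol/water distribution coefficient at pH 7.4, calculated using ChemAxon v17.29.0.
- cx_logp: Octanol/water partition coefficient, calculated using ChemAxon v17.29.0.
- cx_most_apka: Most acidic pKa, calculated using ChemAxon v17.29.0.
- cx_most_bpka: Most basic pKa, calculated using ChemAxon v17.29.0.
- full_molformula: Molecular formula of the full compound (including any salt).
- full_mwt: Molecular weight of the full compound, including any salts.
- hba: Number of hydrogen bond acceptors.
- hba_lipinski: Number of hydrogen bond acceptors calculated according to Lipinski's original rules.
- hbd: Number of hydrogen bond donors.
- hbd_lipinski: Number of hydrogen bond donors calculated according to Lipinski's original rules.
- heavy_atoms: Number of heavy (non-hydrogen) atoms.
- molecular_species: Indicates whether the compound is an acid, a base, or neutral.
- mw_freebase: Molecular weight of the parent compound.
- mw_monoisotopic: Monoisotopic molecular weight of the parent compound.
- num_lipinski_ro5_violations: Number of violations of Lipinski's rule of five, using the HBA_LIPINSKI and HBD_LIPINSKI counts.
- num_ro5_violations: Number of violations of Lipinski's rule of five, using the HBA and HBD definitions.
- psa: Polar surface area.
- qed_weighted: Weighted quantitative estimate of drug-likeness.
- ro3_pass: Indicates whether the compound passes the rule of three (mw < 300, logP < 3, etc.).
- rtb: Number of rotatable bonds.
- canonical_smiles: Canonical SMILES with no stereochemistry information, generated using Pipeline Pilot.
- standard_inchi: IUPAC standard InChI for the compound.
- standard_inchi_key: IUPAC standard InChI key for the compound.
- natural_product: Indicates whether the compound is natural product-derived.
- inorganic_flag: Indicates whether the molecule is inorganic (containing only metal atoms and <2 carbon atoms).
- therapeutic_flag: Indicates whether the drug has a therapeutic application.
- biotherapeutic: A single related resource. Can be either a URI or nested resource data.
- polymer_flag: Indicates whether the molecule is a small molecule polymer.
- prodrug: Indicates whether the molecule is a prodrug.
- kegg_id: KEGG identifier.
- formula: Molecular formula of the full compound.
- exact_mass: Mass of the compound (from KEGG).
- mol_weight: Molecular mass based on the atomic weight of carbon-12.
- atom: KEGG Atom Type.
- bond: Chemical bond, corresponding to many named bonds in organic chemistry and biochemistry.
- chebi_id: ChEBI identifier.
- definition: A simple definition of the compound.
- mass: Returns the average mass.
- mol: Two-dimensional or three-dimensional structural diagram, stored by ChEBI as a connection table in MDL molfile format.
- smiles: Simplified molecular-input line-entry system.
- inchi: International Chemical Identifier.
- inchi_key: InChIKey, a fixed-length (27-character) condensed digital representation that is not human-understandable.
- cas_id: CAS Registry Number.
- substance: Full substance name as recognized by CFSAN (FDA).
- regs: Code of Federal Regulations numbers associated with this compound (FDA).
- syns: Synonyms of the compound.
- used_for: The physical or technical effect(s) the substance has in or on food.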
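
Two of the identifier fields listed above, `inchi_key` and `cas_id`, have fixed formats that can be checked before records are joined across sources. The following is a minimal sketch, assuming `inchi_key` holds a standard 27-character InChIKey (blocks of 14, 10, and 1 uppercase letters separated by hyphens) and `cas_id` holds a hyphenated CAS Registry Number. How this dataset actually formats either field is not stated in the card, and the function names are hypothetical.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
INCHI_KEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# CAS Registry Number layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def looks_like_inchi_key(value: str) -> bool:
    """Return True if the string has the shape of an InChIKey.

    This checks the format only; it cannot confirm that the key
    corresponds to a real structure.
    """
    return INCHI_KEY_PATTERN.fullmatch(value) is not None


def is_valid_cas_number(value: str) -> bool:
    """Return True if the string is a well-formed CAS number with a correct check digit.

    The check digit is the sum of the other digits, each multiplied by its
    position counted from the right (1, 2, 3, ...), taken modulo 10.
    """
    match = CAS_PATTERN.fullmatch(value)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    total = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


# Water, used here only as a well-known example.
print(looks_like_inchi_key("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
print(is_valid_cas_number("7732-18-5"))
```

A format check like this only catches malformed or truncated values; it says nothing about whether the identifier belongs to the compound in the same row.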



