
blux-food/compounds

Source: Hugging Face. Updated: 2023-05-22. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/blux-food/compounds
Resource description:
# Field definitions

1. **uci_id**: UniChem identifier.
2. **chembl_id**: ChEMBL identifier.
3. **molecule_type**: Type of molecule (Small molecule, Protein, Antibody, Oligosaccharide, Oligonucleotide, Cell, Unknown).⁶
4. **alogp**: Calculated ALogP, the Ghose-Crippen-Viswanadhan octanol-water partition coefficient.¹ ²
5. **aromatic_rings**: Number of aromatic rings. Aromatic rings are common structural components of polymers.
6. **cx_logd**: The calculated octanol/water distribution coefficient at pH 7.4 using ChemAxon v17.29.0.³
7. **cx_logp**: The calculated octanol/water partition coefficient using ChemAxon v17.29.0.³
8. **cx_most_apka**: The most acidic pKa calculated using ChemAxon v17.29.0.³
9. **cx_most_bpka**: The most basic pKa calculated using ChemAxon v17.29.0.³
10. **full_molformula**: Molecular formula for the full compound (including any salt).⁴
11. **full_mwt**: Molecular weight of the full compound, including any salts.⁴
12. **hba**: Number of hydrogen bond acceptors.⁴
13. **hba_lipinski**: Number of hydrogen bond acceptors calculated according to Lipinski's original rules (i.e., N + O count).⁴
14. **hbd**: Number of hydrogen bond donors.⁴
15. **hbd_lipinski**: Number of hydrogen bond donors calculated according to Lipinski's original rules (i.e., NH + OH count).⁴
16. **heavy_atoms**: Number of heavy (non-hydrogen) atoms.⁴
17. **molecular_species**: Indicates whether the compound is an acid, a base, or neutral.⁵
18. **mw_freebase**: Molecular weight of the parent compound.⁴
19. **mw_monoisotopic**: Monoisotopic parent molecular weight.⁴
20. **num_lipinski_ro5_violations**: Number of violations of Lipinski's rule of five, using the HBA_LIPINSKI and HBD_LIPINSKI counts.⁵
21. **num_ro5_violations**: Number of violations of Lipinski's rule of five, using the HBA and HBD definitions.⁵
22. **psa**: Polar surface area.⁴
23. **qed_weighted**: Weighted quantitative estimate of drug-likeness (as defined by Bickerton et al., Nature Chem 2012).⁴
24. **ro3_pass**: Indicates whether the compound passes the rule of three (mw < 300, logP < 3 etc).⁵
25. **rtb**: Number of rotatable bonds.⁴
26. **canonical_smiles**: Canonical SMILES, with no stereochemistry information. Generated using Pipeline Pilot.⁵
27. **standard_inchi**: IUPAC standard InChI for the compound.⁵
28. **standard_inchi_key**: IUPAC standard InChI key for the compound.⁵
29. **natural_product**: Indicates whether the compound is natural product-derived (currently curated only for drugs).⁶
30. **inorganic_flag**: Indicates whether the molecule is inorganic (i.e., containing only metal atoms and <2 carbon atoms).⁶
31. **therapeutic_flag**: Indicates that a drug has a therapeutic application (as opposed to e.g., an imaging agent, additive etc).⁶
32. **biotherapeutic**: A single related resource. Can be either a URI or a set of nested resource data.⁶
33. **polymer_flag**: Indicates whether a molecule is a small molecule polymer (e.g., polistyrex).⁶
34. **prodrug**: Indicates that the molecule is a pro-drug (see molecule hierarchy for active component, where known).⁶
35. **kegg_id**: KEGG identifier.
36. **formula**: Molecular formula for the full compound.
37. **exact_mass**: Mass of the compound (from KEGG).
38. **mol_weight**: Mass of a molecule of a substance, based on 12 as the atomic weight of carbon-12.⁸
39. **atom**: An ATOM entry represents a KEGG Atom Type.¹⁰
40. **bond**: A BOND entry is defined as a pair of ATOM entries that form a chemical bond in a molecule, corresponding to many named bonds in organic chemistry and biochemistry.¹⁰
41. **chebi_id**: ChEBI identifier.
42. **definition**: A simple definition of this compound.
43. **mass**: The average mass. The relative masses are calculated from tables of relative atomic masses (atomic weights) published by IUPAC (from ChEBI).⁷
44. **mol**: ChEBI stores the two-dimensional or three-dimensional structural diagrams as connection tables in MDL molfile format.⁷
45. **smiles**: The simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings.
46. **inchi**: The International Chemical Identifier (InChI) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web.
47. **inchi_key**: The InChIKey, sometimes referred to as a hashed InChI, is a fixed-length (27-character) condensed digital representation of the InChI that is not human-understandable.
48. **cas_id**: CAS Registry Number. A CAS Registry Number is a unique and unambiguous identifier for a specific substance that allows clear communication and, with the help of CAS scientists, links together all available data and research about that substance.
49. **substance**: Full substance name as recognized by CFSAN (FDA).⁹
50. **regs**: Code of Federal Regulations numbers associated with this compound (FDA).⁹
51. **syns**: Synonyms of the compound (FDA).
52. **used_for**: The physical or technical effect(s) the substance has in or on food; see 21 CFR 170.3(o) for definitions (FDA).⁹

¹ http://chemgps.bmc.uu.se/help/dragonx/GhoseCrippenViswanadhanAlogP.html
² http://www.talete.mi.it/help/dproperties_help/index.html?molecular_properties.htm
³ http://chembl.blogspot.com/2020/03/chembl-26-released.html
⁴ https://micha-protocol.org/glossary/index
⁵ https://www.ebi.ac.uk/chembl/api/data/drug/schema
⁶ https://www.ebi.ac.uk/chembl/api/data/molecule/schema
⁷ http://libchebi.github.io/libChEBI%20API.pdf
⁸ https://www.britannica.com/science/molecular-weight
⁹ https://www.cfsanappsexternal.fda.gov/scripts/fdcc/?set=FoodSubstances&sort=Used_for_Technical_Effect
¹⁰ https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-7-S6-S2
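
The two rule-of-five fields (`num_ro5_violations` and `num_lipinski_ro5_violations`) differ only in which hydrogen-bond counts they use. A minimal sketch of how such a count could be recomputed from one record is shown below. The thresholds are the commonly cited rule-of-five limits (molecular weight > 500, logP > 5, donors > 5, acceptors > 10). Using `mw_freebase` and `alogp` as the weight and logP inputs is an assumption, not something the dataset card states, and `count_ro5_violations` is a hypothetical helper name.

```python
from typing import Mapping, Optional


def count_ro5_violations(
    row: Mapping[str, Optional[float]],
    use_lipinski_counts: bool = False,
) -> Optional[int]:
    """Count rule-of-five violations for one record (hypothetical helper).

    Assumption: mw_freebase and alogp are the weight and logP inputs.
    Returns None if any required descriptor is missing.
    """
    # Pick the plain or the Lipinski-style hydrogen-bond counts.
    hba_key = "hba_lipinski" if use_lipinski_counts else "hba"
    hbd_key = "hbd_lipinski" if use_lipinski_counts else "hbd"

    # (field name, upper limit); exceeding a limit is one violation.
    checks = [
        ("mw_freebase", 500.0),
        ("alogp", 5.0),
        (hbd_key, 5.0),
        (hba_key, 10.0),
    ]

    values = [row.get(key) for key, _ in checks]
    if any(value is None for value in values):
        return None
    return sum(1 for value, (_, limit) in zip(values, checks) if value > limit)


# Illustrative values only, not taken from the dataset.
small = {"mw_freebase": 180.2, "alogp": 1.3, "hba": 3, "hbd": 1,
         "hba_lipinski": 4, "hbd_lipinski": 1}
large = {"mw_freebase": 1202.6, "alogp": 3.3, "hba": 12, "hbd": 5,
         "hba_lipinski": 23, "hbd_lipinski": 5}

print(count_ro5_violations(small))
print(count_ro5_violations(large, use_lipinski_counts=True))
```

Returning `None` for incomplete records, rather than treating a missing descriptor as zero, keeps a recomputed value from silently disagreeing with the stored field.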
Provider:
blux-food
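Summary of original information

Dataset field definitions

  1. uci_id: UniChem identifier.
  2. chembl_id: ChEMBL identifier.
  3. molecule_type: Type of molecule (small molecule, protein, antibody, oligosaccharide, oligonucleotide, cell, unknown).
  4. alogp: Calculated ALogP, the Ghose-Crippen-Viswanadhan octanol-water partition coefficient.
  5. aromatic_rings: Number of aromatic rings.
  6. cx_logd: Octanol/water distribution coefficient at pH 7.4, calculated using ChemAxon v17.29.0.
  7. cx_logp: Octanol/water partition coefficient, calculated using ChemAxon v17.29.0.
  8. cx_most_apka: Most acidic pKa, calculated using ChemAxon v17.29.0.
  9. cx_most_bpka: Most basic pKa, calculated using ChemAxon v17.29.0.
  10. full_molformula: Molecular formula of the full compound (including any salt).
  11. full_mwt: Molecular weight of the full compound, including any salts.
  12. hba: Number of hydrogen bond acceptors.
  13. hba_lipinski: Number of hydrogen bond acceptors calculated according to Lipinski's original rules.
  14. hbd: Number of hydrogen bond donors.
  15. hbd_lipinski: Number of hydrogen bond donors calculated according to Lipinski's original rules.
  16. heavy_atoms: Number of heavy (non-hydrogen) atoms.
  17. molecular_species: Indicates whether the compound is an acid, a base, or neutral.
  18. mw_freebase: Molecular weight of the parent compound.
  19. mw_monoisotopic: Monoisotopic molecular weight of the parent compound.
  20. num_lipinski_ro5_violations: Number of violations of Lipinski's rule of five, using the HBA_LIPINSKI and HBD_LIPINSKI counts.
  21. num_ro5_violations: Number of violations of Lipinski's rule of five, using the HBA and HBD definitions.
  22. psa: Polar surface area.
  23. qed_weighted: Weighted quantitative estimate of drug-likeness.
  24. ro3_pass: Indicates whether the compound passes the rule of three (mw < 300, logP < 3 etc).
  25. rtb: Number of rotatable bonds.
  26. canonical_smiles: Canonical SMILES with no stereochemistry information, generated using Pipeline Pilot.
  27. standard_inchi: IUPAC standard InChI for the compound.
  28. standard_inchi_key: IUPAC standard InChI key for the compound.
  29. natural_product: Indicates whether the compound is natural product-derived.
  30. inorganic_flag: Indicates whether the molecule is inorganic (containing only metal atoms and <2 carbon atoms).
  31. therapeutic_flag: Indicates whether a drug has a therapeutic application.
  32. biotherapeutic: A single related resource. Can be either a URI or nested resource data.
  33. polymer_flag: Indicates whether the molecule is a small molecule polymer.
  34. prodrug: Indicates whether the molecule is a pro-drug.
  35. kegg_id: KEGG identifier.
  36. formula: Molecular formula of the full compound.
  37. exact_mass: Mass of the compound (from KEGG).
  38. mol_weight: Molecular mass, based on the atomic weight of carbon-12.
  39. atom: KEGG Atom Type.
  40. bond: Chemical bond, corresponding to many named bonds in organic chemistry and biochemistry.
  41. chebi_id: ChEBI identifier.
  42. definition: A simple definition of the compound.
  43. mass: The average mass.
  44. mol: Two-dimensional or three-dimensional structural diagram, stored by ChEBI as a connection table in MDL molfile format.
  45. smiles: Simplified molecular-input line-entry system (SMILES) string.
  46. inchi: International Chemical Identifier (InChI).
  47. inchi_key: InChIKey, a fixed-length (27-character) condensed digital representation that is not human-understandable.
  48. cas_id: CAS Registry Number.
  49. substance: Full substance name as recognized by CFSAN (FDA).
  50. regs: Code of Federal Regulations numbers associated with the compound (FDA).
  51. syns: Synonyms of the compound.
  52. used_for: The physical or technical effect(s) the substance has in or on food.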
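
The `inchi_key` and `standard_inchi_key` fields are described as fixed-length, 27-character strings, which makes a cheap sanity check possible before joining records across sources. The sketch below assumes the standard InChIKey layout of 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final uppercase letter. That layout is not spelled out in the field definitions, and `looks_like_inchi_key` is a hypothetical helper name. It checks the shape only and cannot tell whether a key corresponds to a real structure.

```python
import re

# Assumed layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
INCHI_KEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchi_key(value: str) -> bool:
    """Return True if the string has the shape of an InChIKey (hypothetical helper)."""
    return len(value) == 27 and INCHI_KEY_PATTERN.fullmatch(value) is not None


# The InChIKey of aspirin, used here only as a well-formed example.
print(looks_like_inchi_key("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
# A truncated key fails the length check.
print(looks_like_inchi_key("BSYNRYMUTXBXSQ-UHFFFAOYSA"))
```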