batman2
Dataset overview
Dataset name
- Name: batman
- License: cc-by-nc-4.0
- License link: https://creativecommons.org/licenses/by-nc/4.0/
Dataset size
- Size: 100K<n<1M
Dataset configurations
- batman_ingredients:
  - Data file: data/batman_ingredients.csv
  - Delimiter:
- batman_herbs:
  - Data file: data/batman_herbs.csv
  - Delimiter:
- batman_formulas:
  - Data file: data/batman_formulas.csv
  - Delimiter:
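The delimiter value for each configuration is blank in this card, so the following is only a minimal sketch of reading one of the data files without assuming it. It uses the standard library's `csv.Sniffer` to guess the delimiter from a short list of candidates. The helper name `read_table`, the candidate list, and the sample row are illustrative assumptions, not part of the dataset.

```python
import csv
import io


def read_table(text, delimiter=None):
    """Parse delimited text into a list of dicts keyed by the header row.

    If no delimiter is given, csv.Sniffer guesses one from the first few KB,
    choosing among comma, tab, semicolon, and pipe (an assumed candidate list).
    """
    if delimiter is None:
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=",\t;|").delimiter
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Made-up two-line sample standing in for a real file such as
# data/batman_herbs.csv; the values are placeholders, not dataset content.
sample = "UID\tpref_name\tsynonyms\nH1\tEXAMPLE_HERB\tExample herb\n"
rows = read_table(sample)
```

For a real file, the text could be read with `open("data/batman_herbs.csv", newline="", encoding="utf-8").read()` and passed to the same helper. If the delimiter is known, passing it explicitly is more reliable than sniffing.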
Task categories
- Category: other
Languages
- Languages: en, cn
Tags
- Tags: biology, drug discovery, chemical screening, life sciences, chemistry, medical
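Dataset purpose
- Purpose: This dataset is a reformatted copy of BATMAN-2.0, a database of traditional Chinese medicine (TCM) ingredients, herbs, and formulas.
Dataset usage
- Usage: This dataset can be used for drug discovery experiments or to extend the capabilities of your own biomedical language model. For example, we used it to extend the capabilities of Precious3GPT in the demo/TCM_geroprotectors demo project.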
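Dataset structure
- batman_ingredients:
  - Fields:
    - UID: unique ID of the entity across the whole dataset
    - cid: PubChem compound ID
    - pref_name: the compound's common name, as given in BATMAN
    - synonyms: other names associated with the CID, if present in BATMAN
    - targets_known: human protein targets validated according to BATMAN, in the format "symbol(ENTREZ_ID)"
    - targets_pred: all predicted protein targets reported by BATMAN (these lists are not filtered by any significance threshold)
    - herbs: list of herbs in which the compound is found
    - formulas: list of TCM formulas that contain this compound
- batman_herbs:
  - Fields:
    - UID
    - pref_name: most often the herb's pinyin name; if that is unavailable, any other available name is used, such as the Latin or common English name
    - synonyms
    - ingredients: list of CIDs of all compounds found in the herb
    - formulas: list of TCM formulas that contain this herb
- batman_formulas:
  - Fields:
    - UID
    - pref_name: pinyin name of the herbal medicine; when formulas with non-identical composition share the same name, a ~X suffix is used to tell them apart
    - synonyms: the formula's name written in Chinese characters
    - ingredients
    - herbs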
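The two format conventions described in the field list (the "symbol(ENTREZ_ID)" target notation and the ~X suffix on formula names) can be handled with a few lines of Python. This is a sketch under stated assumptions: the card does not say how multiple targets are separated within one value, so the regular expression simply picks up every `symbol(digits)` pair it finds, and the suffix is taken to be whatever follows the last `~`. The function names and sample strings are hypothetical.

```python
import re

# Assumption: each target looks like symbol(ENTREZ_ID) with a numeric ID.
# The separator between targets is not documented in the card, so pairs are
# matched wherever they occur, skipping whitespace and common separators.
_TARGET = re.compile(r"([^\s|;,()]+)\((\d+)\)")


def parse_targets(field):
    """Return (symbol, entrez_id) tuples from a targets_known/targets_pred value."""
    return [(symbol, int(entrez)) for symbol, entrez in _TARGET.findall(field or "")]


def base_formula_name(pref_name):
    """Drop a trailing ~X disambiguation suffix from a formula name, if present."""
    head, sep, _suffix = pref_name.rpartition("~")
    return head if sep else pref_name


# Illustrative inputs only; the separator and the formula name are made up.
targets = parse_targets("TP53(7157)|HLA-A(3105)")
name = base_formula_name("EXAMPLE FORMULA~2")
```

Grouping formulas by `base_formula_name` would collect the variants that share a pinyin name but differ in composition. Whether that grouping is meaningful depends on how the ~X suffixes were assigned upstream.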
Citation
@article{10.1093/nar/gkad926,
    author = {Kong, Xiangren and Liu, Chao and Zhang, Zuzhen and Cheng, Meiqi and Mei, Zhijun and Li, Xiangdong and Liu, Peng and Diao, Lihong and Ma, Yajie and Jiang, Peng and Kong, Xiangya and Nie, Shiyan and Guo, Yingzi and Wang, Ze and Zhang, Xinlei and Wang, Yan and Tang, Liujun and Guo, Shuzhen and Liu, Zhongyang and Li, Dong},
    title = "{BATMAN-TCM 2.0: an enhanced integrative database for known and predicted interactions between traditional Chinese medicine ingredients and target proteins}",
    journal = {Nucleic Acids Research},
    volume = {52},
    number = {D1},
    pages = {D1110-D1120},
    year = {2023},
    month = {10},
    abstract = "{Traditional Chinese medicine (TCM) is increasingly recognized and utilized worldwide. However, the complex ingredients of TCM and their interactions with the human body make elucidating molecular mechanisms challenging, which greatly hinders the modernization of TCM. In 2016, we developed BATMAN-TCM 1.0, which is an integrated database of TCM ingredient–target protein interaction (TTI) for pharmacology research. Here, to address the growing need for a higher coverage TTI dataset, and using omics data to screen active TCM ingredients or herbs for complex disease treatment, we updated BATMAN-TCM to version 2.0 (http://bionet.ncpsb.org.cn/batman-tcm/). Using the same protocol as version 1.0, we collected 17 068 known TTIs by manual curation (with a 62.3-fold increase), and predicted ∼2.3 million high-confidence TTIs. In addition, we incorporated three new features into the updated version: (i) it enables simultaneous exploration of the target of TCM ingredient for pharmacology research and TCM ingredients binding to target proteins for drug discovery; (ii) it has significantly expanded TTI coverage; and (iii) the website was redesigned for better user experience and higher speed. We believe that BATMAN-TCM 2.0, as a discovery repository, will contribute to the study of TCM molecular mechanisms and the development of new drugs for complex diseases.}",
    issn = {0305-1048},
    doi = {10.1093/nar/gkad926},
    url = {https://doi.org/10.1093/nar/gkad926},
    eprint = {https://academic.oup.com/nar/article-pdf/52/D1/D1110/55040286/gkad926.pdf},
}




