Well-curated QSAR datasets for diverse protein targets

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/Well-curated_QSAR_datasets_for_diverse_protein_targets/20539893
Description:
High-throughput screening (HTS) uses automated equipment to rapidly screen thousands to millions of molecules for a biological activity of interest in the early drug discovery process. However, this brute-force approach has low hit rates, typically around 0.05%-0.5%. Meanwhile, PubChem is a database supported by the National Institutes of Health (NIH) that contains biological activities for millions of drug-like molecules, often from HTS experiments. However, the raw primary screening data in PubChem have a high false-positive rate, and a series of secondary experimental screens on putative actives is used to remove these false positives. While all relevant screens are linked, the molecule datasets are often not curated to list all inactive molecules from the primary HTS and only the actives confirmed by secondary screening. We therefore identified nine high-quality HTS experiments in PubChem covering all important target protein classes for drug discovery, and carefully curated these datasets into lists of inactive and confirmed active molecules.

We converted the input SMILES strings to Structure-Data Files (SDFs). Each dataset is identified by its PubChem Accession Identifier. Preprocessing of the original data includes converting SMILES strings to 3D SDF files, generating 3D conformations, and filtering. Conversion from SMILES to SDF is done with Open Babel, version 2.4.1. Conformations are generated with Corina, version 4.3. Molecules are further filtered for validity and duplicates with the BioChemical Library (BCL).
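The curation rule described above (keep every inactive from the primary screen, and count a molecule as active only if a secondary screen confirmed it) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the function names, the compound IDs, and the choice to drop unconfirmed primary actives instead of relabeling them are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set


def curate_labels(
    primary_inactive: Iterable[str],
    primary_active: Iterable[str],
    confirmed_active: Iterable[str],
) -> Dict[str, int]:
    """Map compound ID -> label (1 = confirmed active, 0 = inactive).

    Assumption: primary actives that were not confirmed by a secondary
    screen are dropped, since their true activity is unknown.
    """
    putative: Set[str] = set(primary_active)
    confirmed: Set[str] = set(confirmed_active)

    labels: Dict[str, int] = {}
    # Inactives come straight from the primary screen; using a dict
    # removes duplicate IDs as a side effect.
    for cid in primary_inactive:
        if cid not in putative:
            labels[cid] = 0
    # Only putative actives that survived secondary screening count as active.
    for cid in putative & confirmed:
        labels[cid] = 1
    return labels


def hit_rate_percent(labels: Dict[str, int]) -> float:
    """Percentage of curated molecules labeled active."""
    if not labels:
        return 0.0
    return 100.0 * sum(labels.values()) / len(labels)


# Hypothetical compound IDs, for illustration only.
example = curate_labels(
    primary_inactive=["c1", "c2", "c3", "c4", "c4"],
    primary_active=["c5", "c6"],
    confirmed_active=["c5"],
)
example_rate = hit_rate_percent(example)
```

In this toy input, `c6` is a putative active that was never confirmed, so it appears in neither class. The resulting hit rate is far higher than the 0.05%-0.5% typical of real HTS campaigns only because the example is tiny.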
Created:
2022-08-22