
maomlab/Boldini2024

Source: Hugging Face. Updated 2024-10-02. Indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/maomlab/Boldini2024
Resource description:
---
license: mit
task_categories:
  - tabular-classification
  - tabular-regression
language:
  - en
tags:
  - HTS
pretty_name: Assay-Interfering-Compounds Finder
size_categories:
  - 1M<n<10M
dataset_summary: >-
  The assay-interfering-compounds finder consists of 17 different datasets.
  The datasets are uploaded after molecular sanitization using RDKit and MolVS.
citation: >-
  @article{Boldini2024,
    title = {Machine Learning Assisted Hit Prioritization for High Throughput Screening in Drug Discovery},
    ISSN = {2374-7951},
    url = {http://dx.doi.org/10.1021/acscentsci.3c01517},
    DOI = {10.1021/acscentsci.3c01517},
    journal = {ACS Central Science},
    publisher = {American Chemical Society (ACS)},
    author = {Boldini, Davide and Friedrich, Lukas and Kuhn, Daniel and Sieber, Stephan A.},
    year = {2024},
    month = mar
  }
config_names:
  - Boldini2024
configs:
  - config_name: Boldini2024
    data_files:
      - GPCR.csv
      - GPCR2.csv
      - GPCR3.csv
      - channel_atp.csv
      - cysteine_protease.csv
      - IonChannel.csv
      - IonChannel2.csv
      - IonChannel3.csv
      - kinase.csv
      - serine.csv
      - splicing.csv
      - transcrption.csv
      - transcription2.csv
      - transcription3.csv
      - transporter.csv
      - ubiquitin.csv
      - zinc_finger.csv
dataset_info:
  - config_name: GPCR_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: GPCR2_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: GPCR3_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: channel_atp_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: cysteine_protease_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: IonChannel_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: IonChannel2_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: IonChannel3_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: kinase_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: serine_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: splicing_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: transcription_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: transcription2_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: transcription3_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: transporter_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: ubiquitin_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
  - config_name: zinc_finger_sanitized
    features:
      - name: "SMILES"
        dtype: string
      - name: "Primary"
        dtype: int64
      - name: "Score"
        dtype: float64
      - name: "Confirmatory"
        dtype: float64
---

# Boldini2024 (Assay-Interfering-Compounds Finder)

This repository contains 17 datasets used to apply Minimum Variance Sampling Analysis (MVS-A) to find assay-interfering compounds (AICs) in high-throughput screening (HTS) data. In the accompanying study, the authors present what they describe as the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. Their method detects both false positives and true positives without relying on prior screens or knowledge of assay interference mechanisms, making it applicable to any high-throughput screening campaign.

The datasets uploaded to our Hugging Face repository have been sanitized using RDKit and MolVS. To apply these processing steps to the original dataset yourself, follow the instructions in the [Processing Script.py](https://huggingface.co/datasets/maomlab/Boldini2024/blob/main/Boldini2024%20Preprocessing.py) file in the maomlab/Boldini2024 repository.

# Citation

ACS Cent. Sci. 2024, 10, 4, 823–832

Publication Date: March 15, 2024

https://doi.org/10.1021/acscentsci.3c01517
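The card declares the same four columns for every file: `SMILES` (string), `Primary` (int64), `Score` (float64), and `Confirmatory` (float64). The sketch below shows one way to parse a file with that schema using only the Python standard library. The sample rows are hypothetical and are not taken from the dataset. The reading of the labels is also an assumption, not something the card states: `Primary == 1` is treated as a primary-screen hit, `Confirmatory` of `1.0` or `0.0` as a confirmed or unconfirmed hit, and a blank or NaN cell as "no confirmatory result". The helper names `parse_rows` and `summarize` are illustrative.

```python
import csv
import io
import math

# Column names as declared in the dataset card.
COLUMNS = ("SMILES", "Primary", "Score", "Confirmatory")

# Hypothetical rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """SMILES,Primary,Score,Confirmatory
CCO,0,0.12,
c1ccccc1,1,3.40,1.0
CC(=O)O,1,2.75,0.0
"""


def parse_rows(text):
    """Parse CSV text into typed rows following the card's schema."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        conf = raw["Confirmatory"].strip()
        rows.append({
            "SMILES": raw["SMILES"],
            "Primary": int(raw["Primary"]),
            "Score": float(raw["Score"]),
            # Assumption: a blank or NaN cell means "no confirmatory result".
            "Confirmatory": float(conf) if conf else math.nan,
        })
    return rows


def summarize(rows):
    """Count primary hits and their confirmatory outcomes (assumed semantics)."""
    hits = [r for r in rows if r["Primary"] == 1]
    tested = [r for r in hits if not math.isnan(r["Confirmatory"])]
    confirmed = sum(1 for r in tested if r["Confirmatory"] == 1.0)
    return {
        "compounds": len(rows),
        "primary_hits": len(hits),
        "confirmed": confirmed,
        "not_confirmed": len(tested) - confirmed,
    }


if __name__ == "__main__":
    print(summarize(parse_rows(SAMPLE_CSV)))
```

To run this against the real data, read one of the CSV files listed in the card (for example `GPCR.csv`) and pass its text to `parse_rows`. With files at the million-row scale given in `size_categories`, a dataframe library would likely be more practical than plain `csv`.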
Provided by:
maomlab