
klimczakjakubdev/pharmaco-explainer

Hugging Face | Updated 2026-03-30 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/klimczakjakubdev/pharmaco-explainer
Resource description:
---
license: cc-by-4.0
language:
- en
tags:
- chemistry
configs:
- config_name: k3
  default: true
  data_files:
  - "k3/k3_split.parquet"
- config_name: k4
  data_files:
  - "k4/k4_split.parquet"
- config_name: k4_2ar
  data_files:
  - "k4_2ar/k4_2ar_split.parquet"
- config_name: k5
  data_files:
  - "k5/k5_split.parquet"
- config_name: k3_labels
  data_files:
  - "k3/k3_labels.parquet"
- config_name: k4_labels
  data_files:
  - "k4/k4_labels.parquet"
- config_name: k4_2ar_labels
  data_files:
  - "k4_2ar/k4_2ar_labels.parquet"
- config_name: k5_labels
  data_files:
  - "k5/k5_labels.parquet"
---

# Pharmaco-Explainer Datasets

This repository contains the **datasets used in the Pharmaco-Explainer project**. They are hosted separately on Hugging Face and are consumed by the training and experimentation code on GitHub:

👉 **Training code and scripts:** https://github.com/AdamSulek/pharmaco-explainer/

---

## Available Datasets

The following datasets are available:

- **k3:** 3-element pharmacophore
- **k4_2ar:** 4-element pharmacophore with two aromatic features
- **k4:** 4-element pharmacophore
- **k5:** 5-element pharmacophore

Each dataset consists of three files:

- `<dataset>.parquet`: main feature data
- `<dataset>_split.parquet`: predefined train/val/test split
- `<dataset>_labels.parquet`: labels

Repository structure:

```
k3/
├── k3.parquet
├── k3_split.parquet
└── k3_labels.parquet
k4_2ar/
├── k4_2ar.parquet
├── k4_2ar_split.parquet
└── k4_2ar_labels.parquet
k4/
├── k4.parquet
├── k4_split.parquet
└── k4_labels.parquet
k5/
├── k5.parquet
├── k5_split.parquet
└── k5_labels.parquet
```

## Downloading the Data

Datasets can be downloaded **directly from Hugging Face** using the Python script below.
### Requirements

- Python ≥ 3.8
- `requests`
- Environment variable `PHARM_PROJECT_ROOT` pointing to the project root

```python
import argparse
import os
from pathlib import Path

import requests


def project_path(*parts):
    """Join path parts onto the project root given by PHARM_PROJECT_ROOT."""
    root = os.environ.get("PHARM_PROJECT_ROOT")
    if root is None:
        raise RuntimeError(
            "Environment variable PHARM_PROJECT_ROOT is not set.\n"
            "Run:\n"
            "  export PHARM_PROJECT_ROOT=/path/to/project"
        )
    return os.path.join(root, *parts)


def download_file(url, dest_path: Path):
    """Stream one file to dest_path, skipping it if it already exists."""
    dest_path.parent.mkdir(parents=True, exist_ok=True)

    if dest_path.exists():
        print(f"File already exists, skipping: {dest_path}")
        return

    print(f"Downloading {url} -> {dest_path}")
    r = requests.get(url, stream=True)
    r.raise_for_status()

    # Write in 1 MiB chunks so large files are never held fully in memory.
    with open(dest_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            if chunk:
                f.write(chunk)


def download_dataset(dataset_name):
    """Download the feature, split, and label files of one dataset."""
    base_url = (
        f"https://huggingface.co/datasets/"
        f"klimczakjakubdev/pharmaco-explainer/resolve/main/{dataset_name}/"
    )

    files = [
        f"{dataset_name}.parquet",
        f"{dataset_name}_split.parquet",
        f"{dataset_name}_labels.parquet",
    ]

    dest_dir = Path(project_path("data", dataset_name))

    for file_name in files:
        url = base_url + file_name
        dest_path = dest_dir / file_name
        download_file(url, dest_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--dataset",
        type=str,
        default="k3",
        choices=["k3", "k4", "k4_2ar", "k5"],
        help="Which dataset to download",
    )
    args = parser.parse_args()
    download_dataset(args.dataset)
```

Example usage:

```
python download_dataset.py --dataset k4
```

Files will be downloaded to:

```
$PHARM_PROJECT_ROOT/data/k4/
```

## Related Repository

These datasets are used by the main Pharmaco-Explainer codebase:

🔗 https://github.com/AdamSulek/pharmaco-explainer/

That repository contains:

* model training code
* experiment configurations
* preprocessing scripts
* evaluation and explainability pipelines
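
Because the download script skips any file that already exists, it can be worth checking afterwards that all three files of a dataset are actually present under `$PHARM_PROJECT_ROOT/data/<dataset>/`. The snippet below is a minimal sketch of such a check, not part of the project code: the helper name `missing_files` and the `FILE_SUFFIXES` constant are hypothetical, and the expected file names are taken only from the file list in this card. It flags files that are absent or zero bytes long; it does not detect a partially written file.

```python
from pathlib import Path

# Suffixes of the three files per dataset, as listed in this card
# (hypothetical constant, introduced for illustration).
FILE_SUFFIXES = ("", "_split", "_labels")


def missing_files(project_root, dataset_name):
    """Return the expected Parquet files that are absent or empty on disk.

    Hypothetical helper: assumes the layout <project_root>/data/<dataset_name>/
    that the download script writes to.
    """
    data_dir = Path(project_root) / "data" / dataset_name
    expected = [
        data_dir / f"{dataset_name}{suffix}.parquet" for suffix in FILE_SUFFIXES
    ]
    # A file counts as missing if it does not exist or has zero length.
    return [p for p in expected if not p.is_file() or p.stat().st_size == 0]
```

For example, `missing_files(os.environ["PHARM_PROJECT_ROOT"], "k4")` should return an empty list once `k4.parquet`, `k4_split.parquet`, and `k4_labels.parquet` have all been downloaded; any path it returns can be deleted and fetched again with the script.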
Provided by:
klimczakjakubdev