klimczakjakubdev/pharmaco-explainer
Source: Hugging Face. Updated 2026-03-30; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/klimczakjakubdev/pharmaco-explainer
Resource description:
---
license: cc-by-4.0
language:
- en
tags:
- chemistry
configs:
- config_name: k3
  default: true
  data_files:
  - "k3/k3_split.parquet"
- config_name: k4
  data_files:
  - "k4/k4_split.parquet"
- config_name: k4_2ar
  data_files:
  - "k4_2ar/k4_2ar_split.parquet"
- config_name: k5
  data_files:
  - "k5/k5_split.parquet"
- config_name: k3_labels
  data_files:
  - "k3/k3_labels.parquet"
- config_name: k4_labels
  data_files:
  - "k4/k4_labels.parquet"
- config_name: k4_2ar_labels
  data_files:
  - "k4_2ar/k4_2ar_labels.parquet"
- config_name: k5_labels
  data_files:
  - "k5/k5_labels.parquet"
---
# Pharmaco-Explainer Datasets
This repository contains **datasets used in the Pharmaco-Explainer project**.
They are shared separately on Hugging Face and are used by the training and
experimentation code hosted on GitHub:
👉 **Training code and scripts:**
https://github.com/AdamSulek/pharmaco-explainer/
---
## Available Datasets
The following datasets are available:
- **k3:** 3-element pharmacophore
- **k4_2ar:** 4-element pharmacophore with two aromatic features
- **k4:** 4-element pharmacophore
- **k5:** 5-element pharmacophore
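The dataset names in the list above appear to encode the pharmacophore size (`k<N>`) and, for `k4_2ar`, the number of aromatic features. You may want that information in code, for example to label results per dataset. A small parser such as the hypothetical `parse_dataset_name` below can recover it. The regular expression is inferred from these four names only and is not an official naming scheme.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical naming rule inferred from k3, k4, k4_2ar and k5:
# "k<N>" optionally followed by "_<M>ar" (M aromatic features).
_NAME_RE = re.compile(r"^k(?P<n>\d+)(?:_(?P<ar>\d+)ar)?$")


class PharmacophoreSpec(NamedTuple):
    n_elements: int              # pharmacophore size, e.g. 4 for "k4"
    n_aromatic: Optional[int]    # aromatic features if encoded in the name, else None


def parse_dataset_name(name: str) -> PharmacophoreSpec:
    """Split a dataset name such as "k4_2ar" into its encoded parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognised dataset name: {name!r}")
    aromatic = match.group("ar")
    return PharmacophoreSpec(
        n_elements=int(match.group("n")),
        n_aromatic=int(aromatic) if aromatic is not None else None,
    )


for name in ("k3", "k4", "k4_2ar", "k5"):
    print(name, parse_dataset_name(name))
```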
Each dataset consists of three files:
- `<dataset>.parquet` – main feature data
- `<dataset>_split.parquet` – predefined train/val/test split
- `<dataset>_labels.parquet` – labels
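The direct download is described in the section on downloading the data. As an alternative, note that the YAML header of this card declares loader configs. `k3` (the default), `k4`, `k4_2ar` and `k5` each point at the corresponding `<dataset>_split.parquet`, and the `<dataset>_labels` configs point at the labels files. The main `<dataset>.parquet` files are not registered as configs.

The sketch below makes two assumptions:
- The Hugging Face `datasets` library is installed.
- Because no split names are declared, each config is exposed as a single `train` split.

`config_name` and `load_table` are illustrative helpers, not part of this project's code.

```python
REPO_ID = "klimczakjakubdev/pharmaco-explainer"

# Dataset names as declared in the YAML header of this card.
DATASETS = ("k3", "k4", "k4_2ar", "k5")


def config_name(dataset: str, labels: bool = False) -> str:
    """Map a dataset name to its config: the split table or the labels table."""
    if dataset not in DATASETS:
        raise ValueError(f"Unknown dataset {dataset!r}; expected one of {DATASETS}")
    return f"{dataset}_labels" if labels else dataset


def load_table(dataset: str, labels: bool = False):
    """Load one config from the Hub (requires network access)."""
    # Imported lazily so config_name() works without `datasets` installed.
    from datasets import load_dataset

    # No split is declared in the YAML, so the single parquet file is
    # assumed to be exposed as the default "train" split.
    return load_dataset(REPO_ID, config_name(dataset, labels), split="train")
```

For example, `load_table("k4")` should return the rows of `k4/k4_split.parquet`, and `load_table("k4", labels=True)` should return the rows of `k4/k4_labels.parquet`.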
Repository structure:
```
k3/
├── k3.parquet
├── k3_split.parquet
└── k3_labels.parquet
k4_2ar/
├── k4_2ar.parquet
├── k4_2ar_split.parquet
└── k4_2ar_labels.parquet
k4/
├── k4.parquet
├── k4_split.parquet
└── k4_labels.parquet
k5/
├── k5.parquet
├── k5_split.parquet
└── k5_labels.parquet
```
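After downloading, you can check a local copy against this layout. The sketch below uses two hypothetical helpers, `expected_files` and `missing_files`. They only test that the three expected file names exist in a dataset directory; they do not validate file contents.

```python
import tempfile
from pathlib import Path
from typing import List

# Every dataset directory is expected to hold these three files.
SUFFIXES = ("", "_split", "_labels")


def expected_files(dataset: str) -> List[str]:
    """File names for one dataset, following the repository structure."""
    return [f"{dataset}{suffix}.parquet" for suffix in SUFFIXES]


def missing_files(dataset_dir: Path, dataset: str) -> List[str]:
    """Return the expected file names that are absent from dataset_dir."""
    return [name for name in expected_files(dataset)
            if not (dataset_dir / name).is_file()]


# Demo on a temporary directory that simulates an incomplete download.
with tempfile.TemporaryDirectory() as tmp:
    k4_dir = Path(tmp) / "k4"
    k4_dir.mkdir()
    (k4_dir / "k4.parquet").touch()
    print(missing_files(k4_dir, "k4"))  # ['k4_split.parquet', 'k4_labels.parquet']
```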
## Downloading the Data
Datasets can be downloaded **directly from Hugging Face** using the Python
script below.
### Requirements
- Python ≥ 3.8
- `requests`
- Environment variable `PHARM_PROJECT_ROOT` pointing to the project root
```python
import argparse
import os
from pathlib import Path

import requests


def project_path(*parts):
    """Resolve a path relative to $PHARM_PROJECT_ROOT."""
    root = os.environ.get("PHARM_PROJECT_ROOT")
    if root is None:
        raise RuntimeError(
            "Environment variable PHARM_PROJECT_ROOT is not set.\n"
            "Run:\n"
            "  export PHARM_PROJECT_ROOT=/path/to/project"
        )
    return os.path.join(root, *parts)


def download_file(url, dest_path: Path):
    """Stream `url` to `dest_path`, skipping files that already exist."""
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    if dest_path.exists():
        print(f"File already exists, skipping: {dest_path}")
        return

    print(f"Downloading {url} -> {dest_path}")
    # Write to a temporary ".part" file first so an interrupted download
    # is not mistaken for a complete file on the next run.
    tmp_path = dest_path.with_name(dest_path.name + ".part")
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(tmp_path, "wb") as f:
            # Stream in 1 MiB chunks to keep memory use low.
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                if chunk:  # skip keep-alive chunks
                    f.write(chunk)
    tmp_path.replace(dest_path)


def download_dataset(dataset_name):
    """Download the main, split and labels files for one dataset."""
    base_url = (
        "https://huggingface.co/datasets/"
        f"klimczakjakubdev/pharmaco-explainer/resolve/main/{dataset_name}/"
    )
    files = [
        f"{dataset_name}.parquet",
        f"{dataset_name}_split.parquet",
        f"{dataset_name}_labels.parquet",
    ]
    dest_dir = Path(project_path("data", dataset_name))
    for file_name in files:
        download_file(base_url + file_name, dest_dir / file_name)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Download a Pharmaco-Explainer dataset from Hugging Face."
    )
    parser.add_argument(
        "--dataset",
        type=str,
        default="k3",
        choices=["k3", "k4", "k4_2ar", "k5"],
        help="Which dataset to download",
    )
    args = parser.parse_args()
    download_dataset(args.dataset)
```
### Example Usage
```bash
export PHARM_PROJECT_ROOT=/path/to/project
python download_dataset.py --dataset k4
```
Files will be downloaded to:
```
$PHARM_PROJECT_ROOT/data/k4/
```
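The script above hard-codes `https://huggingface.co`. You may need to fetch through a mirror such as the `hf-mirror.com` address listed at the top of this page. One option, sketched below, is to make the base URL configurable.

Some caveats:
- Reading the base URL from `HF_ENDPOINT` borrows the convention used by `huggingface_hub`; the original script does not read that variable.
- Whether a given mirror serves the same `/datasets/<repo>/resolve/main/...` paths is an assumption you should verify.
- `dataset_file_urls` is a hypothetical helper whose URLs could be passed to `download_file`.

```python
import os
from typing import List, Optional

REPO_ID = "klimczakjakubdev/pharmaco-explainer"
DEFAULT_ENDPOINT = "https://huggingface.co"


def dataset_file_urls(dataset: str, endpoint: Optional[str] = None) -> List[str]:
    """Build the three download URLs for `dataset`.

    The endpoint defaults to $HF_ENDPOINT if set, else huggingface.co
    (an assumption borrowed from huggingface_hub, not the original script).
    """
    base = (endpoint or os.environ.get("HF_ENDPOINT") or DEFAULT_ENDPOINT).rstrip("/")
    prefix = f"{base}/datasets/{REPO_ID}/resolve/main/{dataset}/"
    return [prefix + f"{dataset}{suffix}.parquet" for suffix in ("", "_split", "_labels")]


for url in dataset_file_urls("k4", endpoint="https://hf-mirror.com"):
    print(url)
```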
## Related Repository
These datasets are used by the main Pharmaco-Explainer codebase:
🔗 https://github.com/AdamSulek/pharmaco-explainer/
That repository contains:
* model training code
* experiment configurations
* preprocessing scripts
* evaluation and explainability pipelines
Provided by:
klimczakjakubdev



