yushuqiu/gene_research
Source: Hugging Face. Updated 2026-03-11; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/yushuqiu/gene_research
Description:
# GenePT Notebook Data
This dataset package contains the local files that the notebooks under `GenePT/` load at runtime.
## Structure
- `embeddings/gene/`: primary gene embedding files used by the notebooks
- `embeddings/optional/`: optional embeddings referenced by notebook placeholders
- `cell_level/`: cell-level datasets and cached cell embeddings
- `gene_level/ggi/`: gene-gene interaction benchmark files
- `gene_level/function_property/`: gene property task labels and metadata
- `gene_level/ncbi/`: gene text summaries used to build GPT gene embeddings
- `ppi/raw/`: PPI benchmark inputs used by `figure_2_ppi_reproduction.ipynb`
- `ppi/optional/`: provenance/source archives that are useful but not directly loaded
- `manifests/`: file-level and notebook-level dependency manifests
- `scripts/`: helper scripts for upload and for recreating the original local layout
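After downloading, it can be useful to confirm that the folders listed above are actually present. The following is a minimal sketch, not part of the package: the function name `missing_dirs` and the idea of running it against the download root are assumptions, and only the directory names are taken from the list above.

```python
from pathlib import Path

# Directory names copied from the Structure list above.
EXPECTED_DIRS = [
    "embeddings/gene",
    "embeddings/optional",
    "cell_level",
    "gene_level/ggi",
    "gene_level/function_property",
    "gene_level/ncbi",
    "ppi/raw",
    "ppi/optional",
    "manifests",
    "scripts",
]


def missing_dirs(root):
    """Return the expected subdirectories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Hypothetical usage: run from the root of the downloaded dataset folder.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

The check only looks at directories. The file-level manifests under `manifests/` would be the place to look for a finer-grained comparison, but their format is not described here, so the sketch does not read them.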
## Included Notebook Dependencies
- `aorta_data_analysis.ipynb`
- `cell_level_state_phenotype_eval.ipynb`
- `figure_2_ppi_reproduction.ipynb`
- `gene_embeddings_examples.ipynb`
- `gene_level_task_figure_2.ipynb`
- `gene_level_task_figure_2 - Copy.ipynb`
- `gene_level_task_table_1.ipynb`
- `GenePT_analysis_datasets/GenePT_s_data/MWE_cell_type_GenePT_s.ipynb`
## Not Included
These files are referenced by the notebooks but are not included, because they were not available in the local workspace this package was built from:
- `input_data/ppi/downloads/GTEx-RNA-Seq.zip` for figure 2 panel `f`
- myeloid / multiple sclerosis state-task files referenced as placeholders in `cell_level_state_phenotype_eval.ipynb`
- `vocab.json` and `token_dictionary.pkl` from `request_ncbi_text_for_genes.ipynb`
- `demo_train.h5ad` and `demo_test.h5ad` from the legacy pancreas demo notebook
## Recommended Hugging Face Usage
Host these files in a Hugging Face dataset repo. After downloading the repo, run the following from the root of the downloaded folder (the script path is relative):
```powershell
pwsh -File .\scripts\materialize_legacy_layout.ps1 -WorkspaceRoot D:\Harvard\gene_research
```
Replace the example `-WorkspaceRoot` value with your own workspace path. The script reconstructs the original `GenePT/` and `GenePT_emebdding_v2/` file layout expected by the notebooks.
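The download step that precedes the script is not spelled out here. One possible way to do it from Python is sketched below, assuming the `huggingface_hub` package is installed; the wrapper name `download_dataset` and the `local_dir` value in the usage comment are hypothetical, and the repo id is the one shown at the top of this page.

```python
def download_dataset(local_dir, repo_id="yushuqiu/gene_research"):
    """Download the whole dataset repo into `local_dir` and return the local path."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # this is a dataset repo, not a model repo
        local_dir=local_dir,
    )


# Hypothetical usage (requires network access):
# path = download_dataset("gene_research_data")
# print("Downloaded to", path)
```

Once the download finishes, the PowerShell script above can be run from inside the downloaded folder.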
To upload this staged folder to a dataset repo, run:
```powershell
pwsh -File .\scripts\upload_to_hf.ps1 -RepoId <your-username-or-org>/<repo-name> -Token <hf_token>
```
Provider:
yushuqiu



