
yushuqiu/gene_research

Source: Hugging Face. Updated 2026-03-11; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/yushuqiu/gene_research
Description:
# GenePT Notebook Data

This dataset package contains the local files that the notebooks under `GenePT/` load at runtime.

## Structure

- `embeddings/gene/`: primary gene embedding files used by the notebooks
- `embeddings/optional/`: optional embeddings referenced by notebook placeholders
- `cell_level/`: cell-level datasets and cached cell embeddings
- `gene_level/ggi/`: gene-gene interaction benchmark files
- `gene_level/function_property/`: gene property task labels and metadata
- `gene_level/ncbi/`: gene text summaries used to build GPT gene embeddings
- `ppi/raw/`: PPI benchmark inputs used by `figure_2_ppi_reproduction.ipynb`
- `ppi/optional/`: provenance/source archives that are useful but not directly loaded
- `manifests/`: file-level and notebook-level dependency manifests
- `scripts/`: helper scripts for upload and for recreating the original local layout

## Included Notebook Dependencies

- `aorta_data_analysis.ipynb`
- `cell_level_state_phenotype_eval.ipynb`
- `figure_2_ppi_reproduction.ipynb`
- `gene_embeddings_examples.ipynb`
- `gene_level_task_figure_2.ipynb`
- `gene_level_task_figure_2 - Copy.ipynb`
- `gene_level_task_table_1.ipynb`
- `GenePT_analysis_datasets/GenePT_s_data/MWE_cell_type_GenePT_s.ipynb`

## Not Included

These notebook-referenced files are not included because they are not available locally in this workspace:

- `input_data/ppi/downloads/GTEx-RNA-Seq.zip` for figure 2 panel `f`
- myeloid / multiple sclerosis state-task files referenced as placeholders in `cell_level_state_phenotype_eval.ipynb`
- `vocab.json` and `token_dictionary.pkl` from `request_ncbi_text_for_genes.ipynb`
- `demo_train.h5ad` and `demo_test.h5ad` from the legacy pancreas demo notebook

## Recommended Hugging Face Usage

Use a Hugging Face dataset repo.
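One possible way to fetch the dataset repo from Python is sketched below. It assumes the `huggingface_hub` package is installed and that the repo id is `yushuqiu/gene_research`, as listed on this page. The helper names `allow_patterns_for` and `download_dataset` are hypothetical and not part of this dataset's scripts.

```python
from pathlib import Path

# Repo id as listed on this page (assumed to be a public dataset repo).
REPO_ID = "yushuqiu/gene_research"


def allow_patterns_for(subdirs):
    """Turn README folder names into glob patterns for a partial download."""
    return [f"{d.strip('/')}/*" for d in subdirs]


def download_dataset(local_dir, subdirs=None):
    """Download the whole dataset repo, or only the given folders, into local_dir."""
    # Imported lazily so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=str(Path(local_dir)),
        # None means "download everything".
        allow_patterns=allow_patterns_for(subdirs) if subdirs else None,
    )
```

For example, `download_dataset("gene_research_data", ["embeddings/gene/", "scripts/"])` would request only the primary gene embeddings and the helper scripts. Omitting `subdirs` requests the full repo.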
After downloading the dataset repo, run:

```powershell
pwsh -File .\scripts\materialize_legacy_layout.ps1 -WorkspaceRoot D:\Harvard\gene_research
```

That reconstructs the original `GenePT/` and `GenePT_emebdding_v2/` file layout expected by the notebooks.

To upload this staged folder to a dataset repo, run:

```powershell
pwsh -File .\scripts\upload_to_hf.ps1 -RepoId <your-username-or-org>/<repo-name> -Token <hf_token>
```
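Before running the layout script, it may help to confirm that a downloaded copy contains the folders listed under Structure. The following is a minimal sketch using only the standard library. The folder names are copied from the Structure list, and `missing_dirs` is a hypothetical helper, not one of the scripts shipped in `scripts/`.

```python
from pathlib import Path

# Folder names copied from the Structure section of this README.
EXPECTED_DIRS = [
    "embeddings/gene",
    "embeddings/optional",
    "cell_level",
    "gene_level/ggi",
    "gene_level/function_property",
    "gene_level/ncbi",
    "ppi/raw",
    "ppi/optional",
    "manifests",
    "scripts",
]


def missing_dirs(root):
    """Return the documented folders that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs(path_to_download)` returns an empty list when every documented folder is present. A non-empty result may simply reflect a partial download rather than a broken one.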
Provider:
yushuqiu