Gene embeddings used in GenePT: A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10030425
Resource description:
These are the NCBI (and, where applicable, UniProt) summaries of human genes retrieved for GenePT, together with the corresponding OpenAI text embeddings (text-embedding-ada-002 and text-embedding-3-large) computed on those summaries. See Chen and Zou (2024+) for method details.
The unzipped folder contains four files:
1. NCBI_summary_of_genes.json (NCBI gene card summaries of human genes)
2. NCBI_UniProt_summary_of_genes.json (NCBI gene card summaries of human genes, combined with UniProt protein summaries where applicable)
3. GenePT_gene_embedding_ada_text.pickle (a dictionary of NumPy arrays whose keys are gene names in upper case and whose values are the text-embedding-ada-002 embeddings of the summaries in file 1)
4. GenePT_gene_protein_embedding_model_3_text.pickle (a dictionary of NumPy arrays whose keys are gene names in upper case and whose values are the text-embedding-3-large embeddings of the summaries in file 1)
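Because both pickle files are described as plain dictionaries mapping upper-case gene names to NumPy arrays, they can be loaded and queried with the standard library and NumPy alone. The following is a minimal sketch under that assumption: the helper names (`load_gene_embeddings`, `get_embedding`, `nearest_genes`) are hypothetical, and ranking neighbours by cosine similarity is one common way to compare text embeddings, not necessarily the comparison used by the authors.

```python
import os
import pickle
from typing import Dict, List, Tuple

import numpy as np


def load_gene_embeddings(path: str) -> Dict[str, np.ndarray]:
    """Load a GenePT pickle, assumed to be a dict of upper-case gene name -> NumPy vector."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def get_embedding(embeddings: Dict[str, np.ndarray], gene: str) -> np.ndarray:
    """Look up one gene; the query is upper-cased because keys are stored in upper case."""
    return np.asarray(embeddings[gene.upper()], dtype=np.float64)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def nearest_genes(
    embeddings: Dict[str, np.ndarray], gene: str, k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k other genes whose embeddings are most cosine-similar to `gene`."""
    query = get_embedding(embeddings, gene)
    names = [name for name in embeddings if name != gene.upper()]
    matrix = np.stack([np.asarray(embeddings[n], dtype=np.float64) for n in names])
    # Normalize each row and the query so that a dot product equals cosine similarity.
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = matrix @ query
    order = np.argsort(-scores)[:k]
    return [(names[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    # Hypothetical location: assumes the unzipped file sits in the working directory.
    path = "GenePT_gene_embedding_ada_text.pickle"
    if os.path.exists(path):
        emb = load_gene_embeddings(path)
        print(len(emb), "genes loaded")
        print(nearest_genes(emb, "tp53", k=5))
```

The same functions work unchanged on GenePT_gene_protein_embedding_model_3_text.pickle; only the vector length differs between the two embedding models. Since unpickling can execute arbitrary code, load these files only from a source you trust, such as the Zenodo record above.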
Reference:
Chen YT, Zou J. (2024+) GenePT: A Simple But Effective Foundation Model for Genes and Cells Built From ChatGPT. bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2023.10.16.562533v1.
Created: 2024-03-18