
Gene embeddings used in GenePT: A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10030425
Resource description:
These are the NCBI summaries retrieved for human genes (supplemented with UniProt summaries where applicable), together with the OpenAI text embeddings (text-embedding-ada-002 and text-embedding-3-large) computed on those summaries. For details of the methods, see Chen and Zou (2024+). The unzipped folder contains four files:

1. NCBI_summary_of_genes.json: NCBI gene card summaries of human genes.
2. NCBI_UniProt_summary_of_genes.json: NCBI gene card summaries and, where applicable, UniProt protein summaries of human genes.
3. GenePT_gene_embedding_ada_text.pickle: a dictionary of NumPy arrays in which the keys are gene names (upper case) and the values are the text-embedding-ada-002 embeddings of the summaries in 1.
4. GenePT_gene_protein_embedding_model_3_text.pickle: a dictionary of NumPy arrays in which the keys are gene names (upper case) and the values are the text-embedding-3-large embeddings of the summaries in 1.

Reference: Chen YT, Zou J. (2024+) GenePT: A Simple But Effective Foundation Model for Genes and Cells Built From ChatGPT. bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2023.10.16.562533v1.
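The file layout described above can be read with Python's standard library. The following is a minimal sketch, not the authors' code. The helper names (`load_summaries`, `load_embeddings`, `cosine_similarity`, `gene_similarity`) are hypothetical, and the JSON files are assumed to map gene names to summary text, which the description does not state explicitly. Because the pickled values are described as NumPy arrays, NumPy must be installed for the real files to unpickle, although the similarity helper itself works on any sequence of numbers.

```python
import json
import math
import pickle
from pathlib import Path


def load_summaries(path):
    """Read one of the summary JSON files.

    Assumption: the file decodes to a dict of gene name -> summary text.
    """
    with Path(path).open("r", encoding="utf-8") as fh:
        return json.load(fh)


def load_embeddings(path):
    """Read a gene name -> embedding vector dictionary from a pickle file.

    Only unpickle files from a source you trust: pickle can run arbitrary code.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    u = [float(x) for x in u]
    v = [float(x) for x in v]
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def gene_similarity(embeddings, gene_a, gene_b):
    """Compare two genes by name; keys are upper case per the description."""
    return cosine_similarity(embeddings[gene_a.upper()], embeddings[gene_b.upper()])
```

For example, `gene_similarity(load_embeddings("GenePT_gene_embedding_ada_text.pickle"), "TP53", "MDM2")` would return the cosine similarity of the two embeddings, provided both gene symbols are present as keys in the file.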
Created:
2024-03-18