Natural language processing of gene descriptions for overrepresentation analysis with GeneTEA

Source: DataCite Commons. Updated 2025-10-10. Indexed 2025-05-07.
Download link:
https://figshare.com/articles/dataset/Natural_language_processing_of_gene_descriptions_for_overrepresentation_analysis_with_GeneTEA/28635317/1
Description:
Data and models supporting the bioRxiv preprint "Natural language processing of gene descriptions for overrepresentation analysis with GeneTEA" (Boyle et al. 2025).

File descriptions:

GeneTEA.pkl, GeneTEA-yeast.pkl, PharmaTEA.pkl: pickled GeneTEA models.
enrichr_sets_03_01_2025.csv: Enrichr database downloaded 3/1/2025, used for Figures 2 and S1.
gene sets for connexin.gmt: Enrichr gene sets containing the term "connexin", downloaded from the Enrichr site.
false_discoveries.csv: benchmarking results for false discovery control in Figures 3 and S1.
EF_hand_example.csv: top terms and MedCPT scores for the EF-hand example in Figure 3.
[Hallmark or Experimentally Derived Queries]_scores.csv: benchmarking results for [Hallmark or Experimentally Derived Queries] across joined top terms in Figures 3 and S2. The "joined_ranking" column corresponds to the MedCPT Relevance score across the top terms, and "num_high_redundancy" contains the number of redundant term pairs.
[Hallmark or Experimentally Derived Queries]_indiv.csv: benchmarking results for [Hallmark or Experimentally Derived Queries] for each top term in Figures 3 and S2. The "indiv_ranking" column corresponds to the MedCPT Relevance score for a single term.
Fig4_left/right: examples of top terms shown in Figure 4.
gProfiler_hsapiens_3-13-2025_9-14-59 AM__intersections.csv: g:GOSt results for Fig4_left, downloaded from the g:Profiler site.
gProfiler_hsapiens_2-11-2025_10-18-19 AM__intersections.csv: g:GOSt results for Fig4_right, downloaded from the g:Profiler site.
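For readers who want to inspect the connexin .gmt file, here is a minimal sketch of a GMT reader. It assumes the common GMT layout (one gene set per line, tab-separated: set name, description, then gene symbols), which has not been checked against this specific file. The function name parse_gmt and the sample text are hypothetical and are not part of the dataset or of GeneTEA.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT-formatted text into {set name: [gene symbols]}.

    Assumes tab-separated lines: name, description, gene1, gene2, ...
    """
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


# Hypothetical two-line sample, not taken from the dataset.
sample = "Set A\tdesc\tGJA1\tGJB2\nSet B\t\tGJC1\n"
sets = parse_gmt(sample)
```

To apply this to the downloaded file, read it as text first, for example with open("gene sets for connexin.gmt", encoding="utf-8"), and pass the contents to parse_gmt.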
Provider:
figshare
Created:
2025-03-28