introvoyz041/Medex

Name: introvoyz041/Medex
Creator: introvoyz041
Published: 2025-12-25 17:54:15
License: No description provided

Source: Hugging Face | Updated: 2025-12-25 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/introvoyz041/Medex

Summary:

---
dataset_info:
  features:
  - name: PMID
    dtype: large_string
  - name: DOI
    dtype: large_string
  - name: entity
    dtype: large_string
  - name: fact
    dtype: large_string
  - name: MolInfo
    struct:
    - name: SMILES
      dtype: large_string
  - name: GeneInfo
    struct:
    - name: NCBI_Gene_ID
      dtype: int64
    - name: protein_refseq_id
      dtype: large_string
    - name: gene_refseq_id
      dtype: large_string
  - name: ISSN
    dtype: large_string
  - name: eISSN
    dtype: large_string
  - name: Journal
    dtype: large_string
  splits:
  - name: train
    num_bytes: 12887091678
    num_examples: 36308777
  download_size: 3490707811
  dataset_size: 12887091678
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- biology
- chemistry
- medical
- synthetic
---

This is the initial release of the `Medex` dataset. It contains facts about small molecules and genes/proteins extracted from a large number of PubMed articles. Each fact comes with an identifier for the entity it concerns. For small molecules this is the SMILES string, and for genes/proteins it is the NCBI Gene ID. We also include the publication venue of the paper each fact was retrieved from (journal name, ISSN, and eISSN), which allows coarse-grained filtering by rigor or focus. As we extract more facts from PubMed, we will upload expanded versions here.

The dataset can be loaded with the Hugging Face `datasets` library as follows:

```python
from datasets import load_dataset

# Log in first (e.g. with `huggingface-cli login`) to access this dataset
ds = load_dataset("medexanon/Medex", split="train")
```

Croissant metadata can be loaded as follows:

```python
import mlcroissant as mlc

croissant_dataset = mlc.Dataset("https://huggingface.co/api/datasets/medexanon/Medex/croissant")
print(croissant_dataset.metadata.record_sets)
```
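The card mentions two things a user will likely want to do: pick out per-entity identifiers and filter facts by publication venue. The example below sketches both, but it is illustrative only. It assumes each row arrives as a plain Python dict that matches the `dataset_info` schema. It also assumes that `MolInfo` or `GeneInfo` is null (`None`, or a dict of `None`s) when a fact does not concern that kind of entity. The card does not state that null convention. The helper names (`smiles_of`, `gene_id_of`, `in_journals`, `molecule_facts`) and the toy records are hypothetical, and the records are placeholders rather than real Medex rows.

```python
from typing import Any, Iterable, Iterator, Optional

# Assumption: one Medex row as a plain dict, with nested MolInfo/GeneInfo
# structs that may be None when the fact is not about that entity type.
Record = dict[str, Any]


def smiles_of(record: Record) -> Optional[str]:
    """Return the SMILES string if the fact concerns a small molecule."""
    mol = record.get("MolInfo") or {}
    return mol.get("SMILES")


def gene_id_of(record: Record) -> Optional[int]:
    """Return the NCBI Gene ID if the fact concerns a gene/protein."""
    gene = record.get("GeneInfo") or {}
    return gene.get("NCBI_Gene_ID")


def in_journals(record: Record, issns: set[str]) -> bool:
    """Keep a record if either its print ISSN or its eISSN is in the allowlist."""
    return record.get("ISSN") in issns or record.get("eISSN") in issns


def molecule_facts(records: Iterable[Record], issns: set[str]) -> Iterator[tuple[str, str]]:
    """Yield (SMILES, fact) pairs for small-molecule facts from allowed venues."""
    for r in records:
        smiles = smiles_of(r)
        if smiles and in_journals(r, issns):
            yield smiles, r["fact"]


# Hypothetical toy records: every value is a placeholder, not real data.
records = [
    {"PMID": "1", "DOI": None, "entity": "ethanol", "fact": "placeholder fact A",
     "MolInfo": {"SMILES": "CCO"}, "GeneInfo": None,
     "ISSN": "1111-1111", "eISSN": None, "Journal": "Journal X"},
    {"PMID": "2", "DOI": None, "entity": "GENE_X", "fact": "placeholder fact B",
     "MolInfo": None,
     "GeneInfo": {"NCBI_Gene_ID": 999999, "protein_refseq_id": None, "gene_refseq_id": None},
     "ISSN": "1111-1111", "eISSN": None, "Journal": "Journal X"},
    {"PMID": "3", "DOI": None, "entity": "methane", "fact": "placeholder fact C",
     "MolInfo": {"SMILES": "C"}, "GeneInfo": None,
     "ISSN": "2222-2222", "eISSN": "3333-3333", "Journal": "Journal Y"},
]

print(list(molecule_facts(records, {"3333-3333"})))
# → [('C', 'placeholder fact C')]
```

On the real data, a predicate like `in_journals` could be passed to `Dataset.filter` from the `datasets` library, e.g. `ds.filter(lambda r: in_journals(r, allowed))`. Loading with `streaming=True` instead returns an `IterableDataset`, which also supports `filter`. That route avoids fetching the full ~3.5 GB of data files up front.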

Provider:

introvoyz041
