
Wiki-Disease-Benchmark

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11184172
Resource description:
This benchmark consists of 255 randomly selected disease descriptions, collected as of February 2024. Each description was labeled by two data annotators, who reviewed each other's annotations to ensure accuracy and consistency across the dataset. The descriptions were collected, parsed, and extracted from Wikipedia by a software routine that uses the Wikipedia-API package (https://pypi.org/project/Wikipedia-API/) to systematically retrieve and collate information about a predefined set of diseases: for each disease it looks up the corresponding page and, within that page, extracts the "Signs and Symptoms" section. The retrieval has two steps: (1) retrieve the rdfs:label of every resource in DBpedia (https://dbpedia.org/) whose rdf:type is dbo:Disease; (2) for each of these labels, open the corresponding Wikipedia page and scrape its "Signs and Symptoms" section. After the text was extracted from Wikipedia, the phenotypic entities in it were annotated.
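The two-step retrieval described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' routine: the public endpoint https://dbpedia.org/sparql, the English-only label filter, the case-insensitive heading match (Wikipedia headings are usually written "Signs and symptoms"), the inclusion of subsection text, and all function names are illustrative. The `wiki` argument is assumed to be a `wikipediaapi.Wikipedia` object from the Wikipedia-API package, whose pages expose `exists()` and a `sections` tree with `title`, `text`, and nested `sections`.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed public endpoint; the source only says labels were retrieved from DBpedia.
DBPEDIA_SPARQL_ENDPOINT = "https://dbpedia.org/sparql"
TARGET_SECTION = "Signs and symptoms"

# Step 1: labels of every DBpedia resource typed dbo:Disease.
# The English-only FILTER is an assumption, not stated in the source.
DISEASE_LABEL_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?label WHERE {
  ?disease a dbo:Disease ;
           rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""


def parse_sparql_labels(results: dict) -> list[str]:
    """Pull the ?label strings out of a SPARQL 1.1 JSON results document."""
    return [row["label"]["value"] for row in results["results"]["bindings"]]


def fetch_disease_labels(endpoint: str = DBPEDIA_SPARQL_ENDPOINT) -> list[str]:
    """Send the label query as an HTTP GET request (requires network access)."""
    params = urllib.parse.urlencode(
        {"query": DISEASE_LABEL_QUERY, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{endpoint}?{params}", timeout=60) as resp:
        return parse_sparql_labels(json.load(resp))


# Step 2: locate and read the target section of one Wikipedia page.
def find_section(sections, title: str = TARGET_SECTION):
    """Depth-first search over objects exposing .title and .sections,
    comparing headings case-insensitively. Returns None if absent."""
    wanted = title.casefold()
    for section in sections:
        if section.title.strip().casefold() == wanted:
            return section
        nested = find_section(section.sections, title)
        if nested is not None:
            return nested
    return None


def section_text(section) -> str:
    """The section's own text followed by the text of all nested subsections."""
    parts = [section.text] + [section_text(sub) for sub in section.sections]
    return "\n".join(part for part in parts if part)


def extract_signs_and_symptoms(wiki, label: str) -> Optional[str]:
    """Return the 'Signs and symptoms' text for one disease label,
    or None when the page or the section does not exist."""
    page = wiki.page(label)
    if not page.exists():
        return None
    section = find_section(page.sections)
    return section_text(section) if section is not None else None
```

Usage, assuming network access and Wikipedia-API installed (recent releases require a user agent string): build `wiki = wikipediaapi.Wikipedia(user_agent="wiki-disease-sketch (you@example.org)", language="en")`, then collect `{label: extract_signs_and_symptoms(wiki, label) for label in fetch_disease_labels()}`. Taking `wiki` as a parameter keeps the parsing helpers testable without the package or a network connection. The public DBpedia endpoint may cap the rows returned per query, so a complete crawl might need LIMIT/OFFSET paging.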
Created:
2024-05-13