SocialDisNER corpus sample-set

NIAID Data Ecosystem, indexed 2026-03-14

Download link:

https://zenodo.org/record/6359365

Description:

The SocialDisNER corpus of the SMM4H 2022 – Task 10 track was manually annotated by medical experts following the SMM4H-SocialDisNER guidelines. These guidelines were adapted from previous efforts to annotate patient clinical records and medical literature. They cover rules for annotating disease mentions in health-related tweets in Spanish consisting of patient-generated content (selected through followers of patient association accounts for a range of pathologies, including rare diseases, mental health, and cancer). They also include some considerations regarding the codification of the annotations to SNOMED-CT concept codes.

The sample set consists of 10 tweets extracted from the training set. Its purpose is to show the structure of the dataset and its content:

socialdisner_sample-set:
    tweets_txt: This folder contains individual txt files containing the tweets. The file name corresponds to the tweet id.
    mentions.tsv: This file contains the manually annotated disease mentions. The file has the following fields:
        Tweets_id: The id of the tweet. The tweet content can be queried through the Twitter API using this id.
        Begin: The position in the tweet where the annotation begins.
        End: The position of the last character of the annotation in the tweet.
        Type: The type of entity found, in this case "ENFERMEDAD".
        Extraction: The literal extraction, in other words, the fragment of text to which the annotation refers.

For further information, please visit https://temu.bsc.es/socialdisner/
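As a rough illustration of how the two parts of the sample set fit together, the following Python sketch reads mentions.tsv and checks each annotation against the corresponding tweet file. It rests on several assumptions that the description does not confirm: that mentions.tsv has a header row using the field names listed above, that tweet files are UTF-8 and named `<tweet id>.txt`, and that End can be used as an exclusive slice bound. The function names `load_mentions` and `check_offsets` are hypothetical helpers, not part of the corpus.

```python
import csv
from pathlib import Path


def load_mentions(tsv_path):
    """Read mentions.tsv into a list of dicts with integer offsets.

    Assumes a header row with the fields Tweets_id, Begin, End, Type, Extraction.
    """
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside tweets as literal characters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = []
        for row in reader:
            row["Begin"] = int(row["Begin"])
            row["End"] = int(row["End"])
            rows.append(row)
    return rows


def check_offsets(mentions, tweets_dir):
    """Return the mentions whose text span differs from the Extraction field.

    Assumes End is an exclusive bound; if the corpus uses an inclusive End,
    slice with m["End"] + 1 instead.
    """
    mismatches = []
    for m in mentions:
        tweet_path = Path(tweets_dir) / f"{m['Tweets_id']}.txt"
        text = tweet_path.read_text(encoding="utf-8")
        if text[m["Begin"]:m["End"]] != m["Extraction"]:
            mismatches.append(m)
    return mismatches
```

Calling `check_offsets(load_mentions("mentions.tsv"), "tweets_txt")` on the unpacked sample set should return an empty list if the assumptions hold. A non-empty result most likely means the End convention or the header names differ from what is assumed here.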

Created:

2022-10-21
