SocialDisNER corpus sample-set
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/6359365
Description:
The SocialDisNER corpus of the SMM4H 2022 – Task 10 track was manually annotated by medical experts following the SMM4H-SocialDisNER guidelines.
These guidelines were adapted from previous efforts to annotate patient clinical records and medical literature. They cover rules for annotating mentions of diseases in health-related tweets in Spanish, with a focus on patient-generated content (selected through followers of patient association accounts for a range of pathologies, including rare diseases, mental health, cancer, etc.).
They also include some considerations regarding the codification of the annotations to SNOMED-CT concept codes.
The sample set consists of 10 tweets extracted from the training set; its purpose is to show the structure of the dataset and its content:
socialdisner_sample-set:
  tweets_txt: This folder contains one txt file per tweet. Each file name corresponds to the tweet id.
  mentions.tsv: This file contains the manually annotated disease mentions. It has the following fields:
    Tweets_id: The id of the tweet; the content of the tweet can be queried with this id through the Twitter API.
    Begin: The position in the tweet where the annotation begins.
    End: The position of the last character of the annotation in the tweet.
    Type: The type of entity found, in this case "ENFERMEDAD".
    Extraction: The literal extraction, that is, the fragment of text the annotation refers to.
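The layout described above can be explored with a few lines of Python. What follows is a minimal sketch, not part of the official distribution. It assumes that mentions.tsv is tab-separated with a header row naming the five fields listed above (matched case-insensitively), and that Begin and End are zero-based character offsets. Because the description does not say whether End is inclusive or exclusive, the sketch tries both conventions and reports which one reproduces the Extraction field. The function names load_mentions and check_mention are hypothetical.

```python
import csv
from pathlib import Path


def load_mentions(tsv_path):
    """Read mentions.tsv into a list of dicts with lowercase field names.

    Assumption: the file is tab-separated and starts with a header row.
    """
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [
            {key.strip().lower(): value
             for key, value in row.items() if key is not None}
            for row in reader
        ]


def check_mention(mention, tweets_dir):
    """Return the End convention that reproduces the Extraction field.

    Assumption: Begin and End are zero-based character offsets into the
    tweet text stored in <tweets_dir>/<tweet id>.txt.
    Returns "end-exclusive", "end-inclusive", or None if neither matches.
    """
    tweet_file = Path(tweets_dir) / f"{mention['tweets_id']}.txt"
    text = tweet_file.read_text(encoding="utf-8")
    begin, end = int(mention["begin"]), int(mention["end"])
    if text[begin:end] == mention["extraction"]:
        return "end-exclusive"
    if text[begin:end + 1] == mention["extraction"]:
        return "end-inclusive"
    return None
```

With the sample set unpacked in the working directory, calling load_mentions("socialdisner_sample-set/mentions.tsv") and then check_mention(m, "socialdisner_sample-set/tweets_txt") for each returned mention m shows whether the offsets line up with the tweet files. A result of None suggests that one of the assumptions above (offset base, encoding, or header names) does not hold for the actual files.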
For further information, please visit https://temu.bsc.es/socialdisner/
Created:
2022-10-21



