SocialDisNER corpus sample-set

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/6359365
Description:
The SocialDisNER corpus for the SMM4H 2022 Task 10 track was manually annotated by medical experts following the SMM4H-SocialDisNER guidelines. These guidelines were adapted from earlier guidelines used to annotate patient clinical records and medical literature. They give rules for annotating disease mentions in health-related tweets in Spanish, covering patient-generated content (selected through followers of patient association accounts across a range of pathologies, including rare diseases, mental health, and cancer). They also include considerations for mapping the annotations to SNOMED-CT concept codes.

The sample set consists of 10 tweets extracted from the training set. Its purpose is to show the structure and content of the dataset.

socialdisner_sample-set:
  tweets_txt: Folder of individual txt files, one per tweet. Each file name corresponds to the tweet id.
  mentions.tsv: File containing the manually annotated disease mentions, with the following fields:
    Tweets_id: The id of the tweet. The tweet content can be queried through the Twitter API.
    Begin: The position in the tweet where the annotation begins.
    End: The position of the last character of the annotation in the tweet.
    Type: The type of entity found, in this case "ENFERMEDAD".
    Extraction: The literal extraction, that is, the text fragment the annotation refers to.

For further information, please visit https://temu.bsc.es/socialdisner/
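The layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader. It assumes that mentions.tsv has a header row whose column names match the field names listed in the description (compared case-insensitively), and that offsets count Unicode characters from zero. The description does not say whether End is inclusive or exclusive, so the check takes a flag for both readings. The sample tweet and row are invented for illustration and are not taken from the corpus.

```python
import csv
import io

# Invented example data (NOT from the corpus), used only to exercise the code.
SAMPLE_TWEET = "Mi madre tiene diabetes tipo 2 desde 2010."
SAMPLE_TSV = (
    "Tweets_id\tBegin\tEnd\tType\tExtraction\n"
    "0000000000\t15\t30\tENFERMEDAD\tdiabetes tipo 2\n"
)


def load_mentions(tsv_text):
    """Parse the text of a mentions.tsv file into a list of dicts.

    Assumption: the first line is a header naming the five fields.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    mentions = []
    for row in reader:
        # Normalize header names so "Tweets_id" and "tweets_id" both work.
        row = {k.strip().lower(): v for k, v in row.items() if k is not None}
        mentions.append(
            {
                "tweet_id": row["tweets_id"],
                "begin": int(row["begin"]),
                "end": int(row["end"]),
                "type": row["type"],
                "extraction": row["extraction"],
            }
        )
    return mentions


def span_matches(tweet_text, mention, end_inclusive=False):
    """Check that the offsets recover the literal extraction.

    end_inclusive=True treats End as the index of the last character;
    end_inclusive=False treats End as one past the last character.
    """
    stop = mention["end"] + 1 if end_inclusive else mention["end"]
    return tweet_text[mention["begin"]:stop] == mention["extraction"]


if __name__ == "__main__":
    for m in load_mentions(SAMPLE_TSV):
        print(m["tweet_id"], m["type"], span_matches(SAMPLE_TWEET, m))
```

With the real files, the tweet text would come from tweets_txt/<tweet id>.txt. Running span_matches with both flag values on a few rows is a quick way to find out which offset convention the files actually use.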
Created:
2022-10-21