HealthE

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7539391
Resource description:
# HealthE Dataset

HealthE contains 3,400 pieces of health advice gathered 1) from public health websites (i.e. WebMD.com, MedlinePlus.gov, CDC.gov, and MayoClinic.org) and 2) from the publicly available [Preclude dataset](https://userpages.umbc.edu/~nroy/courses/shhasp18/papers/p286-preum.pdf). Each sample was hand-labeled for health entity recognition by a team of 14 annotators at the author's institution. Automatic recognition of health entities will enable further research in large-scale modeling of texts from online health communities.

The data is provided in two parts. Both are serialized in Python's `pickle` format and require the free `pandas` library to load.

* `healthe.pkl` is a `pandas.DataFrame` object containing the 3,400 health-advice statements with hand-labeled health entities.
* `non_advice.pkl` is a `pandas.DataFrame` object containing 2,256 non-advice statements.

To load the files in Python, use the following code block.

```
import pandas as pd

# pd.read_pickle unpickles each file directly into a DataFrame,
# so a separate `import pickle` is not needed.
healthe_df = pd.read_pickle('healthe.pkl')
# File name as listed above: non_advice.pkl
non_advice_df = pd.read_pickle('non_advice.pkl')
```

`healthe_df` has four columns.

* `text` contains the health advice statement text.
* `entities` contains a Python list of (entity, class) tuples.
* `tokenized_text` contains a list of tokens obtained by tokenizing the health advice statement text.
* `labels` contains a list of the same length as `tokenized_text`, where each token is mapped to a class label.

`non_advice_df` has one column, `text`, which holds each non-health-advice statement.
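The column layout described above can be illustrated with a minimal sketch. It does not read the real files: `sample_row` is a hypothetical record, and its text and class names (`FOOD`, `O`) are invented for illustration and are not taken from the dataset's actual label set. The helper names `is_aligned`, `token_label_pairs`, and `entity_class_counts` are likewise assumptions, not part of the dataset's tooling.

```python
from collections import Counter

# Hypothetical row mimicking the four documented columns. The values and
# the class names ("FOOD", "O") are invented for illustration only.
sample_row = {
    "text": "Eat more leafy greens.",
    "entities": [("leafy greens", "FOOD")],
    "tokenized_text": ["Eat", "more", "leafy", "greens", "."],
    "labels": ["O", "O", "FOOD", "FOOD", "O"],
}


def is_aligned(tokens, labels):
    """Return True when the token list and label list have equal length."""
    return len(tokens) == len(labels)


def token_label_pairs(tokens, labels):
    """Pair each token with its class label, refusing misaligned rows."""
    if not is_aligned(tokens, labels):
        raise ValueError("tokenized_text and labels differ in length")
    return list(zip(tokens, labels))


def entity_class_counts(rows):
    """Count (entity, class) tuples per class across an iterable of rows."""
    counts = Counter()
    for row in rows:
        for _entity, entity_class in row["entities"]:
            counts[entity_class] += 1
    return counts


pairs = token_label_pairs(sample_row["tokenized_text"], sample_row["labels"])
class_counts = entity_class_counts([sample_row])
```

With the real data, rows in the same dictionary shape should be obtainable via `healthe_df.to_dict("records")`, assuming the columns are named as documented.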
Created:
2023-01-16