Cancer Health Disparities drivers with BERTopic Modelling and PyCaret Evaluation - Text data

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/7827129
Description:
The complex interplay of social, behavioral, lifestyle, environmental, health system, and natural health variables contributes to disparities in cancer treatment across racial and ethnic groups. Consequently, it is necessary to identify the variables that drive cancer health inequalities and to develop strategies for achieving health equality. PubMed abstracts on cancer health disparities were retrieved with the Biopython Bio.Entrez package. The data were preprocessed with regular expressions and the Natural Language Toolkit (NLTK). Topic modelling used BERTopic embeddings and c-TF-IDF to construct dense clusters and to analyze the top topics linked with cancer health disparities. The model was evaluated with the PyCaret coherence score and deployed as a web app with Streamlit. The results showed that Topic 32, with the terms obese, female, male, school, survey, student, poet, and discrepancy, had the best coherence score, 0.3687. In contrast, Topic 8, with the terms prevalence, adult, income, high, usage, diabetes, education, elderly, change, and low, had the lowest coherence score, 0.3255. The model scores the words in each topic, surfaces granular topic concerns and trends related to cancer health disparities, investigates the connections between drivers of cancer health disparities, and is evaluated by its coherence score values.
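The c-TF-IDF step named in the description can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the toy abstracts, the cluster labels, the short stop-word list, and the helper names `tokenize`, `c_tf_idf`, and `top_terms` are all hypothetical, and the weighting assumes the class-based formula given in the BERTopic documentation, weight(t, c) = tf(t, c) × log(1 + A / f(t)), with tf normalized per class.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the described work used NLTK, whose list is larger.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "with", "for", "among"}


def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def c_tf_idf(docs, labels):
    """Return {label: {term: weight}} using class-based TF-IDF.

    Documents sharing a label are merged into one class document, then
    weight(t, c) = tf(t, c) * log(1 + A / f(t)), where tf is the term's
    share of the class tokens, A is the average number of tokens per class,
    and f(t) is the count of t across all classes.
    """
    class_counts = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(tokenize(doc))

    total = Counter()
    for counts in class_counts.values():
        total.update(counts)

    avg_tokens = sum(total.values()) / len(class_counts)
    weights = {}
    for label, counts in class_counts.items():
        class_size = sum(counts.values())
        weights[label] = {
            term: (tf / class_size) * math.log(1 + avg_tokens / total[term])
            for term, tf in counts.items()
        }
    return weights


def top_terms(term_weights, n=3):
    """Return the n highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(term_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Toy abstracts and cluster labels, invented for illustration only.
docs = [
    "Obesity prevalence among female students in a school survey",
    "School survey of obesity in male students",
    "Income and education predict diabetes prevalence in adults",
    "Low income adults and diabetes: education matters",
]
labels = [0, 0, 1, 1]

scores = c_tf_idf(docs, labels)
for label in sorted(scores):
    print(label, top_terms(scores[label]))
```

In this toy input, "prevalence" appears in both clusters, so it receives a lower weight than a term confined to one cluster, which is the behaviour that lets c-TF-IDF pick out topic-specific words.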
Created:
2024-07-12