janduplessis886/england-nhs-gp-reviews

Hugging Face · Updated 2024-01-24 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/janduplessis886/england-nhs-gp-reviews
Resource overview:
---
license: other
license_name: open-government-license
license_link: https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
language:
- en
tags:
- reviews
- medical
- gp
- nhs
pretty_name: England NHS GP Reviews (2022 - 2024)
---

# England NHS GP Reviews (2022 - 2024)

England NHS GP Reviews (2022 - 2024), scraped from https://www.nhs.uk/service-search/find-a-gp

## Dataset Details

### Dataset Description

This dataset contains reviews of GP surgeries across England, scraped from the NHS website. Each GP surgery is identified by an ODS code and a surgery name. The scraped data includes the first 7 pages of reviews for each surgery and captures the following attributes:

- Surgery ODS code
- Surgery name
- Scrape URL
- Review title
- Star rating (1-5)
- Review comment text
- Date of review (month and year)

The dataset covers GP surgery reviews posted between 2022 and 2024. Over 61,000 individual reviews have been gathered, providing insight into patient experiences with GP surgeries in areas such as appointments, staff, facilities, and overall service.

The data is intended to enable further analysis of the quality of GP surgeries based on patient reviews submitted to the official NHS platform. It can help identify review trends and top-performing GP surgeries using review metrics such as average ratings and the most frequent positive and negative review topics.

The reviews were gathered by web scraping public NHS websites and are not officially endorsed NHS data products. They should be read as assertions by individual anonymous patients about their experience with the listed GP surgery. Personal information was removed during scraping to protect patient privacy.

- **Curated by:** Jan du Plessis
- **Language(s) (NLP):** English
- **License:** UK Open Government Licence v3.0

## Uses

The dataset is intended for use with BERTopic for topic analysis.

### Direct Use

NLP, machine learning, and deep learning projects.

## Dataset Structure

- Surgery ODS code
- Surgery name
- Scrape URL
- Review title
- Star rating (1-5)
- Review comment text
- Date of review (month and year)

## Dataset Creation

### Curation Rationale

Training a model to classify medical reviews.

### Source Data

https://www.nhs.uk/service-search/find-a-gp

#### Data Collection and Processing

Web scraping with BeautifulSoup. Duplicates have been removed and NaN values dropped.

#### Personal and Sensitive Information

Reviews are anonymised, with no patient-identifiable information captured.

## Dataset Card Authors

Jan du Plessis

## Dataset Card Contact

drjanduplessis@icloud.com
Provided by:
janduplessis886
Summary of original information

England NHS GP Reviews (2022 - 2024)

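Dataset description

England NHS GP Reviews (2022-2024) is a dataset of reviews of GP surgeries across England, scraped from the NHS website. Each GP surgery is identified by an ODS code and a surgery name. The scraped data covers the first 7 pages of reviews for each surgery, with the following attributes:

  • Surgery ODS code
  • Surgery name
  • Scrape URL
  • Review title
  • Star rating (1-5)
  • Review comment text
  • Date of review (month and year)

The dataset contains over 61,000 reviews posted between 2022 and 2024, giving insight into patient experiences with GP surgeries in areas such as appointments, staff, facilities, and overall service.

The dataset is intended to support further analysis of GP surgery quality based on reviews that patients submitted to the official NHS platform. It can help identify review trends and top-performing GP surgeries using metrics such as average ratings and the most frequent positive and negative review topics.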
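As a rough illustration of the average-rating metric mentioned above, the sketch below ranks surgeries by mean star rating using only the Python standard library. The key names `ods_code` and `star_rating` and the `min_reviews` threshold are assumptions made for this example; the dataset's actual column names may differ.

```python
from collections import defaultdict
from statistics import mean


def average_ratings(reviews, min_reviews=1):
    """Return [(ods_code, mean_rating, n_reviews)], best-rated first.

    `reviews` is an iterable of dicts with the hypothetical keys
    "ods_code" and "star_rating" (an integer from 1 to 5).
    """
    by_surgery = defaultdict(list)
    for review in reviews:
        by_surgery[review["ods_code"]].append(review["star_rating"])

    ranked = [
        (code, mean(ratings), len(ratings))
        for code, ratings in by_surgery.items()
        if len(ratings) >= min_reviews  # skip surgeries with too few reviews
    ]
    # Highest mean first; ties are broken by the larger review count.
    ranked.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return ranked
```

Because only the first 7 pages of reviews were scraped per surgery, any average computed this way describes that sample rather than every review a surgery has received. A `min_reviews` threshold is one simple way to keep surgeries with one or two reviews from dominating the ranking.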

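The reviews were collected by scraping public NHS websites and are not officially endorsed NHS data products. They should be read as personal statements by anonymous patients about their experience with the listed GP surgery. Personal information was removed during scraping to protect patient privacy.

  • Curated by: Jan du Plessis
  • Language (NLP): English
  • License: UK open-government-license

Dataset structure

  • Surgery ODS code
  • Surgery name
  • Scrape URL
  • Review title
  • Star rating (1-5)
  • Review comment text
  • Date of review (month and year)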
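The seven fields listed above could be modelled as a typed record, sketched below. The field names, the raw-row keys, and the `"January 2024"` date format are all assumptions for illustration; the source only states that the date carries a month and a year.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Review:
    """One review row; all field names here are hypothetical."""
    ods_code: str
    surgery_name: str
    scrape_url: str
    title: str
    star_rating: int    # 1 to 5
    comment: str
    review_month: date  # first day of the month the review was posted


def parse_review(row: dict) -> Review:
    """Validate a raw row (a dict of strings) and convert it to a Review."""
    rating = int(row["star_rating"])
    if not 1 <= rating <= 5:
        raise ValueError(f"star rating out of range: {rating}")
    # Assumed format such as "January 2024"; adjust if the data differs.
    month = datetime.strptime(row["review_date"].strip(), "%B %Y").date()
    return Review(
        ods_code=row["ods_code"],
        surgery_name=row["surgery_name"],
        scrape_url=row["scrape_url"],
        title=row["title"],
        star_rating=rating,
        comment=row["comment"],
        review_month=month,
    )
```

Parsing the month and year into a `date` pinned to the first of the month keeps reviews sortable and easy to group by period.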

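Dataset creation

Curation rationale

Training a model to classify medical reviews.

Source data

https://www.nhs.uk/service-search/find-a-gp

Data collection and processing

Web scraping with BeautifulSoup. Duplicates have been removed and null values dropped.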
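The cleaning step described above (removing duplicates and dropping missing values) might look roughly like the dependency-free sketch below. It is not the author's code: it assumes that a row is dropped when any field is missing and that duplicates are rows identical in every field. In pandas, `DataFrame.dropna()` followed by `DataFrame.drop_duplicates()` would do a similar job.

```python
import math


def is_missing(value):
    """True for None, a float NaN, or an empty/whitespace-only string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def clean(rows):
    """Drop rows with any missing field, then drop exact duplicate rows.

    `rows` is a list of flat dicts whose values are strings or numbers.
    The first occurrence of each duplicate is kept, in original order.
    """
    seen = set()
    cleaned = []
    for row in rows:
        if any(is_missing(value) for value in row.values()):
            continue
        # Sorting the items makes the key independent of dict key order.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned
```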

Personal and sensitive information

Reviews are anonymised; no patient-identifiable information was captured.
