
fhirfly/medicalquestions

Source: Hugging Face. Updated: 2023-10-28. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/fhirfly/medicalquestions
Resource description:
---
license: mit
task_categories:
- text-classification
language:
- en
tags:
- medical
pretty_name: FhirFly Medical Questions
size_categories:
- 10K<n<100K
---

# 🤗 Dataset Card: fhirfly/medicalquestions

## Dataset Overview

- Dataset name: fhirfly/medicalquestions
- Dataset size: 25,102 questions
- Labels: 1 (medical), 0 (non-medical)
- Distribution: Evenly distributed between medical and non-medical questions

## Dataset Description

The fhirfly/medicalquestions dataset is a collection of 25,102 questions labeled as either medical or non-medical. The dataset aims to provide a diverse range of questions covering various medical and non-medical domains.

The questions in the dataset have been manually labeled by domain experts based on the context and content of each question. Each question is assigned a label of 1 if it is determined to be a medical question and a label of 0 if it is classified as a non-medical question.

## Dataset Structure

The dataset consists of a single file containing the following columns:

- **Text**: The text of the question.
- **Label**: The label assigned to each question, either 1 (medical) or 0 (non-medical).

The questions are evenly distributed between medical and non-medical categories, ensuring a balanced dataset for training and evaluation.

## Potential Biases

Efforts have been made to ensure that the dataset is representative of various medical and non-medical topics. However, it is important to acknowledge that biases may exist in the dataset due to the subjective nature of labeling questions. Biases could be present in terms of the types of questions included, the representation of certain medical conditions or non-medical topics, or the labeling process itself.

It is recommended to perform thorough evaluation and analysis of the dataset to identify and mitigate potential biases during model training and deployment. Care should be taken to address any biases to ensure fair and unbiased predictions.

## Dataset Quality

The fhirfly/medicalquestions dataset has undergone manual labeling by domain experts, which helps maintain a high level of quality and accuracy. However, human labeling is not entirely immune to errors or subjectivity.

To ensure the quality of the dataset, a thorough review process has been conducted to minimize errors and maintain consistency in labeling. Nonetheless, it is advisable to validate and verify the data as part of your specific use case to ensure it meets your requirements.

## Data License

The fhirfly/medicalquestions dataset is released under the MIT license. Please refer to the license file accompanying the dataset for more information on its usage and any restrictions that may apply.

## Dataset Citation

If you use the fhirfly/medicalquestions dataset in your work, please cite it as:

```
@dataset{fhirfly/medicalquestions,
  title = {fhirfly/medicalquestions},
  author = {fhirfly},
  year = {2023},
  publisher = {Hugging Face},
  version = {1.0.0},
  url = {https://huggingface.co/datasets/fhirfly/medicalquestions}
}
```
Provider:
fhirfly
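Summary of original information

Dataset overview

  • Dataset name: fhirfly/medicalquestions
  • Dataset size: 25,102 questions
  • Labels: 1 (medical), 0 (non-medical)
  • Distribution: evenly split between medical and non-medical questions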

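Dataset description

The fhirfly/medicalquestions dataset is a collection of 25,102 questions, each labeled as medical or non-medical. It aims to provide diverse questions covering a range of medical and non-medical domains.

Domain experts labeled the questions manually, based on the context and content of each one. A question receives the label 1 if it is judged to be medical and 0 if it is classified as non-medical.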

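Dataset structure

The dataset consists of a single file with the following columns:

  • Text: the text of the question.
  • Label: the label assigned to the question, either 1 (medical) or 0 (non-medical).

Questions are evenly distributed between the medical and non-medical categories, which gives a balanced dataset for training and evaluation.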
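The two-column layout and the claimed even split can be checked directly once the data is loaded. The following is a minimal sketch, not the provider's own tooling: it assumes the Hugging Face `datasets` library is installed, that the data is exposed under a `train` split, and that the columns are named exactly `Text` and `Label` as the card states (the casing in the hosted file may differ). The helper names `label_counts`, `is_balanced`, and `load_rows` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def label_counts(rows: Iterable[Mapping], label_key: str = "Label") -> Counter:
    """Count how many rows carry each label value."""
    return Counter(row[label_key] for row in rows)


def is_balanced(counts: Counter, tolerance: float = 0.05) -> bool:
    """Return True if every class share is within `tolerance` of an even split."""
    total = sum(counts.values())
    if total == 0:
        return False
    even_share = 1 / len(counts)
    return all(abs(n / total - even_share) <= tolerance for n in counts.values())


def load_rows(split: str = "train"):
    """Load the dataset from the Hugging Face Hub.

    Assumption: a split named "train" exists; adjust if the hosted file differs.
    """
    # Imported lazily so the helpers above also work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("fhirfly/medicalquestions", split=split)


# Example usage (requires network access):
#   counts = label_counts(load_rows())
#   print(counts, is_balanced(counts))
```

The 5% tolerance is an arbitrary choice for illustration; tighten or loosen it to match what "balanced" needs to mean for your use case.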

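Potential biases

Efforts have been made to make the dataset representative of a variety of medical and non-medical topics. Even so, biases may exist because labeling questions is inherently subjective. They may appear in the types of questions included, in how well particular medical conditions or non-medical topics are represented, or in the labeling process itself.

Users are advised to evaluate and analyze the dataset thoroughly so that potential biases can be identified and mitigated during model training and deployment. Any biases found should be addressed to keep predictions fair.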
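One simple starting point for the analysis recommended above is to look for surface-level cues that correlate with the label, such as question length. The sketch below is only an illustration of that idea, not a method described by the dataset authors; the function name `mean_length_by_label` is hypothetical, and the column names `Text` and `Label` are taken from the card.

```python
from collections import defaultdict
from statistics import mean


def mean_length_by_label(rows, text_key="Text", label_key="Label"):
    """Return the mean whitespace-token count of the questions under each label."""
    lengths = defaultdict(list)
    for row in rows:
        lengths[row[label_key]].append(len(row[text_key].split()))
    return {label: mean(values) for label, values in lengths.items()}
```

A large gap between the two means would suggest that a classifier could partly separate the classes by length alone, which is worth ruling out before trusting its accuracy.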

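Dataset quality

The fhirfly/medicalquestions dataset was labeled manually by domain experts, which helps keep quality and accuracy high. Human labeling is nevertheless not immune to error or subjectivity.

A thorough review was carried out to minimize errors and keep the labeling consistent. Users should still validate and verify the data against their own use case to confirm that it meets their requirements.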
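The validation suggested above can begin with a few mechanical checks before any manual review. The sketch below is one possible set of checks, chosen for illustration and not taken from the dataset's own review process: it flags empty questions, labels other than 0 or 1, and questions that appear more than once with different labels. The function name `validate_rows` is hypothetical, and the column names `Text` and `Label` follow the card.

```python
from collections import defaultdict

VALID_LABELS = {0, 1}  # per the card: 1 = medical, 0 = non-medical


def validate_rows(rows, text_key="Text", label_key="Label"):
    """Return row indices with empty text or invalid labels, plus conflicting duplicates."""
    issues = {"empty_text": [], "invalid_label": [], "conflicting_duplicates": []}
    labels_by_text = defaultdict(set)

    for index, row in enumerate(rows):
        text = (row.get(text_key) or "").strip()
        label = row.get(label_key)

        if not text:
            issues["empty_text"].append(index)
        if label not in VALID_LABELS:
            issues["invalid_label"].append(index)
        if text:
            # Case-insensitive match so trivially different copies are grouped together.
            labels_by_text[text.lower()].add(label)

    # A question seen with more than one distinct label points to a labeling conflict.
    issues["conflicting_duplicates"] = sorted(
        text for text, labels in labels_by_text.items() if len(labels) > 1
    )
    return issues
```

Rows flagged here are candidates for manual inspection; an empty result does not by itself show that the labels are correct.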

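Data license

The fhirfly/medicalquestions dataset is released under the MIT license. See the license file that accompanies the dataset for more information on its use and any restrictions that may apply.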

Dataset citation

If you use the fhirfly/medicalquestions dataset in your work, please cite it as follows:

@dataset{fhirfly/medicalquestions, title = {fhirfly/medicalquestions}, author = {fhirfly}, year = {2023}, publisher = {Hugging Face}, version = {1.0.0}, url = {https://huggingface.co/datasets/fhirfly/medicalquestions} }
