
relevanthint/scnclab2023

Source: Hugging Face. Updated 2023-01-19. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/relevanthint/scnclab2023
Resource description:
---
annotations_creators:
- expert-generated
language:
- en
language_creators:
- machine-generated
license: []
multilinguality:
- monolingual
paperswithcode_id: scnclab2023
pretty_name: Synthetical Clinical Notes - Clab 2023
dataset_info:
  features:
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': O
          '1': B-allergies
          '2': I-allergies
          '3': B-biomarkers
          '4': I-biomarkers
          '5': B-cancer_symptoms
          '6': I-cancer_symptoms
          '7': B-cancer_type
          '8': I-cancer_type
          '9': B-date
          '10': I-date
          '11': B-diagnosis
          '12': I-diagnosis
          '13': B-gender
          '14': I-gender
          '15': B-imaging_options
          '16': I-imaging_options
          '17': B-test_result
          '18': I-test_result
          '19': B-treatment
          '20': I-treatment
size_categories:
- n<1K
source_datasets:
- original
tags:
- bio
- clinic
- cancer
task_categories:
- token-classification
task_ids:
- named-entity-recognition
---

# Dataset Card for [scnclab2023]

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:** relevanthint@gmail.com

### Dataset Summary

[More Information Needed]

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

The dataset was created with the GPT-3 API, using a prompt that contained some manually created clinical notes.

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

The annotation was done using [Argilla](https://github.com/argilla-io).

#### Who are the annotators?

The synthetic clinical notes were annotated by a group of three biomedical experts.

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

Note that this is not a real dataset: the clinical notes are synthetic.

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@github-username](https://github.com/<github-username>) for adding this dataset.
Provider:
relevanthint
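Summary of original information

Dataset overview

  • Name: Synthetical Clinical Notes - Clab 2023
  • ID: scnclab2023
  • Language: English (en)
  • Language creators: machine-generated
  • Multilinguality: monolingual
  • Task category: token classification
  • Task ID: named entity recognition
  • Annotation creators: expert-generated
  • Dataset size: fewer than 1,000 records (n<1K)
  • Source datasets: original
  • Tags: bio, clinic, cancer

Dataset structure

Data fields

  • tokens: sequence of strings
  • ner_tags: sequence of class labels, with the following classes: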
    • O
    • B-allergies
    • I-allergies
    • B-biomarkers
    • I-biomarkers
    • B-cancer_symptoms
    • I-cancer_symptoms
    • B-cancer_type
    • I-cancer_type
    • B-date
    • I-date
    • B-diagnosis
    • I-diagnosis
    • B-gender
    • I-gender
    • B-imaging_options
    • I-imaging_options
    • B-test_result
    • I-test_result
    • B-treatment
    • I-treatment
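
The label list above follows the usual BIO convention: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. As a minimal sketch, assuming the integer IDs follow the order listed above (0 for `O`, then a `B-`/`I-` pair per entity type), tag IDs can be turned back into entity spans as shown below. The helper name `decode_spans` and the example sentence are hypothetical illustrations, not taken from the dataset.

```python
from typing import List, Tuple

# Entity types in the order given by the label list; ID 0 is "O".
ENTITY_TYPES = [
    "allergies", "biomarkers", "cancer_symptoms", "cancer_type", "date",
    "diagnosis", "gender", "imaging_options", "test_result", "treatment",
]

# Rebuild the 21 label names: "O", then a B-/I- pair for each entity type.
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def decode_spans(tokens: List[str], tag_ids: List[int]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []
    for token, tag_id in zip(tokens, tag_ids):
        prefix, _, etype = LABELS[tag_id].partition("-")
        if prefix == "I" and etype == current_type:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
            continue
        if current_type is not None:
            # Any other tag closes the open entity.
            spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            # Lenient choice: a stray I- tag also starts a new entity.
            current_type, current_tokens = etype, [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical example record (not from the dataset).
example_tokens = ["Patient", "started", "chemotherapy", "for", "breast", "cancer"]
example_tags = [0, 0, 19, 0, 7, 8]
print(decode_spans(example_tokens, example_tags))
# [('treatment', 'chemotherapy'), ('cancer_type', 'breast cancer')]
```

Treating a stray `I-` tag as the start of a new entity is one design choice among several; a stricter decoder could instead reject such sequences as malformed.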

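Dataset creation

Annotation process

  • Annotation tool: Argilla
  • Annotators: a group of three biomedical experts

Data generation

  • Generation method: GPT-3 API, prompted with manually created clinical notes

Notes

  • Authenticity: this is not a real dataset; the clinical notes are synthetic.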