
bio-nlp-umass/Synth-SBDH

Source: Hugging Face | Updated: 2024-06-08 | Indexed: 2025-09-13
Download link:
https://hf-mirror.com/datasets/bio-nlp-umass/Synth-SBDH
Resource description:
---
license: apache-2.0
task_categories:
- text-classification
- token-classification
language:
- en
tags:
- me
- croissant
pretty_name: Synth-SBDH
size_categories:
- 1K<n<10K
---

# Dataset Card for Synth-SBDH

Synth-SBDH is a collection of 8,767 synthetic examples annotated for 15 categories of social and behavioral determinants of health (SBDH). Each annotation records the SBDH category, its presence, its period, and a rationale.

## Dataset Description

Synth-SBDH is a novel synthetic SBDH dataset that mimics EHR notes.

- **Repository:** [Code to reproduce experiments](https://github.com/avipartho/Synth-SBDH)
- **Paper:** [More Information Needed]
- **Point of Contact:** [Avijit Mitra](mailto:avijitmitra@umass.edu)

## Dataset Structure

### Data Instances

Some examples from [synth_sbdh_train.csv](synth_sbdh_train.csv) look as follows.

```
{
  'ex_no': 30,
  'Text': 'Patient has lost his job due to physical disabilities and is currently living on government financial aid.',
  'Textspan': 'lost his job || living on government financial aid',
  'SBDH': 'job insecurity || financial insecurity',
  'Presence': 'yes || yes',
  'Period': 'current || current',
  'Reasoning': 'The patient lost his job due to physical issues and this refers to job insecurity. || Reliance on government financial aid signifies financial insecurity.'
}
{
  'ex_no': 31,
  'Text': 'Patient was assaulted last year and suffers from PTSD.',
  'Textspan': 'assaulted || suffers from PTSD',
  'SBDH': 'violence || psychiatric symptoms or disorders',
  'Presence': 'yes || yes',
  'Period': 'history || current',
  'Reasoning': 'Being assaulted is a form of violence. || PTSD is a psychiatric disorder.'
}
```

### Data Fields

We release Synth-SBDH as CSV files. Each CSV file has the following fields:

- `ex_no`: Unique identifier for an example.
- `Text`: Example text sequence, at most a few sentences long.
- `Textspan`: Text spans with mentions of SBDH, separated by '||'.
- `Reasoning`: Rationales for SBDH annotations, separated by '||'.
- `SBDH`: SBDH annotations for text spans in `Textspan`, separated by '||'.
- `Presence`: Presence annotations (yes/no) for text spans in `Textspan`, separated by '||'.
- `Period`: Period annotations (current/history) for text spans in `Textspan`, separated by '||'.
- `Operation` (only in the [synth_sbdh_test_reviewed.csv](synth_sbdh_test_reviewed.csv) file): One of the four review operations considered by the human experts: *keep*, *correct*, *discard* or *add*.

### Data Splits

The Synth-SBDH dataset has 4 splits: _train_, _val_, _test_, and _test_reviewed_. Below are the statistics for the dataset.

| Dataset Split | Number of Examples | Number of Annotations |
| ------------- | ------------------ | --------------------- |
| Train | 6,136 | 10,022 |
| Val | 876 | 1,443 |
| Test | 1,755 | 2,904 |
| Test (Expert Reviewed) | 1,732 | 3,345 |

## Dataset Creation

Details about how the data was created are available in our paper.

## Uses

You can download and use the dataset directly with the `datasets` library.

```python
from datasets import load_dataset

synth_sbdh_dataset = load_dataset("bio-nlp-umass/Synth-SBDH")
```

Alternatively, you can download the files individually and load them with any compatible library. For example, using `pandas`:

```python
import pandas as pd

synth_sbdh_df = pd.read_csv('FILE_NAME')
```

## Citation

<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
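## Example: One Row per Annotation

The Data Fields section notes that `Textspan`, `Reasoning`, `SBDH`, `Presence`, and `Period` each pack several annotations into one string separated by '||'. The sketch below shows one possible way to expand such rows into one row per annotation with `pandas`. It is not part of the released code: the helper names `split_field` and `explode_annotations` are hypothetical, the two rows are copied from the Data Instances section, and it assumes pandas 1.3 or later (for multi-column `explode`) and that every '||'-separated field in a row has the same number of parts.

```python
import pandas as pd

# Two rows copied from the Data Instances section of this card.
df = pd.DataFrame([
    {
        "ex_no": 30,
        "Text": "Patient has lost his job due to physical disabilities and is "
                "currently living on government financial aid.",
        "Textspan": "lost his job || living on government financial aid",
        "SBDH": "job insecurity || financial insecurity",
        "Presence": "yes || yes",
        "Period": "current || current",
        "Reasoning": "The patient lost his job due to physical issues and this "
                     "refers to job insecurity. || Reliance on government "
                     "financial aid signifies financial insecurity.",
    },
    {
        "ex_no": 31,
        "Text": "Patient was assaulted last year and suffers from PTSD.",
        "Textspan": "assaulted || suffers from PTSD",
        "SBDH": "violence || psychiatric symptoms or disorders",
        "Presence": "yes || yes",
        "Period": "history || current",
        "Reasoning": "Being assaulted is a form of violence. || PTSD is a "
                     "psychiatric disorder.",
    },
])

# Columns that hold '||'-separated values, per the Data Fields section.
ANNOTATION_COLUMNS = ["Textspan", "SBDH", "Presence", "Period", "Reasoning"]


def split_field(value):
    """Split a '||'-separated string into stripped parts (hypothetical helper)."""
    if not isinstance(value, str):
        return value  # leave missing values (e.g. NaN) untouched
    return [part.strip() for part in value.split("||")]


def explode_annotations(frame, columns=ANNOTATION_COLUMNS):
    """Return a copy of `frame` with one row per annotation (hypothetical helper)."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].apply(split_field)
    # Multi-column explode keeps the i-th part of every column on the same row;
    # it raises ValueError if the columns of a row have different part counts.
    return out.explode(columns, ignore_index=True)


long_df = explode_annotations(df)
print(long_df[["ex_no", "Textspan", "SBDH", "Presence", "Period"]])
```

To apply the same step to a released file, replace the inline `df` with the result of `pd.read_csv` on that file. Rows whose fields do not split into equal numbers of parts would need separate handling before calling `explode`.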
Provided by:
bio-nlp-umass