Synthetic nursing handover training and development data set - text files
Source: Research Data Australia (indexed 2024-12-14)
Download link:
https://researchdata.edu.au/synthetic-nursing-handover-text-files/820931
Resource description:
This is one of two collection records. Please see the link below for the other collection of associated audio files.

Both collections together comprise an open clinical dataset of three sets of 101 nursing handover records, very similar to real documents in Australian English. Each record consists of a patient profile, a spoken free-form text document, a written free-form text document, and a written structured document.

This collection contains 3 sets of text documents.

Data Set 1 for Training and Development

The data set, released in June 2014, includes the following documents:

Folder initialisation: Initialisation details for speech recognition using Dragon Medical 11.0: i) DOCX for the written, free-form text document that originates from the Dragon software release and ii) WMA for the spoken, free-form text document by the registered nurse (RN)
Folder 100profiles: 100 patient profiles (DOCX)
Folder 101writtenfreetextreports: 101 written, free-form text documents (TXT)
Folder 100x6speechrecognised: 100 speech-recognized, written, free-form text documents for six Dragon vocabularies (TXT)
Folder 101informationextraction: 101 written, structured documents for information extraction that include i) the reference standard text, ii) features used by our best system, iii) form categories with respect to the reference standard and iv) form categories with respect to our best information extraction system (TXT in CRF++ format)

An Independent Data Set 2

The aforementioned data set was supplemented in April 2015 with an independent set that was used as a test set in the CLEFeHealth 2015 Task 1a on clinical speech recognition and can be used as a validation set in the CLEFeHealth 2016 Task 1 on handover information extraction. Hence, when using this set, please avoid its repeated use in evaluation; we do not wish to overfit to these data sets.

The set released in April 2015 consists of 100 patient profiles (DOCX), 100 written, and 100 speech-recognized, written, free-form text documents for the Dragon vocabulary of Nursing (TXT). The set released in November 2015 consists of the respective 100 written, free-form text documents (TXT) and 100 written, structured documents for information extraction.

An Independent Data Set 3

For evaluation purposes, the aforementioned data sets were supplemented in April 2016 with an independent set of another 100 synthetic cases.

Lineage: Data creation included the following steps: generation of patient profiles; creation of written, free-form text documents; development of a structured handover form; using this form and the written, free-form text documents to create written, structured documents; creation of spoken, free-form text documents; using a speech recognition engine with different vocabularies to convert the spoken documents to written, free-form text; and using an information extraction system to fill out the handover form from the written, free-form text documents.

See Suominen et al. (2015) in the links below for a detailed description and examples.
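
The structured documents in Folder 101informationextraction are described as TXT in CRF++ format. As a minimal sketch of how such a file might be read, the Python snippet below assumes the usual CRF++ layout: one token per line, whitespace-separated columns, the label in the last column, and a blank line between sequences. The function name parse_crfpp, the sample tokens and the labels LABEL_A and NA are hypothetical and only for illustration; the actual column count and form categories should be checked against the released files.

```python
from typing import List, Tuple

# One token: (feature columns, label). The label is assumed to be the last column.
Token = Tuple[List[str], str]


def parse_crfpp(text: str) -> List[List[Token]]:
    """Split CRF++-style text into sequences of (features, label) tokens."""
    sequences: List[List[Token]] = []
    current: List[Token] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current sequence, if there is one.
            if current:
                sequences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[:-1], columns[-1]))
    if current:
        # Keep the final sequence when the text lacks a trailing blank line.
        sequences.append(current)
    return sequences


# Made-up sample, not taken from the data set.
sample = "bed LABEL_A\nfive LABEL_A\n\nthank NA\nyou NA\n"
sequences = parse_crfpp(sample)
labels = [label for sequence in sequences for _, label in sequence]
```

For a real file, the text would be read with open(path, encoding="utf-8").read() and passed to the same function; the encoding is also an assumption.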
Provider:
Commonwealth Scientific and Industrial Research Organisation



