Dataset for Parkinson's Disease Screening, Error Analysis, Quality Assessment, and User Study Validation
Source: DataCite Commons. Updated 2026-03-11; indexed 2026-05-07.
Download link:
https://rochester.figshare.com/articles/dataset/Demographic_information_of_participants_their_PD_diagnosis_screening_labels_provided_by_clinicians_and_screening_predictions_made_by_PARK/29410703
Description:
This repository contains structured datasets used in research on automated Parkinson's disease (PD) screening and evaluation of an AI-assisted assessment system. The data include participant-level metadata, model-prediction outcomes, uncertainty indicators, neurologist annotations, recording-quality measures, demographic summaries, clinical-stage labels, and validation survey responses. Together, these files support reproducible research on multimodal PD screening, dataset quality analysis, and human evaluation of the system.

The repository is organized into two major parts: a core screening dataset and a validation dataset collected during user studies.

Core Screening Dataset

The core dataset contains participant-level metadata associated with multimodal task recordings used for Parkinson's disease screening. These recordings correspond to different behavioral modalities, such as facial expression, speech, and finger-tapping tasks, as well as their multimodal fusion.

===============
test_data_big.csv
===============

This is the primary dataset containing 320 samples and participant-level metadata used in the screening experiments. Each row corresponds to one participant sample.
The file includes demographic information, task or protocol metadata, associated recording filenames, model prediction outcomes, uncertainty-related variables, and expert labels. Key groups of variables include:

Participant and recording metadata

unique_row_id: Unique identifier for each sample
participant_id: Participant identifier used to group recordings belonging to the same individual
date: Date of data collection
protocol: Recording or assessment protocol used
test_split: Dataset partition used for evaluation
gender: Participant gender
age: Participant age
age_group: Age category
race: Self-reported racial category
filename_* columns: Filenames corresponding to recordings for different behavioral tasks used in screening

Prediction-error indicators

These columns indicate whether the model prediction for a given modality or fusion output was incorrect relative to the reference label:

misclassified_smile: Whether the smile-based model prediction was misclassified
misclassified_speech: Whether the speech-based model prediction was misclassified
misclassified_finger: Whether the finger-based model prediction was misclassified
misclassified_fusion: Whether the multimodal fusion prediction was misclassified

These columns are useful for modality-specific error analysis and for studying whether some tasks are more difficult than others.

Uncertainty-related variables

uncertain_flag: Indicator showing whether the model marked the sample as uncertain
pred_std_fusion: Standard deviation or variability associated with the fusion prediction, used as a measure of predictive uncertainty

These fields support experiments on confidence estimation, selective prediction, and uncertainty-aware screening.

Neurologist expert labels

The file also contains labels from multiple neurologists:

neurologist_label_ray
neurologist_label_ruth
neurologist_label_jamie

These columns provide expert annotations from individual neurologists. They can be used to study agreement between clinicians, compare model predictions with expert judgment, or analyze label variability in Parkinson's disease screening.

Overall, this file supports multimodal classification, subgroup analysis, model-error analysis, and uncertainty-aware evaluation.

==========================
test_data_big_with_quality.csv
==========================

This file extends the main screening dataset by adding recording-quality annotations across modalities. It contains the same participant-level metadata and prediction-related variables as test_data_big.csv, along with additional columns describing the quality of recordings used by the screening models.

Additional quality-related columns include:

smile_quality: Quality annotation for the smile or facial-expression recording
Audio_quality: Quality annotation for the speech or audio recording
finger_quality: Overall quality annotation for the finger-tapping recording
finger_left_quality: Quality annotation specific to the left-hand finger task
finger_right_quality: Quality annotation specific to the right-hand finger task

These quality variables can be used to examine how recording quality affects model performance, whether low-quality inputs increase uncertainty or misclassification, and whether one modality is more sensitive to poor capture conditions than others.
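
The quality-versus-error analysis described above could be sketched roughly as follows, using only the Python standard library. This is an illustrative sketch under several assumptions that the description does not confirm: the pairing of each quality column with a misclassified_* column, the encoding of the error indicators (treated here as 1/true/yes), and the quality levels "good" and "poor", which are hypothetical placeholders. The rows are made up and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Assumed pairing of quality columns with error-indicator columns. The
# description lists both sets of names but does not state this mapping.
QUALITY_TO_ERROR = {
    "smile_quality": "misclassified_smile",
    "Audio_quality": "misclassified_speech",
    "finger_quality": "misclassified_finger",
}


def is_flagged(value):
    """Interpret a CSV cell as a boolean flag (assumed encodings: 1/true/yes)."""
    return value.strip().lower() in {"1", "1.0", "true", "yes"}


def error_rate_by_quality(rows):
    """Map (quality column, quality level) to (error rate, sample count)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [errors, total]
    for row in rows:
        for quality_col, error_col in QUALITY_TO_ERROR.items():
            key = (quality_col, row[quality_col])
            counts[key][0] += is_flagged(row[error_col])
            counts[key][1] += 1
    return {key: (errors / total, total) for key, (errors, total) in counts.items()}


# Made-up rows with hypothetical quality levels, for illustration only.
# With the real file, one would instead pass something like:
#   csv.DictReader(open("test_data_big_with_quality.csv", newline=""))
TOY_CSV = """smile_quality,Audio_quality,finger_quality,misclassified_smile,misclassified_speech,misclassified_finger
good,good,good,0,0,0
good,poor,good,0,1,0
poor,poor,good,1,1,1
poor,good,poor,0,0,1
"""
rates = error_rate_by_quality(csv.DictReader(io.StringIO(TOY_CSV)))
for (column, level), (rate, n) in sorted(rates.items()):
    print(f"{column:15s} {level:5s} error_rate={rate:.2f} n={n}")
```

Reporting the sample count next to each rate matters here because, with 320 samples split across quality levels, some groups may be too small to support a conclusion.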
The presence of separate left- and right-hand quality annotations also enables more detailed analysis of hand-specific effects in motor assessment.

This file is especially useful for research on robust multimodal learning, data filtering, and quality-aware model evaluation.

===============
df_stage_data.csv
===============

This file provides clinical severity annotations for a subset of participants using the Hoehn and Yahr scale, a standard clinical measure of Parkinson's disease progression.

The file contains:

id: Participant identifier
Hoehn and Yahr Stage Score: Numeric severity score
Stage Label: Categorized severity label derived from the score

These data support analysis of screening performance across disease stages and enable studies of model sensitivity to symptom severity.

===================
demographic_table.csv
===================

This file summarizes the dataset's demographic composition. It aggregates participant counts across demographic categories and compares participants with Parkinson's disease and without Parkinson's disease.

The table includes:

Demographic Property: Demographic category, such as age group or race
Attribute: Specific value within that category
With PD: Number of participants with Parkinson's disease
Without PD: Number of participants without Parkinson's disease
Total: Total number of participants in that subgroup

This table supports transparency in dataset composition and demographic analysis.

Validation Dataset

The validation_data/ directory contains data collected during user studies evaluating the AI-assisted screening platform.

===================
users_data.csv
===================

Contains metadata for users who participated in the validation study, including identifiers and demographic information used to organize survey and evaluation records.

===================
tasks_survey.csv
===================

Contains responses related to task completion, usability, and participant experiences while interacting with system tasks.

===================
resources_survey.csv
===================

Contains responses about the usefulness and accessibility of supporting materials or resources provided during the evaluation.

===================
feedback_survey.csv
===================

Contains user feedback on the system's usability, clarity, and overall experience.

===================
final_survey.csv
===================

Contains final post-study responses capturing participants' overall impressions, perceived usefulness, and satisfaction with the platform.

Research Use

This repository supports research in:

automated Parkinson's disease screening
multimodal digital biomarker development
uncertainty-aware clinical AI
recording-quality analysis in health data
clinician-model comparison and label agreement
human-AI interaction and usability evaluation in healthcare systems

By combining participant metadata, modality-specific prediction outcomes, uncertainty measures, expert neurologist labels, recording-quality annotations, and user-study responses, this dataset enables comprehensive analysis of both model behavior and user interaction in AI-assisted Parkinson's disease assessment.
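
As a possible starting point for the modality-specific error analysis and clinician agreement studies mentioned for test_data_big.csv, the sketch below computes a misclassification rate per modality and the raw pairwise agreement between the three neurologist label columns. It uses only the column names given in the description. The value encodings are assumptions: error indicators are read as 1/true/yes, the labels "PD" and "non-PD" are hypothetical placeholders, and empty label cells are treated as missing. The rows are made up and are not taken from the dataset.

```python
import csv
import io
from itertools import combinations

ERROR_COLUMNS = [
    "misclassified_smile",
    "misclassified_speech",
    "misclassified_finger",
    "misclassified_fusion",
]
LABEL_COLUMNS = [
    "neurologist_label_ray",
    "neurologist_label_ruth",
    "neurologist_label_jamie",
]


def is_flagged(value):
    """Interpret a CSV cell as a boolean flag (assumed encodings: 1/true/yes)."""
    return value.strip().lower() in {"1", "1.0", "true", "yes"}


def modality_error_rates(rows):
    """Fraction of rows flagged as misclassified, per error-indicator column."""
    return {
        col: sum(is_flagged(row[col]) for row in rows) / len(rows)
        for col in ERROR_COLUMNS
    }


def pairwise_agreement(rows):
    """Fraction of samples on which each pair of neurologists gave the same label.

    Rows where either label is empty are skipped (assumed to mean "not rated").
    """
    result = {}
    for a, b in combinations(LABEL_COLUMNS, 2):
        pairs = [(row[a], row[b]) for row in rows if row[a] and row[b]]
        result[(a, b)] = sum(x == y for x, y in pairs) / len(pairs) if pairs else None
    return result


# Made-up rows with hypothetical label values, for illustration only.
# With the real file, one would instead read "test_data_big.csv".
TOY_CSV = """misclassified_smile,misclassified_speech,misclassified_finger,misclassified_fusion,neurologist_label_ray,neurologist_label_ruth,neurologist_label_jamie
1,0,0,0,PD,PD,PD
0,1,0,0,PD,non-PD,PD
0,0,1,1,non-PD,non-PD,
1,0,0,0,non-PD,non-PD,non-PD
"""
rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
errors = modality_error_rates(rows)
agreement = pairwise_agreement(rows)
print(errors)
print(agreement)
```

Raw agreement does not correct for agreement expected by chance, so a chance-corrected statistic such as Cohen's kappa may be preferable for a formal comparison between clinicians.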
Provider:
University of Rochester
Created:
2025-07-08



