Dataset for Parkinson's Disease Screening, Error Analysis, Quality Assessment, and User Study Validation
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Demographic_information_of_participants_their_PD_diagnosis_screening_labels_provided_by_clinicians_and_screening_predictions_made_by_PARK/29410703
Description:
This repository contains structured datasets used in research on automated Parkinson's disease (PD) screening and evaluation of an AI-assisted assessment system. The data include participant-level metadata, model-prediction outcomes, uncertainty indicators, neurologist annotations, recording-quality measures, demographic summaries, clinical-stage labels, and validation survey responses. Together, these files support reproducible research on multimodal PD screening, dataset quality analysis, and human evaluation of the system.
The repository is organized into two major parts: a core screening dataset and a validation dataset collected during user studies.
Core Screening Dataset
The core dataset contains participant-level metadata associated with multimodal task recordings used for Parkinson's disease screening. These recordings correspond to different behavioral modalities, such as facial expression, speech, and finger-tapping tasks, as well as their multimodal fusion.
===============
test_data_big.csv
===============
This is the primary dataset containing 320 samples and participant-level metadata used in the screening experiments. Each row corresponds to one participant sample. The file includes demographic information, task or protocol metadata, associated recording filenames, model prediction outcomes, uncertainty-related variables, and expert labels.
Key groups of variables include:
Participant and recording metadata
unique_row_id: Unique identifier for each sample
participant_id: Participant identifier used to group recordings belonging to the same individual
date: Date of data collection
protocol: Recording or assessment protocol used
test_split: Dataset partition used for evaluation
gender: Participant gender
age: Participant age
age_group: Age category
race: Self-reported racial category
filename_* columns: Filenames corresponding to recordings for different behavioral tasks used in screening

Prediction-error indicators
These columns indicate whether the model prediction for a given modality or fusion output was incorrect relative to the reference label:
misclassified_smile: Whether the smile-based model prediction was misclassified
misclassified_speech: Whether the speech-based model prediction was misclassified
misclassified_finger: Whether the finger-based model prediction was misclassified
misclassified_fusion: Whether the multimodal fusion prediction was misclassified
These columns are useful for modality-specific error analysis and for studying whether some tasks are more difficult than others.
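A modality-specific error analysis over these columns can be sketched as follows. This is a minimal sketch, not the authors' analysis code: the flag encoding (1/0 or True/False) is an assumption, the helper names are hypothetical, and the rows are synthetic placeholders rather than values from the dataset.

```python
import csv
import io

MODALITY_COLUMNS = [
    "misclassified_smile",
    "misclassified_speech",
    "misclassified_finger",
    "misclassified_fusion",
]

def is_flag_set(value):
    # Assumption: flags are stored as 1/0 or True/False; adjust to the real encoding.
    return str(value).strip().lower() in {"1", "1.0", "true", "yes"}

def error_rates(rows):
    """Return {column: fraction of non-empty rows flagged as misclassified}."""
    rates = {}
    for column in MODALITY_COLUMNS:
        # Skip rows where the modality has no value (e.g. a missing recording).
        values = [row[column] for row in rows if (row.get(column) or "").strip()]
        if values:
            rates[column] = sum(is_flag_set(v) for v in values) / len(values)
    return rates

# Synthetic placeholder rows for illustration only, NOT data from the repository.
SAMPLE_CSV = """unique_row_id,misclassified_smile,misclassified_speech,misclassified_finger,misclassified_fusion
a,1,0,0,0
b,0,0,1,0
c,1,,0,0
d,0,1,0,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = error_rates(rows)
for column, rate in rates.items():
    print(f"{column}: {rate:.2f}")
```

To run this against the real file, replace the `io.StringIO(SAMPLE_CSV)` source with `open("test_data_big.csv", newline="")` and check how the flags are actually encoded first.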
Uncertainty-related variables
uncertain_flag: Indicator showing whether the model marked the sample as uncertain
pred_std_fusion: Standard deviation or variability associated with the fusion prediction, used as a measure of predictive uncertainty
These fields support experiments on confidence estimation, selective prediction, and uncertainty-aware screening.
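One way to use these fields for selective prediction is to abstain on samples whose pred_std_fusion exceeds a threshold and report coverage and error on the rest. The sketch below assumes that misclassified_fusion is coded as 0/1 and that pred_std_fusion parses as a float. The thresholds and rows are arbitrary illustrations, and the function name is hypothetical.

```python
def selective_metrics(rows, threshold):
    """Keep samples with pred_std_fusion <= threshold.

    Returns (coverage, error_rate); error_rate is None if nothing is kept.
    """
    kept = [r for r in rows if float(r["pred_std_fusion"]) <= threshold]
    coverage = len(kept) / len(rows) if rows else 0.0
    # Assumption: misclassified_fusion is coded as 0/1.
    errors = sum(int(r["misclassified_fusion"]) for r in kept)
    error_rate = errors / len(kept) if kept else None
    return coverage, error_rate

# Synthetic placeholder rows for illustration only, NOT data from the repository.
rows = [
    {"pred_std_fusion": "0.02", "misclassified_fusion": "0"},
    {"pred_std_fusion": "0.05", "misclassified_fusion": "0"},
    {"pred_std_fusion": "0.20", "misclassified_fusion": "1"},
    {"pred_std_fusion": "0.30", "misclassified_fusion": "0"},
]

for threshold in (0.1, 0.25, 1.0):
    coverage, error_rate = selective_metrics(rows, threshold)
    print(f"threshold={threshold}: coverage={coverage:.2f}, error_rate={error_rate}")
```

Sweeping the threshold traces a coverage-versus-error curve. The same function could be adapted to abstain on uncertain_flag instead of a threshold.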
Neurologist expert labels
The file also contains labels from multiple neurologists:
neurologist_label_ray
neurologist_label_ruth
neurologist_label_jamie
These columns provide expert annotations from individual neurologists. They can be used to study agreement between clinicians, compare model predictions with expert judgment, or analyze label variability in Parkinson's disease screening.
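Agreement between clinicians could be quantified with Cohen's kappa for each pair of neurologist columns, as in the sketch below. The label values ("PD" and "Non-PD") are assumptions, since the source does not state the encoding, and the rows are synthetic. Rows where either rater left no label are skipped.

```python
from collections import Counter
from itertools import combinations

RATERS = [
    "neurologist_label_ray",
    "neurologist_label_ruth",
    "neurologist_label_jamie",
]

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length, non-empty label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

def pairwise_kappa(rows):
    """Return {(rater_a, rater_b): kappa} over rows labeled by both raters."""
    results = {}
    for a, b in combinations(RATERS, 2):
        pairs = [(r[a], r[b]) for r in rows if r.get(a) and r.get(b)]
        if pairs:
            labels_a, labels_b = zip(*pairs)
            results[(a, b)] = cohen_kappa(labels_a, labels_b)
    return results

# Synthetic placeholder labels for illustration only, NOT data from the repository.
rows = [
    {"neurologist_label_ray": "PD", "neurologist_label_ruth": "PD", "neurologist_label_jamie": "PD"},
    {"neurologist_label_ray": "PD", "neurologist_label_ruth": "Non-PD", "neurologist_label_jamie": "PD"},
    {"neurologist_label_ray": "Non-PD", "neurologist_label_ruth": "Non-PD", "neurologist_label_jamie": "Non-PD"},
    {"neurologist_label_ray": "Non-PD", "neurologist_label_ruth": "Non-PD", "neurologist_label_jamie": "PD"},
]

kappas = pairwise_kappa(rows)
for (a, b), kappa in kappas.items():
    print(f"{a} vs {b}: kappa={kappa:.2f}")
```

For three raters at once, a multi-rater statistic such as Fleiss' kappa may be more appropriate. The pairwise version is shown because it is the simplest to verify by hand.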
Overall, this file supports multimodal classification, subgroup analysis, model-error analysis, and uncertainty-aware evaluation.
==========================
test_data_big_with_quality.csv
==========================
This file extends the main screening dataset by adding recording-quality annotations across modalities. It contains the same participant-level metadata and prediction-related variables as test_data_big.csv, along with additional columns describing the quality of recordings used by the screening models.
Additional quality-related columns include:
smile_quality: Quality annotation for the smile or facial-expression recording
Audio_quality: Quality annotation for the speech or audio recording
finger_quality: Overall quality annotation for the finger-tapping recording
finger_left_quality: Quality annotation specific to the left-hand finger task
finger_right_quality: Quality annotation specific to the right-hand finger task
These quality variables can be used to examine how recording quality affects model performance, whether low-quality inputs increase uncertainty or misclassification, and whether one modality is more sensitive to poor capture conditions than others. The presence of separate left- and right-hand quality annotations also enables more detailed analysis of hand-specific effects in motor assessment.
This file is especially useful for research on robust multimodal learning, data filtering, and quality-aware model evaluation.
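A quality-aware evaluation can start by grouping a modality's error flag by its quality annotation. The sketch below pairs smile_quality with misclassified_smile. The quality values ("good" and "poor") are hypothetical, because the source does not describe the annotation scale, and the error flag is assumed to be coded as 0/1.

```python
from collections import defaultdict

def error_rate_by_quality(rows, quality_column, error_column):
    """Return {quality_value: misclassification rate} for rows with both fields set."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for row in rows:
        quality = row.get(quality_column)
        flag = row.get(error_column)
        if quality in (None, "") or flag in (None, ""):
            continue  # skip rows missing either the annotation or the outcome
        totals[quality] += 1
        errors[quality] += int(flag)  # assumption: flag is coded as 0/1
    return {quality: errors[quality] / totals[quality] for quality in totals}

# Synthetic placeholder rows for illustration only, NOT data from the repository.
rows = [
    {"smile_quality": "good", "misclassified_smile": "0"},
    {"smile_quality": "good", "misclassified_smile": "0"},
    {"smile_quality": "good", "misclassified_smile": "1"},
    {"smile_quality": "poor", "misclassified_smile": "1"},
    {"smile_quality": "poor", "misclassified_smile": "0"},
    {"smile_quality": "", "misclassified_smile": "1"},
]

by_quality = error_rate_by_quality(rows, "smile_quality", "misclassified_smile")
for quality, rate in by_quality.items():
    print(f"{quality}: {rate:.2f}")
```

The same function applies to the other modalities by passing, for example, Audio_quality with misclassified_speech or finger_quality with misclassified_finger.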
===============
df_stage_data.csv
===============
This file provides clinical severity annotations for a subset of participants using the Hoehn and Yahr scale, a standard clinical measure of Parkinson's disease progression.
The file contains:
id: Participant identifier
Hoehn and Yahr Stage Score: Numeric severity score
Stage Label: Categorized severity label derived from the score
These data support analysis of screening performance across disease stages and enable studies of model sensitivity to symptom severity.
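Stage-wise analysis requires joining this file to the screening samples. The sketch below assumes that id in df_stage_data.csv matches participant_id in test_data_big.csv, which the source does not confirm, and that misclassified_fusion is coded as 0/1. The stage labels and rows are synthetic placeholders.

```python
def attach_stage(sample_rows, stage_rows):
    """Add 'Stage Label' to samples whose participant has a staging record."""
    # Assumption: 'id' in the stage file corresponds to 'participant_id'.
    stage_by_id = {s["id"]: s["Stage Label"] for s in stage_rows}
    joined = []
    for row in sample_rows:
        label = stage_by_id.get(row["participant_id"])
        if label is not None:  # staging exists only for a subset of participants
            joined.append({**row, "Stage Label": label})
    return joined

def fusion_errors_by_stage(joined_rows):
    """Return {stage_label: (sample_count, misclassified_count)}."""
    summary = {}
    for row in joined_rows:
        total, errors = summary.get(row["Stage Label"], (0, 0))
        summary[row["Stage Label"]] = (total + 1, errors + int(row["misclassified_fusion"]))
    return summary

# Synthetic placeholder rows for illustration only, NOT data from the repository.
sample_rows = [
    {"participant_id": "p1", "misclassified_fusion": "0"},
    {"participant_id": "p2", "misclassified_fusion": "1"},
    {"participant_id": "p3", "misclassified_fusion": "0"},
    {"participant_id": "p2", "misclassified_fusion": "0"},
]
stage_rows = [
    {"id": "p1", "Hoehn and Yahr Stage Score": "1", "Stage Label": "Stage A"},
    {"id": "p2", "Hoehn and Yahr Stage Score": "3", "Stage Label": "Stage B"},
]

joined = attach_stage(sample_rows, stage_rows)
summary = fusion_errors_by_stage(joined)
print(summary)
```

Participants without a staging record (p3 here) are dropped from the joined view, so stage-wise counts cover only the staged subset.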
===================
demographic_table.csv
===================
This file summarizes the dataset's demographic composition. It aggregates participant counts across demographic categories and compares participants with Parkinson's disease and without Parkinson's disease.
The table includes:
Demographic Property: Demographic category, such as age group or race
Attribute: Specific value within that category
With PD: Number of participants with Parkinson's disease
Without PD: Number of participants without Parkinson's disease
Total: Total number of participants in that subgroup
This table supports transparency in dataset composition and demographic analysis.
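Given the column definitions above, a simple sanity check is that Total equals With PD plus Without PD in every row. The sketch below assumes the count cells hold plain integers. If the real file embeds percentages or other formatting, the parsing would need adjusting. The rows are synthetic, with one deliberately inconsistent entry.

```python
def inconsistent_rows(table_rows):
    """Return (property, attribute) pairs where With PD + Without PD != Total."""
    mismatches = []
    for row in table_rows:
        # Assumption: count cells contain plain integers.
        with_pd = int(row["With PD"])
        without_pd = int(row["Without PD"])
        total = int(row["Total"])
        if with_pd + without_pd != total:
            mismatches.append((row["Demographic Property"], row["Attribute"]))
    return mismatches

# Synthetic placeholder rows for illustration only, NOT data from the repository.
table_rows = [
    {"Demographic Property": "Age group", "Attribute": "group A",
     "With PD": "3", "Without PD": "5", "Total": "8"},
    {"Demographic Property": "Age group", "Attribute": "group B",
     "With PD": "2", "Without PD": "4", "Total": "7"},
]

mismatches = inconsistent_rows(table_rows)
print(mismatches)
```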
Validation Dataset
The validation_data/ directory contains data collected during user studies evaluating the AI-assisted screening platform.
===================
users_data.csv
===================
Contains metadata for users who participated in the validation study, including identifiers and demographic information used to organize survey and evaluation records.
===================
tasks_survey.csv
===================
Contains responses related to task completion, usability, and participant experiences while interacting with system tasks.
===================
resources_survey.csv
===================
Contains responses about the usefulness and accessibility of supporting materials or resources provided during the evaluation.
===================
feedback_survey.csv
===================
Contains user feedback on the system's usability, clarity, and overall experience.
===================
final_survey.csv
===================
Contains final post-study responses capturing participants' overall impressions, perceived usefulness, and satisfaction with the platform.
Research Use
This repository supports research in:
automated Parkinson's disease screening
multimodal digital biomarker development
uncertainty-aware clinical AI
recording-quality analysis in health data
clinician-model comparison and label agreement
human-AI interaction and usability evaluation in healthcare systems
By combining participant metadata, modality-specific prediction outcomes, uncertainty measures, expert neurologist labels, recording-quality annotations, and user-study responses, this dataset enables comprehensive analysis of both model behavior and user interaction in AI-assisted Parkinson's disease assessment.
Created: 2026-03-11



