Criteria-Based Content Analysis of Child Sexual Abuse Statements
Indexed in NIAID Data Ecosystem: 2026-05-02
Download link:
https://data.mendeley.com/datasets/scsr4cd5h7
Description:
Purpose
This study investigates the inter-rater reliability between human experts (a forensic psychologist and a social worker) and an artificial intelligence (AI) model in the assessment of child sexual abuse statements. The research aims to explore the potential, limitations, and consistency of AI as an evaluation tool within the framework of Criteria-Based Content Analysis (CBCA), a widely used method for assessing statement credibility.
Materials and methods
Sixty-five anonymized transcripts of forensic interviews with child victims of sexual abuse (N = 65) were independently evaluated by three raters: a forensic psychologist, a social worker, and a large language model (GPT-4o, used through ChatGPT Plus). Each statement was coded on the 19 criteria of the CBCA framework. Inter-rater reliability was analyzed with the intraclass correlation coefficient (ICC), Cohen's kappa (κ), and other agreement statistics, comparing the judgments of the human-human pairing with those of the two human-AI pairings.
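The abstract does not state which ICC form or software was used. The following is a minimal sketch of how pairwise agreement on a single CBCA criterion could be computed. It makes two assumptions: that criteria are scored on an ordinal scale (for example 0 = absent, 1 = present, 2 = strongly present), and that the ICC form is the common ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss). The function names `cohens_kappa` and `icc_2_1` are hypothetical, and the scores are invented for illustration. They are not the study's data or analysis code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two codes match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per statement, each row holding one score per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares for statements (rows), raters (columns) and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = sse / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_err) / denom


# Hypothetical 0/1/2 scores on one criterion for five statements (not study data).
psychologist = [0, 1, 2, 1, 0]
social_worker = [0, 1, 2, 2, 0]
print(round(cohens_kappa(psychologist, social_worker), 3))  # 0.706
print(round(icc_2_1([list(p) for p in zip(psychologist, social_worker)]), 3))  # 0.882
```

Because ICC(2,1) measures absolute agreement, systematic offsets between raters also lower it, not only inconsistent ranking. It turns negative whenever the between-statement mean square is smaller than the residual mean square, that is, when raters disagree with each other more than the statements differ from one another.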
Results
Inter-rater reliability between the two human experts was high, with most criteria showing 'good' to 'excellent' agreement (15 of 19 criteria with ICC > .75). In contrast, reliability dropped markedly and significantly when the AI model's evaluations were compared with those of the human experts. The AI disagreed systematically on criteria requiring nuanced, contextual judgment, with reliability coefficients frequently falling in the 'poor' or negative range (e.g., ICC = -.106 for 'Logical structure'). This indicates that its evaluation logic differs fundamentally from human reasoning.
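The qualitative labels 'poor', 'good' and 'excellent' come from banding ICC values. The abstract states only that 'good' to 'excellent' corresponds to ICC > .75. The full set of cut-offs used below (0.50, 0.75 and 0.90, as proposed by Koo and Li, 2016) is therefore an assumption about which guideline was applied. How values falling exactly on a boundary are assigned is also a choice made here. The sketch below maps an estimate to a label with a hypothetical helper, `icc_band`, and shows how a count such as '15 of 19 criteria' would be tallied.

```python
def icc_band(icc):
    """Map an ICC estimate to a qualitative label (assumed Koo & Li 2016 cut-offs)."""
    if icc > 0.90:
        return "excellent"
    if icc > 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"  # also covers negative estimates


# Hypothetical per-criterion ICCs; only -0.106 is the value reported in the abstract.
example = {"Logical structure": -0.106, "criterion B": 0.82, "criterion C": 0.93}
for name, value in example.items():
    print(f"{name}: {value:+.3f} -> {icc_band(value)}")

# Tally of criteria rated 'good' or better, analogous to "15 of 19 criteria".
good_or_better = sum(icc_band(v) in ("good", "excellent") for v in example.values())
print(good_or_better)  # 2
```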
Discussion
The findings reveal a profound gap between the nuanced, contextual reasoning of human experts and the pattern-recognition capabilities of the current AI model. The study concludes that AI, in its present form, cannot reliably replicate expert human judgment in the complex task of credibility assessment. Although AI is not viable as an autonomous evaluator, it may have limited potential as a 'cognitive assistant' supporting human workflows. Assessing the credibility of child testimony remains a task that fundamentally requires human judgment and lies far beyond the current capabilities of artificial intelligence.
Created:
2025-06-13



