Dataset for the article "Potential overinterpretation of results in the abstracts of machine learning studies for movie box office revenue prediction: a systematic review"
Source: Figshare. Updated: 2026-02-12. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Dataset_for_the_article_Potential_overinterpretation_of_results_in_the_abstracts_of_machine_learning_studies_for_movie_box_office_revenue_prediction_a_systematic_review_/30931817
Description:
Machine learning (ML) has significantly advanced the prediction of movie box office revenue (MBOR). However, ML studies are subject to the risk of overinterpretation, defined as the misuse of language so that evaluation results appear overly positive. This phenomenon is particularly acute in the abstract, which serves as the primary tool for attracting readership, thereby creating the potential for overemphasis on the most favorable results. To assess the frequency and prevalence of overinterpretation in study abstracts, we conducted a prospectively registered systematic review of the MBOR literature. Three databases (Scopus, IEEE Digital Library, and ACM Digital Library) were searched for English-language peer-reviewed articles published until 2024. We evaluated 46 eligible articles for nine reporting practices adapted from a classification of overinterpretation in biomedical literature. The most prevalent practices in the abstract were the omission of performance metrics (87%) and inappropriate use of strong and leading words (44%). The absence of mean absolute percentage error (MAPE, 28%) and coefficient of determination (R², 24%) in the abstract was also common, even though these metrics were provided in the main text. Contrary to expectation, no inappropriate extrapolation of evaluation results in abstracts was found. This review highlights the diversity of reporting practices in the abstracts of ML-based MBOR studies. We submit recommendations for enhanced reporting, which can promote the accurate interpretation of evaluation results and the synthesis of evidence from multiple studies.
Created:
2026-02-12



