Validation test results on the imdb corpus.
Source: Figshare. Updated: 2025-01-10. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Validation_test_results_on_the_imdb_corpus_/28186299
Description:
Assessing whether texts are positive or negative (sentiment analysis) has wide-ranging applications across many disciplines. Automated approaches make it possible to code nearly unlimited quantities of text rapidly, replicably, and with high accuracy. Compared to machine learning and large language model (LLM) approaches, lexicon-based methods may sacrifice some performance, but in exchange they provide generalizability and domain independence, while crucially offering the possibility of identifying gradations in sentiment. We demonstrate the strong performance of lexica using MultiLexScaled, an approach which averages valences across a number of widely used general-purpose lexica. We validate it against benchmark datasets from a range of different domains, comparing performance against machine learning and LLM alternatives. In addition, we illustrate the value of identifying fine-grained sentiment levels by showing, in an analysis of pre- and post-9/11 British press coverage of Muslims, that binarized valence metrics give rise to different (and erroneous) conclusions about the nature of the post-9/11 shock as well as about differences between broadsheet and tabloid coverage. The code to apply MultiLexScaled is available online.
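The idea of averaging valences across several lexica can be illustrated with a minimal sketch. This is not the authors' released code: the two toy lexica, the function names (`raw_valence`, `standardize`, `multi_lexicon_scores`), and the whitespace tokenizer are all hypothetical. The description above only says that valences are averaged across lexica; standardizing each lexicon's scores before averaging is an assumption suggested by the name MultiLexScaled.

```python
from statistics import mean, pstdev

# Hypothetical toy lexica (word -> valence); real general-purpose lexica
# contain thousands of entries and use different scoring ranges.
LEXICA = {
    "lex_a": {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0},
    "lex_b": {"good": 0.5, "great": 1.0, "bad": -0.5, "boring": -1.0},
}


def raw_valence(text, lexicon):
    """Sum of the lexicon scores of all tokens, divided by the token count."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens) / len(tokens)


def standardize(values):
    """Z-score a list of values so lexica with different ranges are comparable."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def multi_lexicon_scores(texts, lexica):
    """Average the standardized per-lexicon valences for each text."""
    per_lexicon = [
        standardize([raw_valence(t, lex) for t in texts])
        for lex in lexica.values()
    ]
    # zip(*...) regroups the scores so each tuple holds one text's values.
    return [mean(scores) for scores in zip(*per_lexicon)]


texts = ["a great film", "a good film", "a bad film", "an awful boring film"]
scores = multi_lexicon_scores(texts, LEXICA)

# Binarizing keeps only the sign and discards the gradations in `scores`.
labels = ["pos" if s > 0 else "neg" for s in scores]
```

In this toy corpus the continuous `scores` rank the four texts from most to least positive, while `labels` collapses them into two classes. That loss of gradation is what the description refers to when it contrasts fine-grained sentiment levels with binarized valence metrics.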
Created:
2025-01-10



