
A Human-Rated Hospitality Review Benchmark for LLM-Generated Sentiment Quadruple Extraction

Source: DataONE | Updated: 2026-05-06 | Indexed: 2026-05-19
Download link:
https://search.dataone.org/view/sha256:08f3569c12fbd4e6a5af8a9900e5400f09f81b749e15f2a1eaa51ec04f8c8734
Resource description:
This dataset provides a human-rated benchmark for evaluating LLM-generated sentiment quadruples in hospitality reviews. It contains 40 hospitality reviews, 109 predicted–reference quadruple comparison pairs, gold reference annotations, LLM-generated outputs, exact-match F1 scores, Semantic-Aware Flexible Evaluation (SAFE) scores, and human ratings across three dimensions: output acceptability, semantic similarity to reference annotations, and perceived alignment with metric scoring behaviour. The benchmark supports evaluation of LLM-based Quad-ABSA systems, comparison of automatic evaluation metrics, analysis of exact-match versus semantic-aware scoring, and development of new metric baselines for structured sentiment analysis. Although originally constructed for validating SAFE, the dataset can be used independently to evaluate other LLM outputs and study cases where automatic metric scores diverge from human judgement.
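For readers unfamiliar with exact-match F1 on sentiment quadruples, the following is a minimal sketch of how such a score is commonly computed. It is not taken from the dataset or from the SAFE implementation. The quadruple field order (aspect, category, opinion, sentiment), the lowercase/whitespace normalization, the function name `exact_match_f1`, and the example values are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Assumed field order: (aspect, category, opinion, sentiment).
# The dataset's actual schema may differ.
Quad = Tuple[str, str, str, str]


def normalize(quad: Quad) -> Quad:
    """Lowercase and trim each element (an assumed normalization step)."""
    return tuple(part.strip().lower() for part in quad)


def exact_match_f1(predicted: Iterable[Quad], reference: Iterable[Quad]) -> float:
    """F1 where a prediction counts only if all four elements match a reference."""
    pred = {normalize(q) for q in predicted}
    gold = {normalize(q) for q in reference}
    matched = len(pred & gold)
    if matched == 0:
        # Also covers empty inputs; returning 0.0 here is a convention chosen
        # for this sketch, not something the dataset description specifies.
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example, not drawn from the benchmark.
reference = [
    ("room", "ROOMS#GENERAL", "spacious", "positive"),
    ("staff", "SERVICE#GENERAL", "rude", "negative"),
]
predicted = [
    ("room", "ROOMS#GENERAL", "spacious", "positive"),
    ("staff", "SERVICE#GENERAL", "unfriendly", "negative"),
]

print(exact_match_f1(predicted, reference))  # 0.5
```

In this made-up example the second prediction differs from the reference only in the opinion term ("unfriendly" versus "rude"), yet exact matching gives it no credit. This is the kind of case where an exact-match score and a human rating may diverge.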
Created:
2026-05-09