LEXBENCH
Source: arXiv · Updated 2024-05-05 · Indexed 2024-06-21
Download link:
https://github.com/jacklanda/LexBench
Resource description:
LEXBENCH is a comprehensive evaluation suite developed by researchers from the School of Computer and Communication Engineering and the School of Foreign Studies at the University of Science and Technology Beijing. It tests language models on ten semantic phrase processing tasks. It is presented as the first work to propose a framework that models, from a comparative perspective, both general semantic phrases (i.e., lexical collocations) and three fine-grained categories of semantic phrases: idiomatic expressions, noun compounds, and verbal constructions. Experiments with LEXBENCH support the scaling law: larger models outperform smaller models on most tasks. A human evaluation further indicates that strong models perform comparably to humans on semantic phrase processing. The suite is intended to help improve the general semantic phrase comprehension of language models and to serve future research.
Provided by:
University of Science and Technology Beijing
Created:
2024-05-05
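Collected Summary
Background and Challenges
Background Overview
LEXBENCH is a benchmark dataset for comprehensively evaluating how well language models process semantic phrases. It covers ten tasks spanning lexical collocations and three fine-grained types of semantic phrases (idioms, noun compounds, and verbal constructions). Its experiments show a positive correlation between model size and performance and find that strong models can reach human-level performance on semantic phrase processing. The dataset aims to advance research on the semantic understanding capabilities of language models.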
The content above was collected and summarized by 遇见数据集.



