Resource overview:
Alibaba Cloud and Soochow University have jointly released CFLUE (Chinese Financial Language Understanding Evaluation), a comprehensive benchmark for assessing how well large language models (LLMs) understand and process Chinese financial text. CFLUE measures model performance along two dimensions: knowledge assessment and application assessment.
- The knowledge assessment section includes over 38,000 multiple-choice questions selected from 15 different financial qualification mock exams, aimed at testing the answer prediction and reasoning abilities of language models. Each question is accompanied by an explanation, facilitating an in-depth evaluation of the model's reasoning process.
- The application assessment section provides over 16,000 instances covering five classic NLP tasks: text classification, machine translation, relation extraction, reading comprehension, and text generation. These instances are derived from existing shared tasks or real data annotated by professionals.
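To make the answer-prediction side of the knowledge assessment concrete, here is a minimal sketch of how predictions on multiple-choice items could be scored. Everything in it is an assumption for illustration: the `ChoiceItem` fields, the label normalization, and the use of plain accuracy are hypothetical and do not reflect CFLUE's actual data schema or official metrics.

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_LABELS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


@dataclass
class ChoiceItem:
    """One multiple-choice item (hypothetical schema, not CFLUE's real format)."""
    question: str
    options: dict[str, str]  # option label -> option text
    answer: str              # gold option label(s), e.g. "B" or "AC"
    explanation: str         # reference explanation that accompanies the question


def normalize(label: str) -> str:
    """Canonicalize an answer string so that 'c, a' and 'AC' compare equal."""
    letters = {ch for ch in label.upper() if ch in VALID_LABELS}
    return "".join(sorted(letters))


def accuracy(items: list[ChoiceItem], predictions: list[str]) -> float:
    """Fraction of items whose predicted label set exactly matches the gold one."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(item.answer)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)
```

Exact-match scoring is assumed here so that an item with several correct options only counts when all of them are selected. Evaluating the generated reasoning against the reference explanation would need a separate text-similarity metric, which this sketch does not cover.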
Overall, CFLUE offers insight from several angles into understanding and improving LLMs in the Chinese financial domain, and its authors use it to call for more comprehensive and fine-grained evaluation of these models. The research team hopes that CFLUE will not only deepen understanding of existing models but also advance the development of language models for Chinese finance. The CFLUE V1.0 evaluation dataset will be made publicly available. Future plans include ongoing version updates and an integrated, platform-based evaluation service intended as a one-stop evaluation solution for the industry.