llm-agents/CriticBench

Name: llm-agents/CriticBench
Creator: llm-agents
Published: 2024-02-23 12:57:09
License: MIT

Source: Hugging Face. Updated 2024-02-23; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/llm-agents/CriticBench
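
A minimal sketch of how the dataset might be fetched programmatically, assuming the third-party `datasets` library is installed and network access is available. The helper names `dataset_url` and `load_criticbench` are hypothetical, introduced only for illustration. The card does not state whether a configuration name or split is required, so any such arguments are passed through unchanged rather than guessed.

```python
import os

REPO_ID = "llm-agents/CriticBench"         # dataset ID shown on this page
MIRROR_ENDPOINT = "https://hf-mirror.com"  # mirror host from the download link


def dataset_url(repo_id: str, endpoint: str = "https://huggingface.co") -> str:
    """Build the dataset page URL for a given hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_criticbench(use_mirror: bool = False, **kwargs):
    """Load the dataset via `datasets.load_dataset` (requires network access).

    Extra keyword arguments (for example a config name or split) are
    forwarded as-is; which ones are needed is an open assumption.
    """
    if use_mirror:
        # The hub client reads HF_ENDPOINT when it is imported,
        # so the variable is set before the import below.
        os.environ["HF_ENDPOINT"] = MIRROR_ENDPOINT
    from datasets import load_dataset  # lazy import, after HF_ENDPOINT is set

    return load_dataset(REPO_ID, **kwargs)


print(dataset_url(REPO_ID, MIRROR_ENDPOINT))
```

Calling `load_criticbench(use_mirror=True)` would route the download through the mirror. If `datasets` or `huggingface_hub` has already been imported in the same process, the endpoint change may not take effect, so setting `HF_ENDPOINT` in the shell beforehand is the safer option.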

Resource overview:

---
license: mit
task_categories:
- question-answering
- text-classification
- text-generation
language:
- en
paperswithcode_id: criticbench
pretty_name: CriticBench
size_categories:
- 1K<n<10K
tags:
- llm
- reasoning
- critique
- correction
- discrimination
- math-word-problems
- math-reasoning
- question-answering
- commonsense-reasoning
- code-generation
- symbolic-reasoning
- algorithmic-reasoning
---

# Dataset Card for CriticBench

CriticBench is a comprehensive benchmark designed to assess LLMs' abilities to generate, critique/discriminate, and correct reasoning across a variety of tasks. CriticBench encompasses five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic. It compiles 15 datasets and incorporates responses from three LLM families.

## Dataset Details

### Dataset Description

- **Curated by:** THU
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** EN
- **License:** MIT

### Dataset Sources [optional]

- **Repository:** https://github.com/CriticBench/CriticBench
- **Paper [optional]:** https://arxiv.org/pdf/2402.14809.pdf
- **Demo [optional]:** https://criticbench.github.io/

## Uses

### Direct Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases, and limitations of the dataset. More information is needed for further recommendations.

## Citation [optional]

**BibTeX:**

```
@misc{lin2024criticbench,
  title={CriticBench: Benchmarking LLMs for Critique-Correct Reasoning},
  author={Zicheng Lin and Zhibin Gou and Tian Liang and Ruilin Luo and Haowei Liu and Yujiu Yang},
  year={2024},
  eprint={2402.14809},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

**APA:**

Lin, Z., Gou, Z., Liang, T., Luo, R., Liu, H., & Yang, Y. (2024). CriticBench: Benchmarking LLMs for Critique-Correct Reasoning.

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]

Provider:

llm-agents

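Summary of Original Information

Dataset Card for CriticBench

Dataset Overview

CriticBench is a comprehensive benchmark for evaluating the ability of large language models (LLMs) to generate, critique/discriminate, and correct reasoning. It covers five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic. It compiles 15 datasets and includes responses from three LLM families.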
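
To make the three abilities named above more concrete, here is a toy scoring sketch. It is an illustration under stated assumptions, not the benchmark's own evaluation code: the card does not specify the data schema or the metrics, so the `Record` fields and the choice of accuracy and F1 are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Record:
    # All field names are hypothetical, chosen for illustration only.
    response_correct: bool   # is the supplied LLM response actually correct?
    judged_correct: bool     # critique step: did the evaluated model call it correct?
    corrected_correct: bool  # correction step: is the revised answer correct?


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def critique_f1(records: List[Record]) -> float:
    """F1 for flagging incorrect responses (positive class = 'incorrect')."""
    tp = sum(1 for r in records if not r.response_correct and not r.judged_correct)
    fp = sum(1 for r in records if r.response_correct and not r.judged_correct)
    fn = sum(1 for r in records if not r.response_correct and r.judged_correct)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Four made-up records, not taken from the dataset.
records = [
    Record(response_correct=True, judged_correct=True, corrected_correct=True),
    Record(response_correct=False, judged_correct=False, corrected_correct=True),
    Record(response_correct=False, judged_correct=True, corrected_correct=False),
    Record(response_correct=True, judged_correct=False, corrected_correct=True),
]

print("critique F1:", critique_f1(records))
print("correction accuracy:", accuracy(r.corrected_correct for r in records))
```

Treating "incorrect" as the positive class reflects one plausible reading of critique as error detection. The paper linked on this page is the authority on how the benchmark actually scores each step.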

Dataset Details

Dataset Description

Curated by: THU
Language: English (EN)
License: MIT

Dataset Sources

Repository: https://github.com/CriticBench/CriticBench
Paper: https://arxiv.org/pdf/2402.14809.pdf
Demo: https://criticbench.github.io/

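Uses

Direct Use

[More Information Needed]

Dataset Structure

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Data Collection and Processing

[More Information Needed]

Source Data Producers

[More Information Needed]

Annotations

Annotation Process

[More Information Needed]

Annotators

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users should be aware of the risks, biases, and limitations of the dataset. More information is needed for further recommendations.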

Citation

BibTeX:

@misc{lin2024criticbench,
  title={CriticBench: Benchmarking LLMs for Critique-Correct Reasoning},
  author={Zicheng Lin and Zhibin Gou and Tian Liang and Ruilin Luo and Haowei Liu and Yujiu Yang},
  year={2024},
  eprint={2402.14809},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

APA:

Lin, Z., Gou, Z., Liang, T., Luo, R., Liu, H., & Yang, Y. (2024). CriticBench: Benchmarking LLMs for Critique-Correct Reasoning.
