KonradSzafer/lm-eval-results-demo

Name: KonradSzafer/lm-eval-results-demo
Creator: KonradSzafer
Published: 2024-05-31 06:06:37
License: No description available

Source: Hugging Face. Updated: 2024-05-31. Indexed: 2024-06-12.

Download link:

https://hf-mirror.com/datasets/KonradSzafer/lm-eval-results-demo

Resource description:

---
pretty_name: Evaluation run of microsoft/phi-2
repo_url: https://huggingface.co/microsoft/phi-2
leaderboard_url: ''
point_of_contact: ''
configs:
- config_name: microsoft__phi-2__arc_easy
  data_files:
  - split: 2024_05_30T21_07_22.554816
    path:
    - '**/samples_arc_easy_2024-05-30T21-07-22.554816.json'
  - split: 2024_05_31T06_06_35.913097
    path:
    - '**/samples_arc_easy_2024-05-31T06-06-35.913097.json'
  - split: latest
    path:
    - '**/samples_arc_easy_2024-05-31T06-06-35.913097.json'
- config_name: microsoft__phi-2__gsm8k
  data_files:
  - split: 2024_05_30T21_07_22.554816
    path:
    - '**/samples_gsm8k_2024-05-30T21-07-22.554816.json'
  - split: 2024_05_31T06_06_35.913097
    path:
    - '**/samples_gsm8k_2024-05-31T06-06-35.913097.json'
  - split: latest
    path:
    - '**/samples_gsm8k_2024-05-31T06-06-35.913097.json'
- config_name: microsoft__phi-2__results
  data_files:
  - split: 2024_05_30T21_07_22.554816
    path:
    - '**/results_2024-05-30T21-07-22.554816.json'
  - split: 2024_05_31T06_06_35.913097
    path:
    - '**/results_2024-05-31T06-06-35.913097.json'
  - split: latest
    path:
    - '**/results_2024-05-31T06-06-35.913097.json'
---

# Dataset Card for Evaluation run of microsoft/phi-2

Dataset automatically created during the evaluation run of model [microsoft/phi-2](https://huggingface.co/microsoft/phi-2).

The dataset is composed of 2 configurations, each corresponding to one of the evaluated tasks.

The dataset has been created from 2 runs. Each run can be found as a specific split in each configuration, with the split named after the timestamp of the run. The "latest" split always points to the latest results.

An additional configuration, "results", stores all the aggregated results of the run.

To load the details from a run, you can for instance do the following:

```python
from datasets import load_dataset

# Load the per-sample details of the arc_easy task from the most recent run.
data = load_dataset(
    "KonradSzafer/lm-eval-results-private",
    name="microsoft__phi-2__arc_easy",
    split="latest"
)
```

## Latest results

These are the [latest results from run 2024-05-31T06-06-35.913097](https://huggingface.co/datasets/KonradSzafer/lm-eval-results-private/blob/main/microsoft__phi-2/results_2024-05-31T06-06-35.913097.json). Note that the repository may contain results for other tasks if successive evaluations did not cover the same tasks. You can find each of them in the "results" configuration and in the "latest" split of each evaluation:

```python
{
    "all": {
        "gsm8k": {
            "exact_match,strict-match": 0.45,
            "exact_match_stderr,strict-match": 0.04999999999999999,
            "exact_match,flexible-extract": 0.47,
            "exact_match_stderr,flexible-extract": 0.05016135580465919,
            "alias": "gsm8k"
        },
        "arc_easy": {
            "acc,none": 0.82,
            "acc_stderr,none": 0.03861229196653696,
            "acc_norm,none": 0.83,
            "acc_norm_stderr,none": 0.03775251680686371,
            "alias": "arc_easy"
        }
    },
    "gsm8k": {
        "exact_match,strict-match": 0.45,
        "exact_match_stderr,strict-match": 0.04999999999999999,
        "exact_match,flexible-extract": 0.47,
        "exact_match_stderr,flexible-extract": 0.05016135580465919,
        "alias": "gsm8k"
    },
    "arc_easy": {
        "acc,none": 0.82,
        "acc_stderr,none": 0.03861229196653696,
        "acc_norm,none": 0.83,
        "acc_norm_stderr,none": 0.03775251680686371,
        "alias": "arc_easy"
    }
}
```

## Dataset Details

### Dataset Description

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information is needed for further recommendations.

## Citation [optional]

**BibTeX:** [More Information Needed]

**APA:** [More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]
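The standard errors reported under "Latest results" can be turned into rough confidence intervals. The following is a minimal sketch and not part of the original card: it copies the reported metrics by hand, pairs each metric with its `_stderr` counterpart, and computes a normal-approximation 95% interval. The helper names (`sample_stderr`, `confidence_interval`, `summarize`) are hypothetical. The sample count `ASSUMED_N = 100` is also an assumption: the card does not state how many samples were evaluated, but the reported standard errors are numerically consistent with 100 samples per task under the formula sqrt(p(1-p)/(n-1)).

```python
import math

# Metrics copied by hand from the "Latest results" block of the card.
results = {
    "gsm8k": {
        "exact_match,strict-match": 0.45,
        "exact_match_stderr,strict-match": 0.04999999999999999,
        "exact_match,flexible-extract": 0.47,
        "exact_match_stderr,flexible-extract": 0.05016135580465919,
        "alias": "gsm8k",
    },
    "arc_easy": {
        "acc,none": 0.82,
        "acc_stderr,none": 0.03861229196653696,
        "acc_norm,none": 0.83,
        "acc_norm_stderr,none": 0.03775251680686371,
        "alias": "arc_easy",
    },
}

# Assumption: not stated in the card, inferred from the stderr values.
ASSUMED_N = 100


def sample_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores with mean p (n - 1 denominator)."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def confidence_interval(value: float, stderr: float, z: float = 1.96):
    """Normal-approximation interval; z = 1.96 corresponds to roughly 95%."""
    return value - z * stderr, value + z * stderr


def summarize(task_metrics: dict) -> dict:
    """Map each metric key (e.g. 'acc,none') to its confidence interval."""
    intervals = {}
    for key, value in task_metrics.items():
        if not isinstance(value, float):
            continue  # skip non-numeric entries such as "alias"
        name, _, filter_name = key.partition(",")
        if name.endswith("_stderr"):
            continue  # stderr entries are consumed alongside their metric
        stderr = task_metrics[f"{name}_stderr,{filter_name}"]
        intervals[key] = confidence_interval(value, stderr)
    return intervals


for task, metrics in results.items():
    for key, (low, high) in summarize(metrics).items():
        print(f"{task:9s} {key:28s} {metrics[key]:.2f}  [{low:.3f}, {high:.3f}]")
```

With standard errors of about 0.04 to 0.05, the intervals are wide, so small differences between metrics (for example 0.45 versus 0.47 on gsm8k) should not be over-interpreted.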

Provided by:

KonradSzafer

Summary of original information

Dataset overview

Dataset name

Evaluation run of microsoft/phi-2

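Dataset creation

Purpose: Automatically generated during the evaluation run of the model microsoft/phi-2.
Process: Created from 2 runs. Each run corresponds to a specific split, and the split name is based on the run's timestamp.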
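Because split names encode the run timestamp, the most recent run can be identified by parsing them. The sketch below is illustrative only: the split names are copied from the card's `configs` metadata, while the format string and the helper names `parse_split` and `newest_run` are assumptions based on how those names look.

```python
from datetime import datetime

# Split names as listed in the card's configs metadata.
SPLITS = [
    "2024_05_30T21_07_22.554816",
    "2024_05_31T06_06_35.913097",
    "latest",
]

# Assumed format, inferred from the split names above.
SPLIT_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def parse_split(name: str) -> datetime:
    """Convert a timestamp-style split name into a datetime."""
    return datetime.strptime(name, SPLIT_FORMAT)


def newest_run(split_names: list) -> str:
    """Return the timestamped split with the most recent run time."""
    timestamped = [name for name in split_names if name != "latest"]
    return max(timestamped, key=parse_split)


print(newest_run(SPLITS))
```

In this dataset the `latest` split points at the same files as the newest timestamped split, so parsing is only needed when comparing or ordering earlier runs.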

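Dataset structure

Number of configurations: 2 task configurations, plus 1 aggregated results configuration
Configuration details:
- microsoft__phi-2__arc_easy: contains 3 splits, one for each of the two run timestamps and one for the latest results.
- microsoft__phi-2__gsm8k: contains 3 splits, one for each of the two run timestamps and one for the latest results.
- microsoft__phi-2__results: stores the aggregated results of all runs; contains 3 splits, one for each of the two run timestamps and one for the latest results.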
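The configuration names and data file patterns follow a visible naming scheme. As a rough sketch, assuming the scheme generalizes beyond the three configurations listed here (the card does not say so), the names can be derived from a model ID, a task name, and a run timestamp. The function names `config_name`, `samples_pattern`, and `results_pattern` are hypothetical.

```python
def config_name(model_id: str, task: str) -> str:
    """Build a configuration name such as 'microsoft__phi-2__arc_easy'.

    Assumption: '/' in the model ID is replaced by '__' and the task name
    (or 'results' for the aggregated configuration) is appended after another '__'.
    """
    return f"{model_id.replace('/', '__')}__{task}"


def samples_pattern(task: str, timestamp: str) -> str:
    """Glob pattern for the per-sample file of one task in one run."""
    return f"**/samples_{task}_{timestamp}.json"


def results_pattern(timestamp: str) -> str:
    """Glob pattern for the aggregated results file of one run."""
    return f"**/results_{timestamp}.json"


run = "2024-05-31T06-06-35.913097"
for task in ("arc_easy", "gsm8k"):
    print(config_name("microsoft/phi-2", task), samples_pattern(task, run))
print(config_name("microsoft/phi-2", "results"), results_pattern(run))
```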

Dataset loading example

    from datasets import load_dataset

    data = load_dataset(
        "KonradSzafer/lm-eval-results-private",
        name="microsoft__phi-2__arc_easy",
        split="latest"
    )