bermaneh/pde-llm-eval-results-v2

Name: bermaneh/pde-llm-eval-results-v2
Creator: bermaneh
Published: 2026-04-27 02:34:29
License: No description available

Hugging Face · Updated 2026-04-27 · Indexed 2026-05-03

Download link:

https://hf-mirror.com/datasets/bermaneh/pde-llm-eval-results-v2

Resource description:

---
license: mit
tags:
- pde-llm-eval
- free-gen
- v3
---

# pde-llm-eval-results-v2

Free-gen PDE eval: 10 models, v3 dataset (128 rows, 8 conditions). 9 existing models have 144 rows (v2+v3), DeepSeek-R1-Distill-Qwen-32B has 128 rows (v3 only).

## Dataset Info

- **Rows**: 1424
- **Columns**: 21

## Columns

| Column | Type | Description |
|--------|------|-------------|
| title | Value('string') | *No description provided* |
| pde_class | Value('string') | *No description provided* |
| mod_type | Value('string') | *No description provided* |
| gt_pde | Value('string') | *No description provided* |
| gt_method | Value('string') | *No description provided* |
| gt_behavior | Value('string') | *No description provided* |
| gt_valid | Value('bool') | *No description provided* |
| model_response | Value('string') | Full model output (never truncated) |
| parsed_pde | Value('string') | Extracted PDE type from response |
| parsed_method | Value('string') | Extracted numerical method(s) |
| parsed_behavior | Value('string') | Extracted physical process(es) |
| parsed_valid | Value('string') | Extracted validity answer |
| finish_reason | Value('string') | vLLM stop reason (stop/length) |
| model | Value('string') | *No description provided* |
| pde_match | Value('int64') | Binary keyword match for PDE type |
| pde_embed_sim | Value('float64') | *No description provided* |
| method_any_match | Value('int64') | 1 if any GT method token found in response |
| method_recall | Value('float64') | Fraction of GT method tokens found |
| behavior_any_match | Value('int64') | 1 if any GT behavior token found |
| behavior_recall | Value('float64') | Fraction of GT behavior tokens found |
| valid_match | Value('int64') | Binary match for validity field |

## Generation Parameters

```json
{
  "script_name": "run_eval.py",
  "model": "multi (10 models)",
  "description": "Free-gen PDE eval: 10 models, v3 dataset (128 rows, 8 conditions). 9 existing models have 144 rows (v2+v3), DeepSeek-R1-Distill-Qwen-32B has 128 rows (v3 only).",
  "experiment_name": "pde-llm-eval",
  "job_id": "torch:7248133",
  "cluster": "torch",
  "artifact_status": "final",
  "canary": false,
  "hyperparameters": {},
  "input_datasets": []
}
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("bermaneh/pde-llm-eval-results-v2", split="train")
print(f"Loaded {len(dataset)} rows")
```

---
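The `*_any_match` and `*_recall` columns are described only as token-level matches against the ground-truth (GT) fields. The sketch below shows one way such scores could be computed, and checks that the stated per-model row counts add up to the 1424 rows reported. The tokenizer (lowercase, split on non-alphanumeric characters), the helper names `tokenize` and `any_match_and_recall`, and the example strings are all assumptions for illustration; the card does not say how `run_eval.py` actually tokenizes or matches.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def any_match_and_recall(gt_text, response):
    """Return (any_match, recall) of GT tokens found in the response.

    any_match is 1 if at least one GT token appears in the response, else 0.
    recall is the fraction of distinct GT tokens that appear in the response.
    """
    gt_tokens = set(tokenize(gt_text))
    if not gt_tokens:
        # No GT tokens to look for: treat as no match (an assumed convention).
        return 0, 0.0
    found = gt_tokens & set(tokenize(response))
    return int(bool(found)), len(found) / len(gt_tokens)


# Hypothetical GT method and model response, not taken from the dataset.
gt_method = "finite difference, Crank-Nicolson"
response = "I would use a finite difference scheme with implicit time stepping."
any_match, recall = any_match_and_recall(gt_method, response)
print(any_match, recall)

# Row-count check from the card: 9 models with 144 rows, 1 model with 128 rows.
expected_rows = 9 * 144 + 1 * 128
print(expected_rows)
```

Here two of the four GT tokens (`finite`, `difference`) appear in the response, so the sketch reports a match with a recall of 0.5, and the row-count arithmetic agrees with the **Rows** value in the card.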

Provider: bermaneh
