
bermaneh/pde-llm-eval-results-v2

Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/bermaneh/pde-llm-eval-results-v2
Description:
---
license: mit
tags:
- pde-llm-eval
- free-gen
- v3
---

# pde-llm-eval-results-v2

Free-gen PDE eval: 10 models, v3 dataset (128 rows, 8 conditions). 9 existing models have 144 rows (v2+v3), DeepSeek-R1-Distill-Qwen-32B has 128 rows (v3 only).

## Dataset Info

- **Rows**: 1424
- **Columns**: 21

## Columns

| Column | Type | Description |
|--------|------|-------------|
| title | Value('string') | *No description provided* |
| pde_class | Value('string') | *No description provided* |
| mod_type | Value('string') | *No description provided* |
| gt_pde | Value('string') | *No description provided* |
| gt_method | Value('string') | *No description provided* |
| gt_behavior | Value('string') | *No description provided* |
| gt_valid | Value('bool') | *No description provided* |
| model_response | Value('string') | Full model output (never truncated) |
| parsed_pde | Value('string') | Extracted PDE type from response |
| parsed_method | Value('string') | Extracted numerical method(s) |
| parsed_behavior | Value('string') | Extracted physical process(es) |
| parsed_valid | Value('string') | Extracted validity answer |
| finish_reason | Value('string') | vLLM stop reason (stop/length) |
| model | Value('string') | *No description provided* |
| pde_match | Value('int64') | Binary keyword match for PDE type |
| pde_embed_sim | Value('float64') | *No description provided* |
| method_any_match | Value('int64') | 1 if any GT method token found in response |
| method_recall | Value('float64') | Fraction of GT method tokens found |
| behavior_any_match | Value('int64') | 1 if any GT behavior token found |
| behavior_recall | Value('float64') | Fraction of GT behavior tokens found |
| valid_match | Value('int64') | Binary match for validity field |

## Generation Parameters

```json
{
  "script_name": "run_eval.py",
  "model": "multi (10 models)",
  "description": "Free-gen PDE eval: 10 models, v3 dataset (128 rows, 8 conditions). 9 existing models have 144 rows (v2+v3), DeepSeek-R1-Distill-Qwen-32B has 128 rows (v3 only).",
  "experiment_name": "pde-llm-eval",
  "job_id": "torch:7248133",
  "cluster": "torch",
  "artifact_status": "final",
  "canary": false,
  "hyperparameters": {},
  "input_datasets": []
}
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("bermaneh/pde-llm-eval-results-v2", split="train")
print(f"Loaded {len(dataset)} rows")
```

---
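Since the card lists per-row score columns alongside a `model` column, a per-model summary is a natural first step after loading. The following is a minimal sketch, not part of the original card: the helper name `summarize_by_model`, the choice of score columns, and the example rows are all assumptions made for illustration. It uses only the standard library and operates on plain row dictionaries.

```python
from collections import defaultdict
from statistics import mean

# Score columns taken from the card's Columns table (a subset chosen for illustration).
SCORE_COLUMNS = ("pde_match", "method_recall", "behavior_recall", "valid_match")


def summarize_by_model(rows, columns=SCORE_COLUMNS):
    """Return {model: {"rows": n, <column>: mean, ...}} for an iterable of row dicts."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append(row)

    summary = {}
    for model, items in grouped.items():
        stats = {"rows": len(items)}
        for col in columns:
            # Skip missing values so one empty cell does not break the average.
            values = [r[col] for r in items if r[col] is not None]
            stats[col] = mean(values) if values else None
        summary[model] = stats
    return summary


# Hypothetical rows for illustration only; these are NOT values from the dataset.
example_rows = [
    {"model": "model-a", "pde_match": 1, "method_recall": 0.5, "behavior_recall": 1.0, "valid_match": 1},
    {"model": "model-a", "pde_match": 0, "method_recall": 1.0, "behavior_recall": 0.0, "valid_match": 1},
    {"model": "model-b", "pde_match": 1, "method_recall": 0.25, "behavior_recall": 0.5, "valid_match": 0},
]

example_summary = summarize_by_model(example_rows)
print(example_summary["model-a"])

# Row-count check from the card's description: 9 models x 144 rows + 1 model x 128 rows.
expected_total_rows = 9 * 144 + 1 * 128
print(expected_total_rows)
```

Iterating over a loaded `datasets.Dataset` yields one dictionary per row, so passing the `dataset` object from the Usage snippet to `summarize_by_model` should work unchanged. The `rows` counts in the result can then be compared against the 144 and 128 figures stated in the description.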
Provider:
bermaneh