bermaneh/pde-mc-logprob-results-v2
Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/bermaneh/pde-mc-logprob-results-v2
Resource description:
---
license: mit
tags:
- pde-mc-logprob
- mc
- v2
- phys-valid-rerun
---
# pde-mc-logprob-results-v2
Full multiple-choice (MC) logprob results: 9 models × 96 rows × 9 question types = 7776 rows. The phys_valid rows were rerun with an updated prompt, 'Does this code run and produce a correct physical solution for the PDE?', which replaces the original 'Is this simulation physically valid?'.
## Dataset Info
- **Rows**: 7776
- **Columns**: 19
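The row count follows from the 9 × 96 × 9 layout described above. A minimal sketch of a consistency check is shown below. It assumes each (`model`, `question_type`) pair holds exactly 96 rows, which is one reading of that product and is not confirmed by the card. `check_grid` and the synthetic `demo_rows` are hypothetical names used for illustration only; no real dataset values appear here.

```python
from collections import Counter

# Expected layout, taken from the card's "9 models x 96 rows x 9 question types".
EXPECTED_MODELS = 9
EXPECTED_QUESTION_TYPES = 9
EXPECTED_ROWS_PER_CELL = 96  # assumption: 96 rows per (model, question_type) pair


def check_grid(rows):
    """Return a list of human-readable problems; empty if the layout matches."""
    cells = Counter((row["model"], row["question_type"]) for row in rows)
    models = {model for model, _ in cells}
    question_types = {qtype for _, qtype in cells}

    problems = []
    if len(models) != EXPECTED_MODELS:
        problems.append(f"expected {EXPECTED_MODELS} models, found {len(models)}")
    if len(question_types) != EXPECTED_QUESTION_TYPES:
        problems.append(
            f"expected {EXPECTED_QUESTION_TYPES} question types, "
            f"found {len(question_types)}"
        )
    for cell, count in sorted(cells.items()):
        if count != EXPECTED_ROWS_PER_CELL:
            problems.append(f"{cell}: expected {EXPECTED_ROWS_PER_CELL} rows, found {count}")
    return problems


# Synthetic stand-in rows (placeholder names, not real dataset values).
demo_rows = [
    {"model": f"model-{m}", "question_type": f"qtype-{q}"}
    for m in range(EXPECTED_MODELS)
    for q in range(EXPECTED_QUESTION_TYPES)
    for _ in range(EXPECTED_ROWS_PER_CELL)
]
grid_problems = check_grid(demo_rows)
print(len(demo_rows), grid_problems)
```

Because iterating over a loaded `datasets.Dataset` yields one dict per row, the same function should accept the real split directly in place of `demo_rows`.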
## Columns
| Column | Type | Description |
|--------|------|-------------|
| model | Value('large_string') | *No description provided* |
| title | Value('large_string') | *No description provided* |
| pde_class | Value('large_string') | *No description provided* |
| mod_type | Value('large_string') | *No description provided* |
| question_type | Value('large_string') | *No description provided* |
| candidate | Value('large_string') | *No description provided* |
| letters | Value('large_string') | *No description provided* |
| correct_letter | Value('large_string') | *No description provided* |
| logprob_A | Value('float64') | *No description provided* |
| logprob_B | Value('float64') | *No description provided* |
| logprob_C | Value('float64') | *No description provided* |
| logprob_D | Value('float64') | *No description provided* |
| predicted_letter | Value('large_string') | *No description provided* |
| correct | Value('bool') | *No description provided* |
| logprob_correct | Value('float64') | *No description provided* |
| margin | Value('float64') | *No description provided* |
| entropy | Value('float64') | *No description provided* |
| finish_reason | Value('large_string') | *No description provided* |
| scoring_method | Value('large_string') | *No description provided* |
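Since the card does not describe the columns, the sketch below shows one plausible way to use them: aggregating the boolean `correct` column into accuracy per (`model`, `question_type`) group. The helper name `accuracy_by_group` and the rows in `example_rows` are hypothetical; the `"phys_valid"` value for `question_type` is inferred from the description above and may not match the stored string exactly.

```python
from collections import defaultdict


def accuracy_by_group(rows, keys=("model", "question_type")):
    """Fraction of rows with correct == True, grouped by the given columns."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group] += 1
        hits[group] += bool(row["correct"])
    return {group: hits[group] / totals[group] for group in totals}


# Illustrative rows only; model names and outcomes are made up.
example_rows = [
    {"model": "model-a", "question_type": "phys_valid", "correct": True},
    {"model": "model-a", "question_type": "phys_valid", "correct": False},
    {"model": "model-b", "question_type": "phys_valid", "correct": True},
]
example_accuracy = accuracy_by_group(example_rows)
for group, value in sorted(example_accuracy.items()):
    print(group, f"{value:.2f}")
```

Passing `keys=("model",)` or `keys=("pde_class", "mod_type")` would give coarser or alternative breakdowns using other columns from the table.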
## Generation Parameters
```json
{
"script_name": "merge_valid_rerun.py",
"model": "multi (9 models)",
"description": "Full MC logprob results \u2014 9 models \u00d7 96 rows \u00d7 9 question types = 7776 rows. phys_valid rows rerun with updated prompt: 'Does this code run and produce a correct physical solution for the PDE?' (replaces original 'Is this simulation physically valid?')",
"experiment_name": "pde-mc-logprob",
"job_id": "torch:7244243",
"cluster": "torch",
"artifact_status": "final",
"canary": false,
"hyperparameters": {},
"input_datasets": []
}
```
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("bermaneh/pde-mc-logprob-results-v2", split="train")
print(f"Loaded {len(dataset)} rows")
```
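Once rows are loaded, the per-letter columns `logprob_A` to `logprob_D` can be turned into normalized probabilities. The sketch below is a hedged illustration: it assumes that the highest log-probability letter is the model's choice and that missing or non-finite values mean the letter was not offered. Neither assumption is stated on the card, so the result may differ from the stored `predicted_letter`. `letter_probabilities`, `argmax_letter`, and `example_row` are hypothetical names and values.

```python
import math

LETTERS = ("A", "B", "C", "D")


def letter_probabilities(row):
    """Softmax over the available logprob_<letter> values of one row."""
    scores = {}
    for letter in LETTERS:
        value = row.get(f"logprob_{letter}")
        # Assumption: None, NaN, or infinite values mean the letter is unavailable.
        if value is not None and math.isfinite(value):
            scores[letter] = value
    if not scores:
        return {}
    peak = max(scores.values())  # subtract the max for numerical stability
    weights = {letter: math.exp(value - peak) for letter, value in scores.items()}
    total = sum(weights.values())
    return {letter: weight / total for letter, weight in weights.items()}


def argmax_letter(row):
    """Letter with the highest normalized probability, or None if none is usable."""
    probabilities = letter_probabilities(row)
    return max(probabilities, key=probabilities.get) if probabilities else None


# Made-up row for illustration; not taken from the dataset.
example_row = {"logprob_A": -0.1, "logprob_B": -2.5, "logprob_C": -4.0, "logprob_D": None}
example_probs = letter_probabilities(example_row)
print(argmax_letter(example_row), example_probs)
```

Comparing `argmax_letter(row)` with `row["predicted_letter"]` across the split is a quick way to test whether the argmax assumption holds for a given `scoring_method`.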
---
Provider:
bermaneh



