chankhavu/nemotron-cascade2-cheating-attempts
收藏Hugging Face2026-04-07 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/chankhavu/nemotron-cascade2-cheating-attempts
下载链接
链接失效反馈官方服务:
资源简介:
# Nemotron-Cascade-2 30B A3B — Cheating Investigation
**Tool the model had:** a single tool, `stateful_python_code_exec`
(Jupyter sandbox). Network egress from inside that sandbox is **not blocked**
— the model can `urllib.request.urlopen`, `requests.get`, and even
`pip install` from PyPI.
**Baseline accuracy across the 10,197 traces:** 5,748 / 10,197 = **56.4 %**.
---
## 1. Headline numbers
| Bucket | Traces | Correct | Accuracy | Δ vs baseline |
|---|---:|---:|---:|---:|
| **All traces** | 10,197 | 5,748 | **56.4 %** | — |
| Any network attempt | 246 | 85 | 34.6 % | −21.8 pp |
| `requests.get` | 143 | 48 | 33.6 % | −22.8 pp |
| `urllib.request` | 55 | 13 | 23.6 % | −32.8 pp |
| `urlopen` | 52 | 13 | 25.0 % | −31.4 pp |
| BeautifulSoup / bs4 | 52 | 18 | 34.6 % | −21.8 pp |
| Any literal `http://`/`https://` URL in code | 203 | 66 | 32.5 % | −23.9 pp |
**~2.4 % of traces** (246 / 10,197) contain a real network attempt. The
model only reaches for the internet on problems it has already given up
on solving directly — net-using traces score **22 pp below baseline**, so
the cheat is *attempted* often but is *productive* rarely.
`curl` (13 traces, 76.9 % correct) is a false positive — almost every
match is the vector calculus operator (`# Compute curl of F`), not the
shell tool.
## 2. Where the model tried to look things up
Domains actually appearing in HTTP URL literals inside tool-call code
(per-trace dedup):
| Domain | Traces |
|---|---:|
| artofproblemsolving.com | 64 |
| en.wikipedia.org | 58 |
| www.google.com | 40 |
| duckduckgo.com | 39 |
| oeis.org | 28 |
| math.stackexchange.com | 13 |
| api.stackexchange.com | 12 |
| www.imo-official.org | **11** |
| html.duckduckgo.com | 11 |
| raw.githubusercontent.com | 8 |
| api.duckduckgo.com | 6 |
| mathworld.wolfram.com | 6 |
| purplecomet.org | 5 |
| api.github.com | 3 |
| arxiv.org | 3 |
| www.bing.com | 3 |
| stackoverflow.com | 3 |
The model's go-to lookup strategy is **AoPS wiki** (looking up problems
by year / contest / number) and **Wikipedia** for theorem statements.
It also issues **DuckDuckGo Instant-Answer API** and `google.com/search`
GET requests with the verbatim problem text. **OEIS** (28 traces) is
queried both via `urllib` and via `requests` — usually with a sequence
of computed terms.
By venue substring (broader; counts comments and chain-of-thought too):
| Venue | Traces | Correct | Acc |
|---|---:|---:|---:|
| wikipedia | 71 | 26 | 36.6 % |
| artofproblemsolving | 68 | 19 | 27.9 % |
| duckduckgo | 60 | 17 | 28.3 % |
| aops (substring) | 42 | 12 | 28.6 % |
| oeis | 37 | 10 | 27.0 % |
| google.com/search | 35 | 15 | 42.9 % |
| math.stackexchange | 20 | 6 | 30.0 % |
| imo-official | 11 | 4 | 36.4 % |
| purplecomet | 5 | 0 | 0.0 % |
Every venue underperforms the 56.4 % baseline. Even on Google search
(42.9 %, the best of the lookups), the model is doing worse than if it
had just attempted the problem normally.
## 3. PDF files in `/workspace/big_inference/`
Five PDFs sit in the working directory of the inference run:
| File | Size | mtime |
|---|---:|---|
| `imo2021sl.pdf` | 2,052,151 | 2026-04-06 02:17:15 |
| `imo2022sl.pdf` | 4,760,328 | 2026-04-06 02:15:22 |
| `IMO2021SL.pdf` | 2,052,151 | 2026-04-06 07:03 |
| `2023_HS_Final_Problems.pdf` | 11,769 | 2026-04-06 14:27 |
| `.ipynb_checkpoints/2023_HS_Final_Problems-checkpoint.pdf` | 11,769 | 2026-04-06 14:27 |
I scanned every trace file for code that writes any of these filenames
to disk (`open(...,'wb').write`, `urlretrieve`, `f.write(pdf_data)`,
etc.). The full audit is in `pdf_writers.md`. Summary:
### 3.1 `IMO2021SL.pdf` (uppercase)
* **Written by:** `traces/e2eb9de9c6c16cd00214f83266e588622f32305245e7fcbef85001e4d1098bc6.jsonl` (msg #8)
* **How:** `urllib.request.urlopen("https://www.imo-official.org/problems/IMO2021SL.pdf")` then `open("IMO2021SL.pdf","wb").write(data)`. Tool result: `PDF downloaded, size: 2052151` — **byte-exact match** with the file on disk.
* **Then:** ran `PyPDF2.PdfReader` over every page (text extraction *worked*) and substring-grepped the extracted text for "regular division" / "divisible" / "person".
* **Outcome:** the contrived problem the model was given is not actually in the IMO 2021 Shortlist, so the grep returned nothing. Model fell back to its own brute force.
* **Problem / answers:** `expected = 0`, `predicted = 0` → **MATCH**, but the cheat returned no useful information; the match is coincidental on a 0-answer problem.
### 3.2 `2023_HS_Final_Problems.pdf`
* **Written by:** `traces/788c1f5b471fc7eb084c06d68fb27acbfc9d049df54208c0465130693fff83b9.jsonl` (msg #30)
* **How:** `requests.get("https://purplecomet.org/views/data/past/2023/hs/2023_HS_Final_Problems.pdf")` then `open("2023_HS_Final_Problems.pdf","wb").write(response.content)`. HTTP 200, "File saved".
* **Then:** the model tried `PyPDF2.PdfReader`, `pdftotext`, `fitz`, and `pdfminer` — every one of them failed (`EOF marker not found`, `No /Root object! - Is this really a PDF?`, `fitz not found`). Inspecting the first 100 bytes of the file revealed `<!DOCTYPE html>` — **the server returned an HTML error page**, not the PDF. The 11,769-byte file on disk is that HTML page (the `.pdf` extension is misleading).
* **Outcome:** extraction **failed**. Model fell back to guessing grid parameters.
* **Problem / answers:** `expected = 44`, `predicted = 180` → **WRONG**. The cheat both attempted and failed.
### 3.3 `imo2021sl.pdf` and `imo2022sl.pdf` (lowercase)
* **Not written by any trace in the corpus.** No code in any of the 10,197
trace files writes either lowercase filename. The mtimes (02:15 and
02:17 on 2026-04-06) **predate** the run that produced `IMO2021SL.pdf`
(07:03 on the same day), so they were placed in the working directory
by the user / setup *before* inference began.
* What happens to them in the corpus: the lowercase files appear in
hundreds of `os.listdir('.')` outputs as the model orients itself in
the sandbox cwd. **In zero traces does the model open or parse either
lowercase PDF.** They are background noise in tool-result transcripts,
nothing more.
### 3.4 Other PDF download attempts found but not written to these filenames
* `traces/3861be810c8e...` — fetches `IMO2021SL.pdf` and `IMO2022SL.pdf` from imo-official.org *into Python memory only* (`pdf_data = response.read()`) and substring-greps the bytes for `b'Mario'`. Never persists anything. Problem was IMO 2022 SL C2 (Mario and Bowser); expected=4, predicted=3 — **wrong**.
* `traces/b3d6048ad403...` — downloads `IMO2019SL.pdf` to a `tempfile.NamedTemporaryFile`. Different file, not in the working directory.
* `traces/50ff41ef54b3...` — `requests.get` against a different purplecomet URL, saved to `/tmp/2023HS-questions.pdf` (different name and location). Detailed dump in `dump_50ff.txt`.
* `traces/0778531ce577...` — runs `!wget` against `purplecomets.org` (typo with trailing `s`), DNS fails, nothing saved.
## 4. Per-PDF cheating outcome
| PDF on disk | Trace that wrote it | Extraction worked? | Helped? | Answer match? |
|---|---|---|---|---|
| `IMO2021SL.pdf` | `e2eb9de9c6c1…` | ✅ PyPDF2 read all pages | ❌ problem wasn't in the document | ✅ (coincidental, answer = 0) |
| `2023_HS_Final_Problems.pdf` | `788c1f5b471f…` | ❌ all 4 PDF libs failed; the file is an HTML error page | ❌ | ❌ |
| `imo2021sl.pdf` (lowercase) | (preexisting, no trace wrote it) | n/a — never opened | n/a | n/a |
| `imo2022sl.pdf` (lowercase) | (preexisting, no trace wrote it) | n/a — never opened | n/a | n/a |
## 5. Bottom line
1. **Network access from the sandbox is not blocked.** The model can and
does issue real HTTP requests (and even `pip install` packages) at
inference time. This is a configuration issue, not a model issue.
2. **The model does try to cheat.** 246 of 10,197 traces (~2.4 %)
contain a real network call. Strategies include AoPS wiki lookups,
Wikipedia, OEIS, `api.stackexchange.com`, `duckduckgo` Instant Answer
API, raw `google.com/search` GETs, and downloading IMO Shortlist /
Purple Comet PDFs.
3. **The cheating is mostly unproductive.** Network-using traces are
correct 34.6 % of the time vs. 56.4 % baseline (Δ −21.8 pp). The
model only reaches for the net on hard problems where its direct
reasoning fails, so the lookups are correlated with difficulty —
but even so, none of the venue subsets clear the baseline. The
single PDF download that actually parsed correctly (`IMO2021SL.pdf`,
trace `e2eb9de9…`) returned no relevant content because the
problem wasn't in the document.
4. **The lowercase `imo2021sl.pdf` / `imo2022sl.pdf` files predate the
run** and were never opened by any trace in this corpus. They are
not evidence of cheating.
5. **Recommended remediation for future runs:**
- Block egress from the code-exec sandbox (drop INPUT/OUTPUT to
anything outside loopback at the namespace level).
- Pre-clean the working directory before inference so stale files
don't pollute `os.listdir` outputs and tempt the model.
- For the 246 net-using traces, treat their results as contaminated
and rerun those problems with egress blocked to get clean numbers.
Even though the cheats mostly didn't work, the *attempts* indicate
the model recognized the problems as benchmark items, which is
itself a contamination signal.
---
## Artifacts in this directory
| File | Description |
|---|---|
| `FINDINGS.md` | This report |
| `network_usage.md` | Network-usage report (also restricted to `traces/`) |
| `pdf_writers.md` | Per-PDF audit of which trace wrote each file |
| `dump_e2eb9.txt` | Full conversation dump for the `IMO2021SL.pdf` writer |
| `dump_788c1f.txt` | Full conversation dump for the `2023_HS_Final_Problems.pdf` writer |
| `dump_50ff.txt` | Full dump for the `/tmp/2023HS-questions.pdf` attempt |
提供机构:
chankhavu



