dreeseaw/mdlens-combined-markdown-v1
Source: Hugging Face. Updated 2026-04-26; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/dreeseaw/mdlens-combined-markdown-v1
Description:
---
license: mit
task_categories:
- question-answering
- text-retrieval
language:
- en
pretty_name: mdlens Combined Markdown Eval v1
size_categories:
- 1K<n<10K
tags:
- markdown
- retrieval
- agent-eval
- documentation
---
# mdlens Combined Markdown Eval v1
This dataset supports the `mdlens` v1 Markdown retrieval evaluation.
It is strictly a Markdown QA/search eval, not a broad coding-agent
benchmark, even though Markdown QA is a common part of coding-agent work.
## Contents
- `docs/`: 1,783 Markdown files, about 17.0 MB of source text.
- `questions.jsonl`: 30 locked hard questions.
- `manifest.json`: source corpus counts and byte totals.
- `summary.md` / `summary.json`: aggregate eval summaries.
- `reports/`: per-harness/model Markdown reports from the final run.

The corpus combines three fixture families:

- curated, deliberately messy generated/scene Markdown with malformed formatting,
stale notes, copied distractors, tables, and multiple needles
- a SciCat-style scientific README proxy with Hugging Face and GitHub scientific
Markdown fallback material
- codebase documentation fixtures from real repository docs, runbooks, design
notes, and experiment reports

Five of the 30 questions are workflow-like cross-corpus analysis tasks, but no
question requires code edits.
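A local copy can be sanity-checked against the layout listed under Contents. The following is a minimal sketch, not part of the published tooling: the function name `summarize_dataset` is hypothetical, and it assumes every corpus file under `docs/` uses the `.md` extension and that `questions.jsonl` holds one JSON object per non-empty line.

```python
import json
from pathlib import Path


def summarize_dataset(root):
    """Count Markdown files, their total size, and question records under root."""
    root = Path(root)

    # Assumption: corpus files end in .md and may sit at any depth below docs/.
    md_files = [p for p in (root / "docs").rglob("*.md") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in md_files)

    # Assumption: questions.jsonl has one JSON object per non-empty line.
    questions = []
    with open(root / "questions.jsonl", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                questions.append(json.loads(line))

    return {
        "markdown_files": len(md_files),
        "markdown_bytes": total_bytes,
        "questions": len(questions),
    }
```

For an intact download, `markdown_files` and `questions` should line up with the 1,783 files and 30 questions listed above. The byte total can be compared against `manifest.json`, whose exact schema is not described here.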
## Reproducing
Clone the public tool repository:
```bash
git clone https://github.com/Dreeseaw/mdlens
cd mdlens
cargo build --release
```
Then run the eval harness from the original development repository, or adapt the
locked `questions.jsonl` to any agent harness. The comparison is:
- baseline: shell tools such as `rg`, `find`, `sed`, and `cat`
- mdlens: start with `mdlens scout docs/ "<question>" --max-tokens 1400`,
then use `mdlens read` only when one section detail is missing
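When adapting the locked questions to another harness, the first-pass `mdlens scout` call above can be assembled programmatically. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the `question` key is a guess at the JSONL schema that should be checked against the actual records. Only the `scout` invocation shown above is taken from the source.

```python
import json


def scout_command(question, docs_dir="docs/", max_tokens=1400):
    """Build the argv list for the first-pass `mdlens scout` call."""
    return ["mdlens", "scout", docs_dir, question, "--max-tokens", str(max_tokens)]


def load_question_texts(path, field="question"):
    """Yield question strings from a JSONL file.

    `field` is a hypothetical key name; inspect the real records first.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)[field]
```

Each list returned by `scout_command` can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Passing an argument list rather than a shell string avoids quoting problems when a question contains quotes or shell metacharacters.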
## Caveats
The final published summary excluded partial harness/model pairs from headline
averages. Native Claude Sonnet and Haiku had provider `exit_1` rows; those runs
remain visible in the reports.
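The source does not spell out how partial pairs were detected. One plausible reading, sketched below, treats a harness/model pair as complete only when it has a successful row for every question. The field names `harness`, `model`, `status`, and `score`, the `"ok"` status value, and the function itself are all hypothetical and do not reflect the actual report schema.

```python
from collections import defaultdict


def headline_averages(rows, expected_questions):
    """Average scores per (harness, model) pair, skipping partial pairs.

    A pair counts as complete only if it has `expected_questions` rows and
    every row has status "ok". All field names here are hypothetical.
    """
    by_pair = defaultdict(list)
    for row in rows:
        by_pair[(row["harness"], row["model"])].append(row)

    averages = {}
    for pair, pair_rows in by_pair.items():
        complete = (
            len(pair_rows) == expected_questions
            and all(r["status"] == "ok" for r in pair_rows)
        )
        if complete:
            averages[pair] = sum(r["score"] for r in pair_rows) / len(pair_rows)
    return averages
```

Under this reading, a pair with even one `exit_1` row drops out of the headline average, while its individual rows stay available for inspection.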
Provider:
dreeseaw



