dreeseaw/mdlens-combined-markdown-v1

Name: dreeseaw/mdlens-combined-markdown-v1
Creator: dreeseaw
Published: 2026-04-26 19:44:13
License: MIT

Source: Hugging Face. Updated 2026-04-26; indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/dreeseaw/mdlens-combined-markdown-v1

Description:

---
license: mit
task_categories:
- question-answering
- text-retrieval
language:
- en
pretty_name: mdlens Combined Markdown Eval v1
size_categories:
- 1K<n<10K
tags:
- markdown
- retrieval
- agent-eval
- documentation
---

# mdlens Combined Markdown Eval v1

This dataset supports the `mdlens` v1 Markdown retrieval evaluation. It is entirely a Markdown QA/search eval. It is not a broad coding-agent benchmark, even though Markdown QA is a common part of coding-agent work.

## Contents

- `docs/`: 1,783 Markdown files, about 17.0 MB of source text.
- `questions.jsonl`: 30 locked hard questions.
- `manifest.json`: source corpus counts and byte totals.
- `summary.md` / `summary.json`: aggregate eval summaries.
- `reports/`: per-harness/model Markdown reports from the final run.

The corpus combines three fixture families:

- carefully curated messy generated/scene Markdown with malformed formatting, stale notes, copied distractors, tables, and multiple needles
- a SciCat-style scientific README proxy with Hugging Face and GitHub scientific Markdown fallback material
- codebase documentation fixtures from real repository docs, runbooks, design notes, and experiment reports

Five of the 30 questions are workflow-like cross-corpus analysis tasks, but no question requires code edits.

## Reproducing

Clone the public tool repository:

```bash
git clone https://github.com/Dreeseaw/mdlens
cd mdlens
cargo build --release
```

Then run the eval harness from the original development repository, or adapt the locked `questions.jsonl` to any agent harness. The comparison is:

- baseline: shell tools such as `rg`, `find`, `sed`, and `cat`
- mdlens: start with `mdlens scout docs/ "<question>" --max-tokens 1400`, then use `mdlens read` only when one section detail is missing

## Caveats

The final published summary excluded partial harness/model pairs from headline averages. Native Claude Sonnet and Haiku had provider `exit_1` rows; those runs remain visible in the reports.
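For readers adapting `questions.jsonl` to their own harness, the following is a minimal Python sketch, not the original eval harness. It checks a local copy of `docs/` against the counts quoted in the card and builds the first `mdlens scout` invocation for a question. The helper names (`corpus_stats`, `load_questions`, `scout_command`) are hypothetical, and the card does not document the `questions.jsonl` schema, so any field name you read from a record (for example `"question"`) is an assumption to verify against the file.

```python
import json
from pathlib import Path


def corpus_stats(docs_dir):
    """Count Markdown files under docs_dir and sum their sizes in bytes."""
    files = [p for p in Path(docs_dir).rglob("*.md") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def load_questions(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def scout_command(question, docs_dir="docs/", max_tokens=1400):
    """Build the argv list for the scout step quoted in the card.

    Only the flags shown in the card are used; nothing else about the
    mdlens CLI is assumed here.
    """
    return ["mdlens", "scout", docs_dir, question, "--max-tokens", str(max_tokens)]
```

A harness could call `corpus_stats("docs")` and compare the result with the 1,783 files reported in the card, then pass each `scout_command(...)` list to `subprocess.run`. Passing an argument list rather than a shell string avoids quoting problems when a question contains quotes or other shell metacharacters.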

Provider:

dreeseaw
