
dreeseaw/mdlens-combined-markdown-v1

Source: Hugging Face. Updated 2026-04-26; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/dreeseaw/mdlens-combined-markdown-v1
Resource description:
---
license: mit
task_categories:
- question-answering
- text-retrieval
language:
- en
pretty_name: mdlens Combined Markdown Eval v1
size_categories:
- 1K<n<10K
tags:
- markdown
- retrieval
- agent-eval
- documentation
---

# mdlens Combined Markdown Eval v1

This dataset supports the `mdlens` v1 Markdown retrieval evaluation. It is purely a Markdown QA/search eval, not a broad coding-agent benchmark, even though Markdown QA is a common part of coding-agent work.

## Contents

- `docs/`: 1,783 Markdown files, about 17.0 MB of source text.
- `questions.jsonl`: 30 locked hard questions.
- `manifest.json`: source corpus counts and byte totals.
- `summary.md` / `summary.json`: aggregate eval summaries.
- `reports/`: per-harness/model Markdown reports from the final run.

The corpus combines three fixture families:

- carefully curated, messy generated/scene Markdown with malformed formatting, stale notes, copied distractors, tables, and multiple needles
- a SciCat-style scientific README proxy with Hugging Face and GitHub scientific Markdown fallback material
- codebase documentation fixtures from real repository docs, runbooks, design notes, and experiment reports

Five of the 30 questions are workflow-like cross-corpus analysis tasks, but no question requires code edits.

## Reproducing

Clone the public tool repository:

```bash
git clone https://github.com/Dreeseaw/mdlens
cd mdlens
cargo build --release
```

Then run the eval harness from the original development repository, or adapt the locked `questions.jsonl` to any agent harness.

The comparison is:

- baseline: shell tools such as `rg`, `find`, `sed`, and `cat`
- mdlens: start with `mdlens scout docs/ "<question>" --max-tokens 1400`, then use `mdlens read` only when a detail from one specific section is still missing

## Caveats

The final published summary excluded partial (incomplete) harness/model pairs from the headline averages. The native Claude Sonnet and Haiku runs had rows with a provider `exit_1` status; those runs remain visible in the reports.
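To sanity-check a local download against the counts listed under Contents (1,783 Markdown files under `docs/`, 30 questions in `questions.jsonl`), a small shell script is enough. This is a minimal sketch, not part of the dataset or of the mdlens tooling: the function names are hypothetical, and it assumes that every corpus file under `docs/` uses the `.md` extension and that `questions.jsonl` holds one question per non-empty line.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the dataset or with mdlens.
# Re-counts a local copy so it can be compared with the figures in this card.
set -eu

# Number of .md files under a directory (recursive; find prints one path per line).
# Assumption: every corpus file uses the .md extension.
count_md_files() {
  find "$1" -type f -name '*.md' | wc -l | tr -d ' '
}

# Total bytes of those .md files; `-exec ... +` batches paths, so names with spaces are safe.
sum_md_bytes() {
  find "$1" -type f -name '*.md' -exec cat {} + | wc -c | tr -d ' '
}

# Number of non-empty lines, i.e. JSONL records. grep prints "0" and exits 1
# when nothing matches; `|| true` keeps that "0" instead of aborting under `set -e`.
count_jsonl_records() {
  grep -c . "$1" || true
}

# Print local counts next to the values stated in the card.
report() {
  local root="${1:-.}"
  echo "markdown files:   $(count_md_files "$root/docs") (card: 1,783)"
  echo "markdown bytes:   $(sum_md_bytes "$root/docs") (card: about 17.0 MB)"
  echo "question records: $(count_jsonl_records "$root/questions.jsonl") (card: 30)"
}

# Only produce a report when a dataset root is passed as the first argument.
if [ "$#" -gt 0 ]; then
  report "$1"
fi
```

Saved under a name of your choosing, for example `check_counts.sh` (hypothetical), it would be run as `bash check_counts.sh path/to/mdlens-combined-markdown-v1`. The card's "about 17.0 MB" is rounded, so for an exact comparison use the byte totals in `manifest.json`, whose field layout is not described in this card.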
Provided by:
dreeseaw