
ZirTech/anti-doping-bench

Source: Hugging Face · Updated 2026-04-26 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/ZirTech/anti-doping-bench
Description:
---
license: cc-by-4.0
language:
- en
pretty_name: Anti-Doping Benchmark
task_categories:
- multiple-choice
- question-answering
size_categories:
- 1K<n<10K
tags:
- wada
- sports-law
- llm-benchmark
- legal-reasoning
---

**A high-difficulty benchmark for LLMs based on the World Anti-Doping Code (WADA Code 2027).**

Models must match article clauses to the correct article numbers – a task requiring precise legal knowledge.

---

## Leaderboard (Top-12)

| Model | Accuracy | Relative to Random |
|-------|----------|---------------------|
| **GPT 5.5** | **79.80%** (798/1000) | +54.8 pp |
| **Claude Sonnet 4.6** | **46.60%** (466/1000) | +21.6 pp |
| **Llama-4-Maverick-17B** | **34.00%** (340/1000) | +9.0 pp |
| **Google Gemma 4 31B** | **32.80%** (328/1000) | +7.8 pp |
| **Google Gemma 4 26B (a4b)** | **32.70%** (327/1000) | +7.7 pp |
| GLM 4 32B | 28.90% (289/1000) | +3.9 pp |
| LFM 2 24B (a2b) | 27.00% (270/1000) | +2.0 pp |
| Command R7B | 26.60% (266/1000) | +1.6 pp |
| Step 3.5 Flash | 12.80% (128/1000) | -12.2 pp |
| Nemotron 3 Super 120B (a12b) | 11.90% (119/1000) | -13.1 pp |
| Qwen 3.5 9B | 11.20% (112/1000) | -13.8 pp |
| GPT OSS 120B | 3.20% (32/1000) | -21.8 pp |
| *Random baseline* | *25.0% (250/1000)* | - |

---

## Languages

The text in the dataset is in English. The associated BCP-47 code is `en`.

---

## Dataset Structure

For the *test* configuration, each instance contains an anti-doping question and four candidate answers. In the example below, the question quotes a clause and the options are article numbers. `correct_answers` holds the 1-based position of the correct option (here 2, i.e. "WADC Art. 20.2.8").

```json
{
  "id": "WADA-2027-A-001",
  "level": "A",
  "year": 2027,
  "question_text": "Which WADA rule most precisely matches the following clause: \"Subject to applicable law, to not knowingly employ or permit participation by a Person in a capacity where such employment or participation would conflict with the prohibitions described in Article 10.14.1.\"?",
  "question_type": "MCQ",
  "options": [
    "WADC Art. 20.8.11",
    "WADC Art. 20.2.8",
    "WADC Art. 20.8.13",
    "WADC Art. 20.4.6"
  ],
  "correct_answers": [ 2 ],
  "explanation_short": "WADC Art. 20.2.8: Subject to applicable law, to not knowingly employ or permit participation by a Person in a capacity where such employment or participation would conflict with the prohibitions described in Article 10.14.1.",
  "source_rule": "WADC Art. 20.2.8",
  "requires_tue": false,
  "tags": [ "rule violations" ],
  "is_adversarial": false,
  "language": "en"
}
```

For the *finetune* configuration, each instance has the same fields. In the example below, the direction is reversed: the question names an article and the four options are candidate clause texts.

```json
{
  "id": "WADA-2027-A-001",
  "level": "A",
  "year": 2027,
  "question_text": "Which of the following details is specifically stated in WADC Art. 20.6.6 regarding \"Person who was not bound by the rules\"?",
  "question_type": "MCQ",
  "options": [
    "the International Standard for Code Compliance by Signatories, and (b) by any other sporting body over which it has authority, in accordance with Article 12",
    "Person who was not bound by the rules adopted pursuant to the Code, who has directly and intentionally engaged in conduct within the previous six (6) years which would have constituted a violation of anti-doping rules if Code-compliant...",
    "first violation was determined based on pre -2027 Code rules, the period of Ineligibility which would have been assessed for that first violation had 2027 Code rules been applicable, shall be applied.165",
    "in line with requirements of the International Standard for Education"
  ],
  "correct_answers": [ 2 ],
  "explanation_short": "WADC Art. 20.6.6: Person who was not bound by the rules adopted pursuant to the Code, who has directly and intentionally engaged in conduct within the previous six (6) years which would have constituted a violation of anti-doping rules if Code-compliant...",
  "source_rule": "WADC Art. 20.6.6",
  "requires_tue": false,
  "tags": [ "rule violations" ],
  "is_adversarial": false,
  "language": "en"
}
```

---

## Data Splits

| name |train|
|--------|----:|
|test | 1000|
|finetune| 1000|

---

## Why this benchmark?

- **Real-world legal reasoning** – not trivia, but exact matching of code clauses.
- **High difficulty** – even top open-source models barely exceed random chance.
- **Curated by a domain expert** – all questions derived directly from WADA Code 2027.

---

```bibtex
@misc{zirtech2026antidoping,
  author = {Zirt Techniques},
  title = {Anti-Doping Benchmark: Evaluating LLMs on the World Anti-Doping Code},
  year = {2026},
  publisher = {Hugging Face},
  note = {1000 questions, WADA Code 2027}
}
```

---

![image](https://cdn-uploads.huggingface.co/production/uploads/69b3129f02a20db8381db62e/TSPe_QNVVlO7_gzUEpP8E.png)
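The record layout and the leaderboard arithmetic can be illustrated with a minimal sketch. It assumes that `correct_answers` is 1-based, as in the two sample records, and that a prediction counts as correct only on an exact match. The card does not document the scoring protocol, so this is an assumption. The helper names (`correct_options`, `is_correct`, `accuracy_report`) are hypothetical and not part of the dataset, and `sample` is a trimmed copy of the *test* example.

```python
from typing import Dict, List

# Assumption: four options per question with one correct answer,
# which gives the 25% random baseline quoted in the leaderboard.
RANDOM_BASELINE = 0.25


def correct_options(record: Dict) -> List[str]:
    """Return the option strings marked correct (indices treated as 1-based)."""
    return [record["options"][i - 1] for i in record["correct_answers"]]


def is_correct(record: Dict, predicted: int) -> bool:
    """Exact-match check; `predicted` is a 1-based option index."""
    return [predicted] == record["correct_answers"]


def accuracy_report(n_correct: int, n_total: int) -> Dict[str, float]:
    """Accuracy in percent and its distance from random in percentage points."""
    accuracy = n_correct / n_total
    return {
        "accuracy_pct": round(accuracy * 100, 2),
        "vs_random_pp": round((accuracy - RANDOM_BASELINE) * 100, 2),
    }


# Trimmed copy of the *test* example record from the card.
sample = {
    "id": "WADA-2027-A-001",
    "options": [
        "WADC Art. 20.8.11",
        "WADC Art. 20.2.8",
        "WADC Art. 20.8.13",
        "WADC Art. 20.4.6",
    ],
    "correct_answers": [2],
    "source_rule": "WADC Art. 20.2.8",
}

print(correct_options(sample))
print(is_correct(sample, 2))
print(accuracy_report(798, 1000))
```

With 798 correct out of 1000, `accuracy_report` reproduces the 79.80% and +54.8 pp figures in the first leaderboard row. In the sample record, the resolved option also equals `source_rule`, which is a cheap sanity check on the index convention.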
Provider:
ZirTech