
LEXam

ModelScope Community · Updated 2026-01-07 · Indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/AI-ModelScope/LEXam
Resource description:
<div align="center" style="display: flex; align-items: center; justify-content: center; gap: 16px;">
  <img src="pictures/logo.png" alt="LEXam Logo" width="120" style="border: none;">
  <div style="text-align: left;">
    <h1 style="margin: 0; font-size: 2em;">LEXam: Benchmarking Legal Reasoning on 340 Law Exams</h1>
    <p style="margin: 6px 0 0; font-size: 1.2em;">A diverse, rigorous evaluation suite for legal AI from Swiss, EU, and international law examinations.</p>
  </div>
</div>

[**Paper**](https://arxiv.org/abs/2505.12864) | [**Website & Leaderboard**](https://lexam-benchmark.github.io/) | [**GitHub Repository**](https://github.com/LEXam-Benchmark/LEXam)

## 🔥 News

- [2025/12] We reorganized all multiple-choice questions into four separate files, `mcq_4_choices` (n = 1,655), `mcq_8_choices` (n = 1,463), `mcq_16_choices` (n = 1,028), and `mcq_32_choices` (n = 550), all with standardized features.
- [2025/11] We identified and corrected several annotation errors in the statements of the original multiple-choice questions.
- [2025/09] We updated our evaluation results on open questions using an ensemble LLM-as-a-Judge.
- [2025/05] We released the first version of the [paper](https://arxiv.org/abs/2505.12864), in which we evaluate representative SoTA LLMs, with evaluations strictly verified by legal experts.

## 🧩 Subsets

The dataset contains the following subsets:

1. `open_question`: All long-form, open-ended questions of ***LEXam***. The data can be downloaded using:

   ```python
   from datasets import load_dataset

   data = load_dataset("LEXam-Benchmark/LEXam", "open_question")
   ```

   - The dataset includes the following features:
     - `question`: The open-ended question.
     - `answer`: Reference answer provided by legal domain experts.
     - `course`: Title of the law course from which the question was derived.
     - `language`: Language of the question (`en` or `de`).
     - `area`: Legal area covered by the question (`criminal`, `public`, `private`, or `interdisciplinary`).
     - `jurisdiction`: Legal jurisdiction of the question (`Swiss`, `international`, or `generic`).
     - `year`: Year when the exam was administered (2016 to 2022).
     - `id`: Unique identifier for the question.

2. `mcq_{4, 8, 16, 32}_choices`: The standard MCQs of ***LEXam*** with {4, 8, 16, 32} choices. The data can be downloaded using:

   ```python
   from datasets import load_dataset

   data_4 = load_dataset("LEXam-Benchmark/LEXam", "mcq_4_choices")
   data_8 = load_dataset("LEXam-Benchmark/LEXam", "mcq_8_choices")
   data_16 = load_dataset("LEXam-Benchmark/LEXam", "mcq_16_choices")
   data_32 = load_dataset("LEXam-Benchmark/LEXam", "mcq_32_choices")
   ```

   - The dataset includes the following features:
     - `question`: The multiple-choice question.
     - `choices`: List of {4, 8, 16, 32} answer choices.
     - `gold`: Position of the correct answer within the choices list.
     - `course`: Title of the law course from which the question was derived.
     - `language`: Language of the question (`en` or `de`).
     - `area`: Legal area covered by the question (`criminal`, `public`, `private`, or `interdisciplinary`).
     - `jurisdiction`: Legal jurisdiction of the question (`Swiss`, `international`, or `generic`).
     - `year`: Year when the exam was administered (2016 to 2022).
     - `n_statements`: Number of statements contained in the question (2 to 9).
     - `none_as_an_option`: Binary indicator specifying whether `None of the statements` (or `Keine der Aussagen`) is included among the answer choices.
     - `id`: Unique identifier for the question.
     - `negative_question`: Binary indicator specifying whether the question is phrased negatively (e.g. `Which of the following statements are incorrect?`).
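
Because `gold` stores a position in `choices`, scoring a model on the MCQ subsets comes down to comparing a predicted position with `gold`. The sketch below shows one way to render a row as a prompt and compute accuracy. It is not the authors' evaluation code: the helper names `format_mcq_prompt` and `accuracy`, the prompt wording, and the toy rows are illustrative, and it assumes `gold` is 0-based, which should be checked against the actual data.

```python
from typing import Mapping, Sequence


def format_mcq_prompt(row: Mapping) -> str:
    """Render one MCQ row as a numbered prompt.

    Numeric labels are used instead of letters so the same code
    works for 4, 8, 16, and 32 choices.
    """
    lines = [row["question"], ""]
    for position, choice in enumerate(row["choices"]):
        lines.append(f"({position}) {choice}")
    lines.append("")
    lines.append("Answer with the number of the correct choice.")
    return "\n".join(lines)


def accuracy(rows: Sequence[Mapping], predictions: Sequence[int]) -> float:
    """Fraction of rows whose predicted position equals `gold`.

    Assumption: `gold` is a 0-based position in `choices`.
    """
    if len(rows) != len(predictions):
        raise ValueError("rows and predictions must have the same length")
    if not rows:
        return 0.0
    correct = sum(int(row["gold"]) == pred for row, pred in zip(rows, predictions))
    return correct / len(rows)


# Toy rows that mimic the documented fields; these are not real LEXam data.
toy_rows = [
    {
        "question": "Which of the following statements are correct?",
        "choices": ["i", "ii", "i and ii", "None of the statements"],
        "gold": 2,
    },
    {
        "question": "Which of the following statements are incorrect?",
        "choices": ["i", "ii", "i and ii", "None of the statements"],
        "gold": 0,
    },
]

print(format_mcq_prompt(toy_rows[0]))
print(accuracy(toy_rows, [2, 1]))
```

Rows loaded with `load_dataset` can be passed to the same helpers, since each row exposes the fields listed above by name.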

## Citation

If you find the dataset helpful, please consider citing ***LEXam***:

```bibtex
@article{fan2025lexam,
  title   = {LEXam: Benchmarking Legal Reasoning on 340 Law Exams},
  author  = {Fan, Yu and Ni, Jingwei and Merane, Jakob and Tian, Yang and Hermstr{\"u}wer, Yoan and Huang, Yinya and Akhtar, Mubashara and Salimbeni, Etienne and Geering, Florian and Dreyer, Oliver and Brunner, Daniel and Leippold, Markus and Sachan, Mrinmaya and Stremitzer, Alexander and Engel, Christoph and Ash, Elliott and Niklaus, Joel},
  journal = {arXiv preprint arXiv:2505.12864},
  year    = {2025}
}
```

Provider: maas

Created: 2025-05-22