lexlms/legal_lama
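Dataset Overview: LegalLAMA
Dataset Description
Dataset Summary
LegalLAMA is a diverse probing benchmark suite of 8 sub-tasks that aims to assess the legal knowledge that pre-trained language models (PLMs) acquire during pre-training.
Dataset Specifications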
| Corpus | Corpus alias | Examples | Avg. tokens | Labels |
|---|---|---|---|---|
| Criminal Code Sections (Canada) | canadian_sections | 321 | 72 | 144 |
| Legal Terminology (EU) | cjeu_term | 2,127 | 164 | 23 |
| Contractual Section Titles (US) | contract_sections | 1,527 | 85 | 20 |
| Contract Types (US) | contract_types | 1,089 | 150 | 15 |
| ECHR Articles (CoE) | ecthr_articles | 5,072 | 69 | 13 |
| Legal Terminology (CoE) | ecthr_terms | 6,803 | 97 | 250 |
| Crime Charges (US) | us_crimes | 4,518 | 118 | 59 |
| Legal Terminology (US) | us_terms | 5,829 | 308 | 7 |
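Usage
To load a specific sub-corpus, pass its corpus alias as the configuration name.

    from datasets import load_dataset

    # "ecthr_terms" is one of the corpus aliases listed in the table above
    dataset = load_dataset("lexlms/legal_lama", name="ecthr_terms")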
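Because a mistyped alias only fails once the download is attempted, it can help to check the alias against the table first. The following is a minimal sketch, not part of the dataset's own tooling: the `LEGAL_LAMA_CONFIGS` mapping and the `load_legal_lama` helper are hypothetical names introduced here, and the example counts are copied from the specifications table above.

```python
# Hypothetical helper: alias -> number of examples, copied from the table above.
LEGAL_LAMA_CONFIGS = {
    "canadian_sections": 321,
    "cjeu_term": 2127,
    "contract_sections": 1527,
    "contract_types": 1089,
    "ecthr_articles": 5072,
    "ecthr_terms": 6803,
    "us_crimes": 4518,
    "us_terms": 5829,
}


def load_legal_lama(alias):
    """Load one LegalLAMA sub-corpus after checking that the alias is known."""
    if alias not in LEGAL_LAMA_CONFIGS:
        known = ", ".join(sorted(LEGAL_LAMA_CONFIGS))
        raise ValueError(f"Unknown corpus alias {alias!r}; expected one of: {known}")
    # Imported lazily so the alias check works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("lexlms/legal_lama", name=alias)
```

Calling `load_legal_lama("ecthr_terms")` is then equivalent to the direct `load_dataset` call shown above, while an unknown alias raises a `ValueError` that lists the valid aliases before any network access happens.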
Citation Information

    @inproceedings{chalkidis-etal-2023-lexfiles,
        title = "{L}e{XF}iles and {L}egal{LAMA}: Facilitating {E}nglish Multinational Legal Language Model Development",
        author = "Chalkidis, Ilias and Garneau, Nicolas and Goanta, Catalina and Katz, Daniel and S{\o}gaard, Anders",
        booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
        month = jul,
        year = "2023",
        address = "Toronto, Canada",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2023.acl-long.865",
        pages = "15513--15535",
    }



