CSI-lab/RCW_2025_Positive_Query_Pairs
Source: Hugging Face. Updated 2026-04-28. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/CSI-lab/RCW_2025_Positive_Query_Pairs
Description:
The **Washington Law Benchmark (WLB)** is a large-scale, synthetic dataset designed specifically to advance Legal Information Retrieval (IR) and Semantic Search. It bridges the critical "semantic gap" between natural language (how citizens, local governments, and plain-English users describe legal scenarios) and formal statutory legalese (how laws are actually written). The dataset contains hundreds of thousands of meticulously paired examples mapping plain-English queries, hypothetical scenarios, and local legislative drafts directly to their governing **Washington State Statutes (Revised Code of Washington - RCW)**. This dataset is ideal for fine-tuning dense retrievers (like `bge`, `e5`, or `MiniLM`) using contrastive learning techniques such as Multiple Negatives Ranking (MNR) loss.
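
To make the Multiple Negatives Ranking objective mentioned above concrete, here is a minimal, dependency-free sketch of the loss on toy vectors. It is not the dataset authors' training code. The 2-D embeddings, the function names `cosine` and `mnr_loss`, and the `scale` value of 20 are assumptions chosen for illustration; in practice the vectors would come from a dense retriever and the loss from a training library. Each query is paired with the statute at the same index, and every other statute in the batch acts as a negative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(query_vecs, statute_vecs, scale=20.0):
    """Mean softmax cross-entropy where query i's correct statute is statute i.

    All other statutes in the batch serve as in-batch negatives.
    `scale` multiplies the cosine similarities (an assumed, illustrative value).
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [scale * cosine(q, s) for s in statute_vecs]
        # Numerically stable log-sum-exp over the row of logits.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(query_vecs)

# Toy embeddings (hypothetical): two plain-English queries, two statute texts.
queries = [[1.0, 0.0], [0.0, 1.0]]
statutes_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each query is closest to its own statute
statutes_swapped = [[0.1, 0.9], [0.9, 0.1]]   # each query is closest to the wrong statute

loss_aligned = mnr_loss(queries, statutes_aligned)
loss_swapped = mnr_loss(queries, statutes_swapped)
```

The loss is small when each query embedding sits nearest its paired statute and large when the pairing is wrong, which is why positive query pairs alone are enough to train with this objective: the rest of the batch supplies the negatives.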
Provider:
CSI-lab



