newmindai/caselaw-retrieval

Name: newmindai/caselaw-retrieval
Creator: newmindai
Published: 2026-01-23 14:48:03
License: No description available

Source: Hugging Face. Updated: 2026-01-23. Indexed: 2026-02-07.

Download link:

https://hf-mirror.com/datasets/newmindai/caselaw-retrieval

Description:

This dataset contains case law from the General Assembly of Civil Chambers category of the Turkish Court of Cassation (Yargıtay). It is designed for generating hierarchical case summary keywords (catchwords) and for text-based information retrieval tasks. It is optimized for MTEB (Massive Text Embedding Benchmark) tokenizer benchmark tests: each query-corpus pair is filtered against a maximum limit of 7000 tokens. The dataset comprises three main subsets: queries (case summary keywords from Court of Cassation decisions), corpus (full texts of Court of Cassation decisions), and qrels (relevance relationships between queries and corpus documents). To ensure data quality, the data was generated with a multi-layer LLM architecture consisting of a generator layer, a critic layer, and a fuser layer. The dataset documentation also covers tokenizer benchmarking and data filtering, use cases, data source, license, and citation.
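The three-subset layout and the 7000-token filter described above can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the dataset's actual pipeline: the toy records, the function names `count_tokens` and `filter_pairs`, the whitespace tokenizer standing in for a real model tokenizer, and the choice to apply the limit to the query and the document separately.

```python
from typing import Callable, Dict

MAX_TOKENS = 7000  # limit stated in the dataset description

Qrels = Dict[str, Dict[str, int]]  # query id -> {document id -> relevance}


def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated tokens."""
    return len(text.split())


def filter_pairs(
    queries: Dict[str, str],
    corpus: Dict[str, str],
    qrels: Qrels,
    max_tokens: int = MAX_TOKENS,
    counter: Callable[[str], int] = count_tokens,
) -> Qrels:
    """Keep only query-document pairs where both texts fit within max_tokens."""
    kept: Qrels = {}
    for qid, docs in qrels.items():
        if counter(queries[qid]) > max_tokens:
            continue  # the query itself is too long
        fitting = {
            did: rel for did, rel in docs.items()
            if counter(corpus[did]) <= max_tokens
        }
        if fitting:  # drop queries left without any relevant document
            kept[qid] = fitting
    return kept


# Toy data (invented): queries hold catchwords, corpus holds decision texts.
queries = {"q1": "kira sözleşmesi tahliye", "q2": "tazminat"}
corpus = {"d1": "kısa karar metni", "d2": "uzun " * 8000}
qrels = {"q1": {"d1": 1}, "q2": {"d2": 1}}

filtered = filter_pairs(queries, corpus, qrels)
print(filtered)
```

In this toy run the pair (q2, d2) is dropped because d2 has 8000 whitespace tokens, which exceeds the limit. Swapping `counter` for a real tokenizer's length function would give counts closer to what an embedding model sees.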

Provider:

newmindai
