
Turkish-NLI/legal_nli_TR_V1

Source: Hugging Face. Updated: 2024-11-02. Indexed: 2025-04-26.
Download link:
https://hf-mirror.com/datasets/Turkish-NLI/legal_nli_TR_V1
Description:
---
license: apache-2.0
dataset_info:
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: string
  splits:
  - name: train
    num_bytes: 1858442640
    num_examples: 474283
  - name: validation
    num_bytes: 18996841
    num_examples: 5000
  - name: test
    num_bytes: 19683829
    num_examples: 5000
  download_size: 725637794
  dataset_size: 1897123310
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
task_categories:
- sentence-similarity
language:
- tr
tags:
- legal
size_categories:
- 100K<n<1M
---

# Turkish Law NLI Dataset

This dataset is derived from case files of Turkish Commercial Courts and was prepared as part of a student project to contribute to the Turkish NLP literature.

## Source Data

The dataset was created by collecting approximately 33,000 case rulings from [open sources](https://emsal.uyap.gov.tr/) using web scraping methods. The dataset includes only the "summary" sections of the case rulings, where the reason for each lawsuit is typically described.

## Data Structure and Labeling

- The dataset was adapted for sentence similarity tasks, inspired by the [SNLI dataset](https://huggingface.co/datasets/stanfordnlp/snli). The goal of this project is to develop a semantic search model for identifying relevant precedent cases in legal settings.
- This is the first version of the dataset, and future versions will incorporate additional metadata and employ more refined labeling techniques.

![First image from tree](images/TTK_1.jpg)
![Second image from tree](images/TTK_2.jpg)

<div style="text-align: center; opacity: 0.7;">
  <p style="font-style: italic;">Some sections of the Tree Structure</p>
</div>

## Labeling Methodology

To establish relationships between case files, legal articles within each case were utilized. Only commercial cases governed by the [Turkish Commercial Code (TTK)](https://www.mevzuat.gov.tr/mevzuat?MevzuatNo=6102&MevzuatTur=1&MevzuatTertip=5) are included. Articles from the TTK were aligned in a hierarchical structure, considering main and subheadings, and were transformed into a tree structure. The relationship between cases was determined by calculating distances between the articles they contain within this tree structure.

### Label Types

- **Entailment:** For each case, the 7 closest cases (with lower distances indicating closer relationships) were labeled as related.
- **Contradiction:** For each case, the 7 most distant cases were labeled as unrelated.
- **Neutral:** Each case was labeled as neutral with respect to the legal articles it contains.

## Contributors

- Mesut Demirel
- Recep Karabulut
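
The tree-distance labeling described under Labeling Methodology can be illustrated with a minimal sketch. This is not the authors' code. The card does not state the exact distance metric or how distances are combined when a case cites several articles, so the sketch assumes (1) distance is the number of edges between two articles in the heading tree and (2) the distance between two cases is the smallest distance over all pairs of their articles. The names `TOY_PATHS`, `TOY_CASES`, `tree_distance`, `case_distance`, and `label_pairs` are hypothetical, and the toy hierarchy does not reflect the real TTK structure. The Neutral label is left out because the card does not specify how those pairs are built.

```python
from itertools import product

# Hypothetical toy hierarchy: each article maps to its root-to-article path.
# These identifiers are illustrative and are not real TTK headings.
TOY_PATHS = {
    "art_a": ("book_1", "part_1", "art_a"),
    "art_b": ("book_1", "part_1", "art_b"),
    "art_c": ("book_1", "part_2", "art_c"),
    "art_d": ("book_2", "part_1", "art_d"),
}

# Hypothetical cases, each listing the articles it cites.
TOY_CASES = {
    "case_1": ["art_a"],
    "case_2": ["art_b"],
    "case_3": ["art_c"],
    "case_4": ["art_d"],
}


def tree_distance(path_a, path_b):
    """Count the edges between two nodes, given their root-to-node paths."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def case_distance(articles_a, articles_b, paths):
    """Assumed aggregation: smallest distance over all article pairs."""
    return min(
        tree_distance(paths[a], paths[b])
        for a, b in product(articles_a, articles_b)
    )


def label_pairs(cases, paths, k=7):
    """Label the k nearest cases as entailment and the k farthest as contradiction."""
    pairs = []
    for case_id, articles in cases.items():
        # Sort the other cases by distance; ties are broken by case id.
        others = sorted(
            (other for other in cases if other != case_id),
            key=lambda other: (case_distance(articles, cases[other], paths), other),
        )
        # Shrink k on small inputs so the two groups never overlap.
        k_eff = min(k, len(others) // 2)
        pairs += [(case_id, other, "entailment") for other in others[:k_eff]]
        pairs += [
            (case_id, other, "contradiction")
            for other in others[len(others) - k_eff:]
        ]
    return pairs


# k=1 keeps the toy example small; the card uses 7 neighbours per label.
pairs = label_pairs(TOY_CASES, TOY_PATHS, k=1)
for premise_case, hypothesis_case, label in pairs:
    print(premise_case, hypothesis_case, label)
```

In the published dataset the `premise` and `hypothesis` columns hold case summary text rather than case identifiers, so a real pipeline would map each labeled pair back to the corresponding summaries.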
Provider:
Turkish-NLI