Dataset and additional files/softwares required for the paper "LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents"

Name: Dataset and additional files/softwares required for the paper "LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents"
Creator: Goyal, Pawan; Ghosh, Saptarshi; Paul, Shounak
Published: 2021-12-28 00:00:00
License: No description available

Zenodo. Updated 2021-12-28; indexed 2026-04-07.

Download link:

https://zenodo.org/record/5806910

Resource description:

This dump contains all files and software required to run the code for the paper "LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents". The code is available at https://github.com/Law-AI/LeSICiN. LeSICiN is a deep neural network for the task of Legal Statute Identification that also uses graphical properties of the document-statute citation network for training and prediction.

There are three datasets: train, dev and test. Each is a .jsonl file with one instance dict per line; each instance dict contains the unique id, the list of sentences and the cited labels of that instance. A fourth file, secs.jsonl, stores the text of all the statutes in a similar format.

schemas.json lists the metapath schemas for fact and section type nodes, while type_map.json maps the id of each node to its type (Act/Chapter/Topic/Section/Fact). label_tree.json and citation_network.json list the edges for the two parts of the network as 3-tuples ('source id', 'relationship type', 'target id'). ils2v.bin is the pretrained sent2vec vectorizer that can generate a 200-dim vector for each sentence.
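The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loading code: the key names "id", "text" and "labels", the relationship name "cites", all sample ids, and the assumption that each edge is stored as a three-element JSON array are hypothetical and should be checked against the actual files.

```python
import io
import json
from collections import defaultdict


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def group_edges(edges):
    """Map each source id to a list of (relationship type, target id) pairs."""
    adjacency = defaultdict(list)
    for source, relation, target in edges:
        adjacency[source].append((relation, target))
    return dict(adjacency)


# Hypothetical stand-in for train.jsonl; key names and ids are assumptions.
sample_jsonl = io.StringIO(
    '{"id": "fact_1", "text": ["First sentence.", "Second sentence."], "labels": ["sec_1"]}\n'
    '{"id": "fact_2", "text": ["Only sentence."], "labels": ["sec_1", "sec_2"]}\n'
)
instances = list(read_jsonl(sample_jsonl))

# Hypothetical stand-in for citation_network.json, following the
# ('source id', 'relationship type', 'target id') layout.
sample_edges = [
    ["fact_1", "cites", "sec_1"],
    ["fact_2", "cites", "sec_1"],
    ["fact_2", "cites", "sec_2"],
]
adjacency = group_edges(sample_edges)
```

For the real files, the in-memory stream would be replaced by open("train.jsonl", encoding="utf-8"), and the edge list by the result of json.load on citation_network.json or label_tree.json, assuming those files hold a flat list of edges.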

Provided by:

Goyal, Pawan; Ghosh, Saptarshi; Paul, Shounak

Created:

2021-12-28