nlp-waseda/e_gov_chunked

Name: nlp-waseda/e_gov_chunked
Creator: nlp-waseda
Published: 2025-01-14 05:27:34
License: No description available

Source: Hugging Face. Updated: 2025-01-14. Indexed: 2025-02-15.

Download link:

https://hf-mirror.com/datasets/nlp-waseda/e_gov_chunked

Description:

This is a dataset of Japanese law texts obtained from e-Gov, with each text split into chunks of at most 4,096 tokens. The dataset has two fields: the text field contains the legal text, and the metadata field contains additional information in 10 subfields, including era, language, law type, year, promulgation month, promulgation day, law number, category ID, and chunk ID. The dataset is randomly split into training, validation, and test sets at a ratio of 8:1:1 while preserving the original category distribution.
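
The category-preserving 8:1:1 split described above can be illustrated with a minimal sketch. This is not the dataset authors' actual procedure, which the description does not specify; the function name `stratified_split`, the `category_id` key, and the toy records are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, key, ratios=(8, 1, 1), seed=0):
    """Split records into train/validation/test, shuffling within each category.

    `ratios` are integer weights (8:1:1 by default). Hypothetical helper,
    not part of the dataset or of any library.
    """
    rng = random.Random(seed)
    total = sum(ratios)

    # Group records by category so each category is split independently.
    by_category = defaultdict(list)
    for rec in records:
        by_category[key(rec)].append(rec)

    splits = {"train": [], "validation": [], "test": []}
    for category in sorted(by_category):
        group = by_category[category]
        rng.shuffle(group)
        n_train = len(group) * ratios[0] // total
        n_valid = len(group) * ratios[1] // total
        splits["train"].extend(group[:n_train])
        splits["validation"].extend(group[n_train:n_train + n_valid])
        # Any remainder from integer division goes to the test split.
        splits["test"].extend(group[n_train + n_valid:])
    return splits


# Toy records mimicking the two-field layout; "category_id" is an assumed key name.
records = [
    {"text": f"chunk {i}", "metadata": {"category_id": i % 2}}
    for i in range(40)
]
splits = stratified_split(records, key=lambda r: r["metadata"]["category_id"])
print({name: len(rows) for name, rows in splits.items()})
```

Because each category is shuffled and cut separately, every split keeps roughly the same category proportions as the full set, which is the property the description attributes to the published splits.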

Provider:

nlp-waseda
