th1nhng0/vietnamese-legal-documents

Source: Hugging Face. Updated: 2026-04-27. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/th1nhng0/vietnamese-legal-documents
Description:
Vietnamese Legal Documents is a comprehensive collection of Vietnamese legal documents (laws, decrees, circulars, decisions, and other normative acts) sourced from vbpl.vn, the official Government Legal Document Portal operated by the Ministry of Justice. The dataset includes structured metadata for every document, raw HTML full-text content, and a rich graph of cross-document legal relationships (amendments, citations, repeals, etc.). It has four configs: metadata (153,420 documents with 16 fields, such as title, issuance date, legal type, and effect status), content (raw HTML full text for 178,665 documents), relationships (897,890 directed edges between documents), and legacy (an older crawl of about 518k documents, with English field names). The dataset is in Vietnamese, is released under the CC BY 4.0 license, and is suited to NLP tasks such as text classification, text generation, question answering, and summarization.
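As a rough illustration of how the four configs and the relationship graph described above might be used, the sketch below loads one config with the Hugging Face `datasets` library and groups directed edges by source document. Several details are assumptions that the description does not confirm: the split name `"train"`, and the column names `source_id`, `target_id`, and `relation`, which are hypothetical placeholders. The sample rows are made up for the demo. Check the dataset card for the real schema before relying on any of this.

```python
from collections import defaultdict

REPO_ID = "th1nhng0/vietnamese-legal-documents"
CONFIGS = ("metadata", "content", "relationships", "legacy")


def load_config(name, split="train"):
    """Load one config from the Hub (needs the `datasets` package and network access).

    The split name "train" is an assumption; check the dataset card.
    """
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    # Imported lazily so the offline helper below works without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name, split=split)


def build_adjacency(edges, src_key="source_id", dst_key="target_id", rel_key="relation"):
    """Group directed edges by their source document.

    The three default key names are hypothetical placeholders,
    not the dataset's confirmed column names.
    """
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge[src_key]].append((edge[rel_key], edge[dst_key]))
    return dict(adjacency)


# Offline demo with made-up rows (not real dataset records).
sample_edges = [
    {"source_id": "A", "target_id": "B", "relation": "amends"},
    {"source_id": "A", "target_id": "C", "relation": "cites"},
    {"source_id": "D", "target_id": "B", "relation": "repeals"},
]
graph = build_adjacency(sample_edges)
print(graph["A"])  # [('amends', 'B'), ('cites', 'C')]
```

With network access, `load_config("relationships")` would return rows that could be passed to `build_adjacency` once the key arguments are set to the actual column names.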
Provider:
th1nhng0