
Kaidon-Ng/singapore-statutes

Source: Hugging Face. Updated: 2026-03-25. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Kaidon-Ng/singapore-statutes
Description:
---
license: other
license_name: sso-terms-of-use
license_link: https://sso.agc.gov.sg/Terms-of-Use
language:
- en
task_categories:
- text-retrieval
- text-classification
- question-answering
tags:
- legal
- singapore
- statutes
- subsidiary-legislation
- acts
- regulations
- knowledge-graph
- legal-nlp
- asean
pretty_name: Singapore Statutes and Subsidiary Legislation (SSO)
size_categories:
- 10K<n<100K
---

# Singapore Statutes and Subsidiary Legislation

A structured, section-level dataset of all Singapore statutes (Acts) and subsidiary legislation (Regulations, Rules, Orders) from [Singapore Statutes Online (SSO)](https://sso.agc.gov.sg/).

## Disclaimer

**This is an unofficial reproduction of Singapore legislation sourced from Singapore Statutes Online (sso.agc.gov.sg). It is not and should not be relied upon as the authoritative text of Singapore legislation.** The official text is the version published in the Government Gazette.

Source: Attorney-General's Chambers of Singapore via Singapore Statutes Online. Use of this dataset is subject to SSO's [Terms of Use](https://sso.agc.gov.sg/Terms-of-Use).

## Dataset Summary

| Item | Count |
|---|---|
| Acts (primary legislation) | 535 |
| Subsidiary Legislation (regulations, rules, orders) | 1,782 |
| Total documents | 2,317 |
| Total sections (text chunks) | 55,221 |
| Act sections | 28,943 |
| Subsidiary Legislation sections | 26,278 |

This dataset provides **parsed, section-level text** rather than raw PDFs. Each section includes its hierarchical position (Part, Division) and parent document metadata. Subsidiary legislation is linked to its parent Act.

## Data Structure

### `documents.parquet`: Document metadata

| Column | Type | Description |
|--------|------|-------------|
| `name` | string | Act or SL name (e.g. "Employment Act 1968") |
| `doc_type` | string | `"act"` or `"subsidiary_legislation"` |
| `parent_act` | string | Parent Act name (SL only, empty for Acts) |
| `page_count` | int | Number of pages in source PDF |
| `num_sections` | int | Number of parsed sections |

### `sections.parquet`: Section-level text

| Column | Type | Description |
|--------|------|-------------|
| `doc_name` | string | Parent document name |
| `doc_type` | string | `"act"` or `"subsidiary_legislation"` |
| `parent_act` | string | Parent Act name (SL only) |
| `section_title` | string | Section heading (e.g. "14. Termination of contract without notice") |
| `part` | string | Part heading (e.g. "PART 4 REST DAYS, HOURS OF WORK AND OTHER CONDITIONS") |
| `division` | string | Division heading if applicable |
| `pages` | string | JSON array of source PDF page numbers |
| `text` | string | Full section text |

## Use Cases

- **Legal NLP**: Train or evaluate models on Singapore statutory text
- **Legal information retrieval**: Build search systems over structured legislation
- **Knowledge graph construction**: Act-SL hierarchies, cross-references between statutes
- **Statutory definition extraction**: Interpretation sections contain formal term definitions
- **Comparative law**: Compare Singapore legislation structure with other common law jurisdictions
- **RAG (Retrieval-Augmented Generation)**: Section-level chunks ready for embedding and retrieval

## How to Load

```python
from datasets import load_dataset

# Load both Parquet files (each is exposed as a single "train" split)
docs = load_dataset("Kaidon-Ng/singapore-statutes", data_files="data/documents.parquet", split="train")
sections = load_dataset("Kaidon-Ng/singapore-statutes", data_files="data/sections.parquet", split="train")

# Filter by type
acts_only = sections.filter(lambda x: x["doc_type"] == "act")
sl_only = sections.filter(lambda x: x["doc_type"] == "subsidiary_legislation")

# Get all sections from Employment Act
ea = sections.filter(lambda x: x["doc_name"] == "Employment Act 1968")
```

## Source and License

- **Source**: [Singapore Statutes Online (SSO)](https://sso.agc.gov.sg/), Attorney-General's Chambers of Singapore
- **License**: Subject to [SSO Terms of Use](https://sso.agc.gov.sg/Terms-of-Use)
- **Official text**: The Government Gazette version is the authoritative text of Singapore legislation
- **Processing**: PDFs parsed into structured JSON, then flattened to section-level Parquet

## Limitations

- This is an **unofficial** reproduction and may contain parsing artifacts from PDF extraction
- Legislation text may not reflect the most recent amendments
- Some sections may have formatting issues from PDF-to-text conversion
- Does not include repealed legislation or historical versions
- Does not include case law or judicial interpretations

## Citation

If you use this dataset in your research, please cite:

```bibtex
@misc{singapore_statutes_sso,
  title={Singapore Statutes and Subsidiary Legislation (SSO)},
  author={LawBot Project},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/Kaidon-Ng/singapore-statutes}
}
```
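
## Example: Decoding `pages` and Linking Acts to Subsidiary Legislation

The `sections.parquet` schema stores `pages` as a JSON string and links subsidiary legislation to its parent through `parent_act`. The sketch below shows one way to work with those two columns. It runs on two hand-written sample rows that only mimic the documented schema: the document name "Example Regulations", the page numbers, and the helper names `parse_pages` and `group_sl_by_parent` are assumptions made for illustration and are not taken from the dataset.

```python
import json
from collections import defaultdict

# Hypothetical sample rows that follow the documented sections.parquet columns.
# The values are illustrative only; they are not real records from the dataset.
sample_sections = [
    {"doc_name": "Employment Act 1968", "doc_type": "act", "parent_act": "",
     "section_title": "14. Termination of contract without notice",
     "part": "", "division": "", "pages": "[20, 21]", "text": "..."},
    {"doc_name": "Example Regulations", "doc_type": "subsidiary_legislation",
     "parent_act": "Employment Act 1968", "section_title": "1. Citation",
     "part": "", "division": "", "pages": "[1]", "text": "..."},
]


def parse_pages(row):
    """Decode the JSON-encoded `pages` string into a list of page numbers."""
    return json.loads(row["pages"]) if row["pages"] else []


def group_sl_by_parent(rows):
    """Map each parent Act name to the sorted names of its subsidiary legislation."""
    grouped = defaultdict(set)
    for row in rows:
        if row["doc_type"] == "subsidiary_legislation" and row["parent_act"]:
            grouped[row["parent_act"]].add(row["doc_name"])
    return {act: sorted(names) for act, names in grouped.items()}


pages = [parse_pages(row) for row in sample_sections]
hierarchy = group_sl_by_parent(sample_sections)
```

Because iterating over a `datasets.Dataset` yields plain dictionaries, the same helpers should also accept the `sections` object produced in "How to Load", assuming the published columns match the tables above.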
Provider:
Kaidon-Ng