Kaidon-Ng/singapore-statutes
Hugging Face · Updated 2026-03-25 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Kaidon-Ng/singapore-statutes
Description:
---
license: other
license_name: sso-terms-of-use
license_link: https://sso.agc.gov.sg/Terms-of-Use
language:
- en
task_categories:
- text-retrieval
- text-classification
- question-answering
tags:
- legal
- singapore
- statutes
- subsidiary-legislation
- acts
- regulations
- knowledge-graph
- legal-nlp
- asean
pretty_name: Singapore Statutes and Subsidiary Legislation (SSO)
size_categories:
- 10K<n<100K
---
# Singapore Statutes and Subsidiary Legislation
A structured, section-level dataset of Singapore statutes (Acts) and subsidiary legislation (Regulations, Rules, Orders) published on [Singapore Statutes Online (SSO)](https://sso.agc.gov.sg/), excluding repealed legislation and historical versions.
## Disclaimer
**This is an unofficial reproduction of Singapore legislation sourced from Singapore Statutes Online (sso.agc.gov.sg). It is not and should not be relied upon as the authoritative text of Singapore legislation.** The official text is the version published in the Government Gazette.
Source: Attorney-General's Chambers of Singapore via Singapore Statutes Online.
Use of this dataset is subject to SSO's [Terms of Use](https://sso.agc.gov.sg/Terms-of-Use).
## Dataset Summary
| | Count |
|---|---|
| Acts (primary legislation) | 535 |
| Subsidiary Legislation (regulations, rules, orders) | 1,782 |
| Total documents | 2,317 |
| Total sections (text chunks) | 55,221 |
| Act sections | 28,943 |
| Subsidiary Legislation sections | 26,278 |
This dataset provides **parsed, section-level text** rather than raw PDFs. Each section includes its hierarchical position (Part, Division) and parent document metadata. Subsidiary legislation is linked to its parent Act.
## Data Structure
### `documents.parquet` — Document metadata
| Column | Type | Description |
|--------|------|-------------|
| `name` | string | Act or SL name (e.g. "Employment Act 1968") |
| `doc_type` | string | `"act"` or `"subsidiary_legislation"` |
| `parent_act` | string | Parent Act name (SL only, empty for Acts) |
| `page_count` | int | Number of pages in source PDF |
| `num_sections` | int | Number of parsed sections |
### `sections.parquet` — Section-level text
| Column | Type | Description |
|--------|------|-------------|
| `doc_name` | string | Parent document name |
| `doc_type` | string | `"act"` or `"subsidiary_legislation"` |
| `parent_act` | string | Parent Act name (SL only) |
| `section_title` | string | Section heading (e.g. "14. Termination of contract without notice") |
| `part` | string | Part heading (e.g. "PART 4 REST DAYS, HOURS OF WORK AND OTHER CONDITIONS") |
| `division` | string | Division heading if applicable |
| `pages` | string | JSON array of source PDF page numbers |
| `text` | string | Full section text |
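Because `pages` is stored as a JSON array inside a string, it has to be decoded before the page numbers can be used. The sketch below shows one way to do that. The helper name `parse_pages` and the page numbers in the example row are illustrative assumptions, not values taken from the dataset.

```python
import json


def parse_pages(pages_field):
    """Decode the `pages` column (a JSON array stored as a string) into a list of ints."""
    if not pages_field:
        # Treat a missing or empty value as "no pages recorded".
        return []
    return [int(page) for page in json.loads(pages_field)]


# Hypothetical row shaped like a sections.parquet record; the page numbers are made up.
example_row = {
    "doc_name": "Employment Act 1968",
    "section_title": "14. Termination of contract without notice",
    "pages": "[31, 32]",
}

example_pages = parse_pages(example_row["pages"])
print(example_pages)
```

Decoding at read time keeps the Parquet schema flat, at the cost of one `json.loads` call per row.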
## Use Cases
- **Legal NLP**: Train or evaluate models on Singapore statutory text
- **Legal information retrieval**: Build search systems over structured legislation
- **Knowledge graph construction**: Act-SL hierarchies, cross-references between statutes
- **Statutory definition extraction**: Interpretation sections contain formal term definitions
- **Comparative law**: Compare Singapore legislation structure with other common law jurisdictions
- **RAG (Retrieval-Augmented Generation)**: Section-level chunks ready for embedding and retrieval
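As a rough illustration of the Act-SL hierarchy use case, the `parent_act` column can be used to group subsidiary legislation under its parent Act. This is a minimal sketch that works on plain dictionaries shaped like `documents.parquet` rows. The function name `build_act_hierarchy` and the sample records are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def build_act_hierarchy(documents):
    """Map each Act name to a sorted list of the subsidiary legislation that names it as parent."""
    hierarchy = defaultdict(list)
    for doc in documents:
        if doc["doc_type"] == "act":
            # Register the Act even if no subsidiary legislation points to it.
            hierarchy.setdefault(doc["name"], [])
        elif doc["doc_type"] == "subsidiary_legislation" and doc["parent_act"]:
            hierarchy[doc["parent_act"]].append(doc["name"])
    return {act: sorted(children) for act, children in hierarchy.items()}


# Hypothetical records with placeholder names, for illustration only.
sample_documents = [
    {"name": "Example Act A", "doc_type": "act", "parent_act": ""},
    {"name": "Example Act B", "doc_type": "act", "parent_act": ""},
    {"name": "Example Rules 2", "doc_type": "subsidiary_legislation", "parent_act": "Example Act A"},
    {"name": "Example Rules 1", "doc_type": "subsidiary_legislation", "parent_act": "Example Act A"},
]

sample_hierarchy = build_act_hierarchy(sample_documents)
print(sample_hierarchy)
```

The sketch assumes that `parent_act` matches the parent's `name` exactly. If the two columns differ in spelling or year suffix, a normalization step would be needed before grouping.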
## How to Load
```python
from datasets import load_dataset
# Load each Parquet file as its own dataset (a single data file is exposed as the "train" split)
docs = load_dataset("Kaidon-Ng/singapore-statutes", data_files="data/documents.parquet", split="train")
sections = load_dataset("Kaidon-Ng/singapore-statutes", data_files="data/sections.parquet", split="train")
# Filter by type
acts_only = sections.filter(lambda x: x["doc_type"] == "act")
sl_only = sections.filter(lambda x: x["doc_type"] == "subsidiary_legislation")
# Get all sections of the Employment Act 1968
ea = sections.filter(lambda x: x["doc_name"] == "Employment Act 1968")
```
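After loading, it may be worth checking that the two files agree with each other. The sketch below compares each document's `num_sections` with the number of section rows that carry its name. It assumes that `num_sections` counts rows in `sections.parquet` and that document names are unique, neither of which the card states explicitly. The helper name `find_count_mismatches` and the sample data are hypothetical.

```python
from collections import Counter


def find_count_mismatches(documents, section_doc_names):
    """Return {doc name: (declared num_sections, observed row count)} for documents that disagree."""
    observed = Counter(section_doc_names)
    mismatches = {}
    for doc in documents:
        actual = observed.get(doc["name"], 0)
        if doc["num_sections"] != actual:
            mismatches[doc["name"]] = (doc["num_sections"], actual)
    return mismatches


# Hypothetical metadata rows and section names, for illustration only.
sample_docs = [
    {"name": "Example Act", "num_sections": 2},
    {"name": "Example Regulations", "num_sections": 3},
]
sample_section_names = ["Example Act", "Example Act", "Example Regulations"]

sample_mismatches = find_count_mismatches(sample_docs, sample_section_names)
print(sample_mismatches)
```

With the datasets loaded as above, a call along the lines of `find_count_mismatches(docs, sections["doc_name"])` should work, since iterating a `Dataset` yields one dictionary per row.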
## Source and License
- **Source**: [Singapore Statutes Online (SSO)](https://sso.agc.gov.sg/), Attorney-General's Chambers of Singapore
- **License**: Subject to [SSO Terms of Use](https://sso.agc.gov.sg/Terms-of-Use)
- **Official text**: The Government Gazette version is the authoritative text of Singapore legislation
- **Processing**: PDFs parsed into structured JSON, then flattened to section-level Parquet
## Limitations
- This is an **unofficial** reproduction and may contain parsing artifacts from PDF extraction
- Legislation text may not reflect the most recent amendments
- Some sections may have formatting issues from PDF-to-text conversion
- Does not include repealed legislation or historical versions
- Does not include case law or judicial interpretations
## Citation
If you use this dataset in your research, please cite:
```bibtex
@misc{singapore_statutes_sso,
title={Singapore Statutes and Subsidiary Legislation (SSO)},
author={LawBot Project},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/datasets/Kaidon-Ng/singapore-statutes}
}
```
Provider:
Kaidon-Ng



