jonsun-zhao/canadian-tax-law-qa
Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/jonsun-zhao/canadian-tax-law-qa
Resource description:
---
language:
- en
license: cc-by-4.0
task_categories:
- question-answering
- text-generation
tags:
- legal
- canadian-tax-law
- income-tax-act
- excise-tax-act
- law
- finance
pretty_name: Canadian Tax Law Q&A Dataset
size_categories:
- 10K<n<100K
---
# Canadian Tax Law Q&A Dataset
## Dataset Description
This dataset contains 11,311 question-answer pairs on Canadian tax law, specifically the Income Tax Act (ITA) and the Excise Tax Act (ETA). It is designed for fine-tuning legal language models, particularly SaulLM-7B.
### Dataset Summary
- **Language:** English
- **License:** CC-BY-4.0
- **Task:** Question Answering, Text Generation
- **Domain:** Canadian Tax Law
- **Format:** Conversational; each record is a single `text` string wrapped in `<s>[INST]...[/INST]...</s>` tags
### Sources
The Q&A pairs were generated from:
- Income Tax Act (ITA) - Official consolidated text from laws-lois.justice.gc.ca
- Excise Tax Act (ETA) - Official consolidated text from laws-lois.justice.gc.ca
- CRA Income Tax Folios and Guides
### Data Format
Each example follows the instruction-response format:
```json
{
  "text": "<s>[INST] What is Section 118(1) of the Income Tax Act? [/INST] Section 118(1) of the Income Tax Act provides for the basic personal amount tax credit... </s>"
}
```
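Since the question and answer are packed into one `text` string, it can be convenient to split them apart, or to rebuild the string from a pair. The following is a minimal sketch based only on the example record above; the helper names `split_example` and `format_example` are hypothetical, and real records may differ in whitespace or contain multiple turns that this pattern does not handle.

```python
import re

# Pattern inferred from the single example record shown above (an assumption).
_PATTERN = re.compile(
    r"^\s*<s>\s*\[INST\](.*?)\[/INST\](.*?)(?:</s>)?\s*$", re.DOTALL
)


def split_example(text: str) -> tuple[str, str]:
    """Return (question, answer) from one `text` field."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("text does not follow the <s>[INST]...[/INST]...</s> layout")
    question, answer = match.groups()
    return question.strip(), answer.strip()


def format_example(question: str, answer: str) -> str:
    """Build a `text` field in the same layout as the example record."""
    return f"<s>[INST] {question} [/INST] {answer} </s>"


record = {
    "text": "<s>[INST] What is Section 118(1) of the Income Tax Act? [/INST] "
    "Section 118(1) of the Income Tax Act provides for the basic personal "
    "amount tax credit... </s>"
}
question, answer = split_example(record["text"])
```

Splitting first and formatting afterwards keeps the tags out of any downstream evaluation code, which only needs the plain question and answer strings.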
### Usage
This dataset is intended for:
- Fine-tuning legal language models on Canadian tax law
- Training question-answering systems for tax-related queries
- Research on domain-specific language model adaptation
### Example Usage
```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (requires the `datasets` package)
dataset = load_dataset("Christine-HiAiPerf/canadian-tax-law-qa")

# Access the individual splits
train_data = dataset["train"]
val_data = dataset["validation"]
test_data = dataset["test"]
```
### Data Collection
The data was collected and processed through the following pipeline:
1. Web scraping of official Canadian government legal texts
2. Text chunking and preprocessing
3. Q&A pair generation using Llama-3.1-8B-Instruct
4. Quality filtering (citation checks, length validation, deduplication)
5. Formatting for SaulLM-7B instruction-following format
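The quality-filtering step (step 4) could look roughly like the sketch below. It is an illustration only, not the authors' implementation: the thresholds, the citation pattern, and the function name `filter_pairs` are all hypothetical, since the source does not describe how the checks were actually carried out.

```python
import re

# Hypothetical thresholds and pattern; the actual values are not documented.
MIN_ANSWER_CHARS = 20
MAX_ANSWER_CHARS = 2000
CITATION_RE = re.compile(r"\b(?:section|subsection|paragraph)\s+\d+", re.IGNORECASE)


def filter_pairs(pairs):
    """Keep (question, answer) pairs that pass length, citation, and duplicate checks."""
    seen = set()
    kept = []
    for question, answer in pairs:
        # Length validation: drop answers that are too short or too long.
        if not (MIN_ANSWER_CHARS <= len(answer) <= MAX_ANSWER_CHARS):
            continue
        # Citation check: require at least one provision reference in the pair.
        if not CITATION_RE.search(question + " " + answer):
            continue
        # Deduplication: compare questions case- and whitespace-insensitively.
        key = " ".join(question.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append((question, answer))
    return kept


sample = [
    ("What is Section 118(1) of the Income Tax Act?",
     "Section 118(1) provides for the basic personal amount tax credit."),
    ("what is  section 118(1) of the Income Tax Act?",
     "Section 118(1) provides for the basic personal amount tax credit."),
    ("What is Section 5?", "Too short."),
    ("How are taxes computed?",
     "They are computed according to the rules set out in the legislation."),
]
kept = filter_pairs(sample)
```

In this sample only the first pair survives: the second is a near-duplicate question, the third fails the length check, and the fourth cites no provision.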
### Citation
If you use this dataset, please cite:
```bibtex
@dataset{canadian_tax_law_qa,
  title={Canadian Tax Law Q&A Dataset},
  year={2026},
  url={https://huggingface.co/datasets/Christine-HiAiPerf/canadian-tax-law-qa}
}
```
### Limitations
- The dataset focuses specifically on Canadian tax law
- Generated Q&A pairs may require human review for legal accuracy
- The dataset reflects the legislation as of February 2026; later amendments are not covered
### Disclaimer
This dataset is for research and educational purposes only. It should not be used as a substitute for professional legal or tax advice.
Provider:
jonsun-zhao