
jonsun-zhao/canadian-tax-law-qa

Hugging Face · Updated 2026-04-05 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/jonsun-zhao/canadian-tax-law-qa
Resource description:
---
language:
- en
license: cc-by-4.0
task_categories:
- question-answering
- text-generation
tags:
- legal
- canadian-tax-law
- income-tax-act
- excise-tax-act
- law
- finance
pretty_name: Canadian Tax Law Q&A Dataset
size_categories:
- 10K<n<100K
---

# Canadian Tax Law Q&A Dataset

## Dataset Description

This dataset contains 11,311 question-answer pairs based on Canadian tax law, specifically the Income Tax Act (ITA) and Excise Tax Act (ETA). The dataset is designed for fine-tuning legal language models, particularly SaulLM-7B.

### Dataset Summary

- **Language:** English
- **License:** CC-BY-4.0
- **Task:** Question Answering, Text Generation
- **Domain:** Canadian Tax Law
- **Format:** Conversational format with `<s>[INST]...[/INST]...</s>` tags

### Sources

The Q&A pairs were generated from:

- Income Tax Act (ITA) - Official consolidated text from laws-lois.justice.gc.ca
- Excise Tax Act (ETA) - Official consolidated text from laws-lois.justice.gc.ca
- CRA Income Tax Folios and Guides

### Data Format

Each example follows the instruction-response format:

```json
{
  "text": "<s>[INST] What is Section 118(1) of the Income Tax Act? [/INST] Section 118(1) of the Income Tax Act provides for the basic personal amount tax credit... </s>"
}
```

### Usage

This dataset is intended for:

- Fine-tuning legal language models on Canadian tax law
- Training question-answering systems for tax-related queries
- Research on domain-specific language model adaptation

### Example Usage

```python
from datasets import load_dataset

dataset = load_dataset("Christine-HiAiPerf/canadian-tax-law-qa")

# Access splits
train_data = dataset["train"]
val_data = dataset["validation"]
test_data = dataset["test"]
```

### Data Collection

The data was collected and processed through the following pipeline:

1. Web scraping of official Canadian government legal texts
2. Text chunking and preprocessing
3. Q&A pair generation using Llama-3.1-8B-Instruct
4. Quality filtering (citation checks, length validation, deduplication)
5. Formatting for SaulLM-7B instruction-following format

### Citation

If you use this dataset, please cite:

```bibtex
@dataset{canadian_tax_law_qa,
  title={Canadian Tax Law Q&A Dataset},
  year={2026},
  url={https://huggingface.co/datasets/Christine-HiAiPerf/canadian-tax-law-qa}
}
```

### Limitations

- The dataset focuses specifically on Canadian tax law
- Generated Q&A pairs may require human review for legal accuracy
- The dataset is based on the legislation as of February 2026

### Disclaimer

This dataset is for research and educational purposes only. It should not be used as a substitute for professional legal or tax advice.
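
For evaluation or human review, it can help to split each `text` field back into its question and answer. The following is a minimal sketch that assumes every record follows the single-turn `<s>[INST]...[/INST]...</s>` layout shown under Data Format. The helper name `split_example` and the regular expression are illustrative assumptions, not part of the dataset or its tooling.

```python
import re

# Assumed single-turn layout, taken from the Data Format example:
#   "<s>[INST] question [/INST] answer </s>"
# The closing </s> is treated as optional in case a record omits it.
_PATTERN = re.compile(
    r"^\s*<s>\s*\[INST\](.*?)\[/INST\](.*?)(?:</s>)?\s*$",
    re.DOTALL,
)


def split_example(text: str) -> dict:
    """Split one `text` field into its question and answer (hypothetical helper)."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("text does not follow the <s>[INST]...[/INST]...</s> layout")
    question, answer = (part.strip() for part in match.groups())
    return {"question": question, "answer": answer}


# Sample record copied from the Data Format section.
sample = (
    "<s>[INST] What is Section 118(1) of the Income Tax Act? [/INST] "
    "Section 118(1) of the Income Tax Act provides for the basic personal "
    "amount tax credit... </s>"
)
parsed = split_example(sample)
print(parsed["question"])
print(parsed["answer"])
```

If the records load as shown under Example Usage, the same helper could be applied to a whole split with something like `train_data.map(lambda row: split_example(row["text"]))`. Records that deviate from the assumed layout raise `ValueError`, which makes malformed rows easy to spot.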
Provided by:
jonsun-zhao