ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text
Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text
Resource description:
---
license: mit
---
# Income Tax Act 2025 (India) - Machine-Readable Legal Text Dataset
## Overview
This dataset contains the complete text and structured information from India's **Income-Tax Act, 2025** (effective from April 1, 2026). It provides comprehensive coverage of all chapters and sections of the act, formatted as JSON for easy parsing and analysis by AI/ML systems.
## Dataset Description
The Income Tax Act, 2025 is the primary legislation governing income taxation in India. This dataset extracts and structures all sections, chapters, and provisions into a machine-readable format for research, legal analysis, tax compliance, and AI/ML applications including semantic search, knowledge extraction, and domain-specific model training.
## Dataset Size & Format
- **Format**: JSON Lines (one JSON object per line)
- **Content**: Complete Act with all chapters and sections
- **Language**: English
- **Encoding**: UTF-8
- **Total Entries**: Comprehensive coverage of all Income Tax Act, 2025 sections
## Data Structure
Each entry in the dataset contains the following fields:
| Field | Type | Description |
|-------|------|-------------|
| `act_name` | string | Name of the act (e.g., "Income-Tax Act, 2025") |
| `act_code` | string | Code identifier for the act (e.g., "ITA2025") |
| `effective_from` | string | Effective date in YYYY-MM-DD format |
| `chapter` | string | Chapter number of the act |
| `section` | string | Section number within the chapter |
| `title` | string | Title/heading of the section |
| `content` | string | Full text content of the section with legal language |
| `search_text` | string | Concatenated searchable text including all metadata |
| `chapter_name` | string | Descriptive name of the chapter |
| `chapter_subtype` | string | Subcategory or classification of the chapter |
| `doc_id` | string | Unique document identifier (ITA2025_[SECTION]_[CHUNK]) |
| `chunk_index` | integer | Index of chunk (0-based) for multi-chunk documents |
| `total_chunks` | integer | Total number of chunks for the document |
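
As a rough illustration of the schema above, the sketch below parses one JSON Lines row and checks that every documented field is present with the documented type. The helper name `parse_record` and all sample values (title, content, chapter name, subtype) are placeholders invented for this example; they are not text from the Act or part of the dataset's tooling.

```python
import json

# Field names and types as listed in the table above.
EXPECTED_FIELDS = {
    "act_name": str, "act_code": str, "effective_from": str,
    "chapter": str, "section": str, "title": str,
    "content": str, "search_text": str,
    "chapter_name": str, "chapter_subtype": str,
    "doc_id": str, "chunk_index": int, "total_chunks": int,
}

def parse_record(line: str) -> dict:
    """Parse one JSON Lines row and verify it matches the documented schema."""
    record = json.loads(line)
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
    return record

# Placeholder row: the values are illustrative only, not real text from the Act.
sample_line = json.dumps({
    "act_name": "Income-Tax Act, 2025", "act_code": "ITA2025",
    "effective_from": "2026-04-01", "chapter": "6", "section": "100",
    "title": "Placeholder title", "content": "Placeholder content.",
    "search_text": "Placeholder title Placeholder content.",
    "chapter_name": "AGGREGATION OF INCOME", "chapter_subtype": "placeholder",
    "doc_id": "ITA2025_100_0", "chunk_index": 0, "total_chunks": 1,
})

record = parse_record(sample_line)
print(record["doc_id"], record["chunk_index"], record["total_chunks"])
```

Note that `chapter` and `section` are strings, so comparisons such as `record["section"] == "100"` should use string literals rather than integers.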
## Content Coverage
The dataset includes all major chapters of the Income Tax Act, 2025:
- **Chapters 1-4**: Income Classification and Heads of Income
- **Chapter 5**: Income of Other Persons Included
- **Chapter 6**: Aggregation of Income
- **Chapter 7**: Set Off and Carry Forward of Losses
- **Chapters 8 onward**: Deductions, Allowances, and Advanced Provisions
## Use Cases
- **Legal Research**: Search and analyze specific tax provisions (case law and precedents are not part of this dataset)
- **Tax Compliance**: Understand applicability of sections to different taxpayers
- **AI/ML Training**: Build domain-specific NLP models for tax law interpretation
- **Educational Tools**: Develop interactive tax education and training platforms
- **Document Retrieval**: Create semantic search systems for tax provisions
- **Knowledge Graphs**: Extract relationships and dependencies between sections
- **Tax Chatbots**: Power conversational AI for tax inquiries
- **Compliance Automation**: Automate tax provision applicability checking
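
For the knowledge-graph use case, one possible starting point is to scan each entry's `content` for mentions of other sections. The sketch below assumes cross-references appear in the form "section N" with a purely numeric section number; that phrasing is an assumption about the text, not something this card documents, and the sample records are placeholders rather than real provisions.

```python
import re

# Assumption: references look like "section 102". Plural forms such as
# "sections 102 and 104" and lettered sections are not handled here.
SECTION_REF = re.compile(r"\bsection\s+(\d+)", re.IGNORECASE)

def extract_references(records):
    """Map each section number to the sorted list of sections it mentions."""
    edges = {}
    for rec in records:
        source = rec["section"]
        targets = set(SECTION_REF.findall(rec["content"]))
        targets.discard(source)  # ignore self-references
        edges.setdefault(source, set()).update(targets)
    return {src: sorted(tgts) for src, tgts in edges.items()}

# Placeholder records: the content strings are invented for illustration.
sample_records = [
    {"section": "100", "content": "Placeholder text referring to section 102 and section 104."},
    {"section": "102", "content": "Placeholder text subject to Section 100."},
    {"section": "104", "content": "Placeholder text with no cross-references."},
]

edges = extract_references(sample_records)
print(edges)
```

Because sections can be split into several chunks, edges from all chunks of the same section are merged under one key.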
## How to Use
### Load with Hugging Face Datasets
```python
from datasets import load_dataset

# Load the dataset (the only split is "train")
dataset = load_dataset("ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text")

# Print the content of every chunk that belongs to section 100
for example in dataset['train']:
    if example['section'] == '100':
        print(example['content'])
```
### Filter by Chapter
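```python
# Filter entries from a specific chapter (reuses `dataset` loaded above)
filtered = [ex for ex in dataset['train'] if ex['chapter'] == '6']

# Guard against an empty result before indexing into it
if filtered:
    print(f"Found {len(filtered)} entries in Chapter 6: {filtered[0]['chapter_name']}")
else:
    print("No entries found for Chapter 6")
```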
### Search by Keyword
```python
# Search for specific provisions (case-insensitive substring match)
results = [ex for ex in dataset['train'] if 'unexplained' in ex['search_text'].lower()]
for result in results:
    print(f"Section {result['section']}: {result['title']}")
```
### Convert to Pandas DataFrame
```python
import pandas as pd
from datasets import load_dataset

dataset = load_dataset("ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text")

# Convert the train split to a DataFrame
df = dataset['train'].to_pandas()

# Count entries (chunks) per chapter
chapter_stats = df.groupby('chapter_name').size()
print(chapter_stats)
```
## Dataset Splits
- **Single split**: `train` (entire dataset as one comprehensive split for all analyses)
## Field Descriptions
### Identifiers & Metadata
- **doc_id**: Unique identifier in format `ITA2025_[SECTION_NUMBER]_[CHUNK_INDEX]`
- **act_code**: Standardized code for the act (`ITA2025`)
- **effective_from**: Date when the act came into force (`2026-04-01`)
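
The `doc_id` format described above can be split back into its parts. This is a minimal sketch, assuming the act code contains no underscore and the chunk index is always the final underscore-separated component; `DocId`, `parse_doc_id`, and `build_doc_id` are hypothetical helper names, not part of the dataset.

```python
from typing import NamedTuple

class DocId(NamedTuple):
    act_code: str
    section: str
    chunk_index: int

def parse_doc_id(doc_id: str) -> DocId:
    """Split 'ITA2025_<section>_<chunk>' into its three components."""
    act_code, rest = doc_id.split("_", 1)     # first underscore ends the act code
    section, chunk = rest.rsplit("_", 1)      # last underscore starts the chunk index
    return DocId(act_code, section, int(chunk))

def build_doc_id(section: str, chunk_index: int, act_code: str = "ITA2025") -> str:
    """Build an identifier in the documented format."""
    return f"{act_code}_{section}_{chunk_index}"

parsed = parse_doc_id("ITA2025_100_0")
print(parsed)
print(build_doc_id(parsed.section, parsed.chunk_index))
```

Splitting from both ends keeps the section component intact even if a section label itself happens to contain an underscore.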
### Content Fields
- **content**: Primary text of the section containing full legal language and provisions
- **search_text**: Enhanced text including metadata for semantic searching and NLP
- **title**: Section heading for quick reference and categorization
### Organization Fields
- **chapter**: Chapter number (e.g., "5", "6", "7")
- **section**: Section number within chapter (e.g., "100", "102", "104")
- **chapter_name**: Descriptive name of the chapter (e.g., "AGGREGATION OF INCOME")
- **chapter_subtype**: Subcategory of chapter for hierarchical organization
### Chunking Information
- **chunk_index**: Position of this chunk within a multi-chunk section (0-based)
- **total_chunks**: Total number of chunks the section is divided into
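
Since long sections are split into chunks, reading a full section means grouping entries by `section`, ordering them by `chunk_index`, and joining the `content` fields. The sketch below assumes chunks do not overlap and can be joined with a newline; neither is confirmed by this card. The helper name `reassemble_sections` and the sample records are illustrative placeholders.

```python
from collections import defaultdict

def reassemble_sections(records):
    """Return {section: full_text} by joining each section's chunks in order."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["section"]].append(rec)

    full_text = {}
    for section, chunks in grouped.items():
        chunks.sort(key=lambda r: r["chunk_index"])
        expected = chunks[0]["total_chunks"]
        # Check that indices 0..total_chunks-1 are all present exactly once.
        if [c["chunk_index"] for c in chunks] != list(range(expected)):
            raise ValueError(f"section {section}: missing or duplicate chunks")
        # Assumption: chunks are non-overlapping, so a plain join is enough.
        full_text[section] = "\n".join(c["content"] for c in chunks)
    return full_text

# Placeholder records, deliberately out of order.
sample_chunks = [
    {"section": "100", "chunk_index": 1, "total_chunks": 2, "content": "second part"},
    {"section": "102", "chunk_index": 0, "total_chunks": 1, "content": "only part"},
    {"section": "100", "chunk_index": 0, "total_chunks": 2, "content": "first part"},
]

sections = reassemble_sections(sample_chunks)
print(sections["100"])
```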
## Licensing & Attribution
- **Source**: Income-Tax Act, 2025 (Government of India - Ministry of Finance)
- **Public Domain**: Government legislative documents are in public domain in India
- **Attribution**: Please cite Government of India and this dataset
- **Terms**: Complies with GOI open data and public domain policies
## Data Quality
- ✅ Comprehensive coverage of all sections and chapters
- ✅ Properly formatted JSON with valid UTF-8 encoding
- ✅ Structured metadata for filtering and hierarchical retrieval
- ✅ Searchable text field optimized for semantic analysis
- ✅ Unique identifiers for precise referencing and linking
- ✅ Consistent formatting across all entries
## Known Limitations
- Text is current as of the effective date (April 1, 2026)
- Does not include subsequent amendments after April 2026
- Does not include case law, judicial interpretations, or precedents
- Supplementary rules and notifications not included
- No provisions from older acts (prior Income Tax Acts not included)
## Applications & Examples
### 1. Tax Advisory Systems
Build AI assistants that answer tax-related queries using semantic search and retrieval
### 2. Compliance Checking
Automate verification of tax provision applicability for specific business scenarios
### 3. Legal Document Analysis
Extract and match relevant sections in tax-related contracts and agreements
### 4. Educational Platforms
Power interactive learning tools and tutorials for taxation studies
### 5. Research & Analytics
Analyze tax provisions across sections and identify policy patterns
### 6. Document Retrieval Systems
Implement semantic search over Indian tax law for precise provision discovery
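
As a lightweight stand-in for a retrieval system, the sketch below ranks entries by a simple TF-IDF style score over `search_text`. This is lexical keyword matching, not embedding-based semantic search; a real system would likely swap in a vector model. The function names and the sample records are hypothetical and do not contain real text from the Act.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, records, top_k=3):
    """Return up to top_k records with a non-zero score, best match first."""
    docs = [Counter(tokenize(r["search_text"])) for r in records]
    n = len(docs)
    # Document frequency: number of records that contain each term.
    df = Counter(term for doc in docs for term in doc)
    query_terms = set(tokenize(query))

    scored = []
    for rec, doc in zip(records, docs):
        score = sum(
            doc[t] * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t in query_terms
        )
        if score > 0:
            scored.append((score, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:top_k]]

# Placeholder records for illustration only.
sample_records = [
    {"doc_id": "demo_a", "search_text": "placeholder heading about set off and carry forward of losses"},
    {"doc_id": "demo_b", "search_text": "placeholder heading about aggregation of income"},
    {"doc_id": "demo_c", "search_text": "placeholder heading about unexplained credits"},
]

results = rank("carry forward of losses", sample_records)
print([r["doc_id"] for r in results])
```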
### 7. Tax Chatbots
Train conversational AI to answer taxpayer questions about specific provisions
## Related Resources
- [Income-Tax Department (India)](https://www.incometaxindia.gov.in/)
- [Ministry of Finance - India](https://www.finance.gov.in/)
- [Income Tax Act Official Portal](https://www.incometaxindia.gov.in/indian-income-tax-act/)
- [Finance Acts & Amendments](https://www.finance.gov.in/tamil/en/web/mof)
## Citation
If you use this dataset in your research, projects, publications, or work, please cite:
```bibtex
@dataset{income_tax_act_2025,
  title={Income Tax Act 2025 (India) - Machine-Readable Legal Text Dataset},
  author={Government of India, Ministry of Finance},
  year={2026},
  publisher={Hugging Face Datasets},
  url={https://huggingface.co/datasets/ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text},
  note={Effective from April 1, 2026}
}
```
## Contributing
Found errors or have suggestions?
- Open an issue for corrections or improvements
- Submit pull requests for enhancements
- Suggest additional metadata or features
- Report any formatting or encoding issues
## Disclaimer
**Important Legal Notice**: This dataset is provided for informational and research purposes only.
- **Not Legal Advice**: This should not be relied upon as the sole source for tax compliance or legal advice
- **Professional Consultation**: Always consult qualified tax professionals, chartered accountants, and official government sources
- **Accuracy**: While efforts are made to ensure accuracy, Government of India's official sources remain authoritative
- **Changes**: Tax laws are subject to amendments; always verify current applicability
- **Liability**: Users are responsible for verifying information and consulting appropriate professionals
- **Compliance**: Ensure compliance with applicable tax laws through official channels
---
- **Dataset Version**: 1.0
- **Release Date**: April 2026
- **Last Updated**: April 20, 2026
- **Maintained by**: Data Contributors & Hugging Face Community
- **License**: Public Domain (Government of India)
- **Status**: Active and maintained
Provided by:
ThanniruVenkata



