
ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text

Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text
Resource description:
---
license: mit
---

# Income Tax Act 2025 (India) - Machine-Readable Legal Text Dataset

## Overview

This dataset contains the complete text and structured information from India's **Income-Tax Act, 2025** (effective from April 1, 2026). It covers all chapters and sections of the act, formatted as JSON for easy parsing and analysis by AI/ML systems.

## Dataset Description

The Income Tax Act, 2025 is the primary legislation governing income taxation in India. This dataset extracts and structures all sections, chapters, and provisions into a machine-readable format for research, legal analysis, tax compliance, and AI/ML applications including semantic search, knowledge extraction, and domain-specific model training.

## Dataset Size & Format

- **Format**: JSON Lines (line-delimited JSON)
- **Content**: Complete Act with all chapters and sections
- **Language**: English
- **Encoding**: UTF-8
- **Total Entries**: Comprehensive coverage of all Income Tax Act, 2025 sections

## Data Structure

Each entry in the dataset contains the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `act_name` | string | Name of the act (e.g., "Income-Tax Act, 2025") |
| `act_code` | string | Code identifier for the act (e.g., "ITA2025") |
| `effective_from` | string | Effective date in YYYY-MM-DD format |
| `chapter` | string | Chapter number of the act |
| `section` | string | Section number within the chapter |
| `title` | string | Title/heading of the section |
| `content` | string | Full text content of the section with legal language |
| `search_text` | string | Concatenated searchable text including all metadata |
| `chapter_name` | string | Descriptive name of the chapter |
| `chapter_subtype` | string | Subcategory or classification of the chapter |
| `doc_id` | string | Unique document identifier (ITA2025_[SECTION]_[CHUNK]) |
| `chunk_index` | integer | Index of chunk (0-based) for multi-chunk documents |
| `total_chunks` | integer | Total number of chunks for the document |

## Content Coverage

The dataset includes all major chapters of the Income Tax Act, 2025:

- **Chapter 1-4**: Income Classification and Heads of Income
- **Chapter 5**: Income of Other Persons Included
- **Chapter 6**: Aggregation of Income
- **Chapter 7**: Set Off and Carry Forward of Losses
- **Chapter 8+**: Deductions, Allowances, and Advanced Provisions

## Use Cases

- **Legal Research**: Search and analyze specific tax provisions
- **Tax Compliance**: Understand applicability of sections to different taxpayers
- **AI/ML Training**: Build domain-specific NLP models for tax law interpretation
- **Educational Tools**: Develop interactive tax education and training platforms
- **Document Retrieval**: Create semantic search systems for tax provisions
- **Knowledge Graphs**: Extract relationships and dependencies between sections
- **Tax Chatbots**: Power conversational AI for tax inquiries
- **Compliance Automation**: Automate tax provision applicability checking

## How to Use

### Load with Hugging Face Datasets

```python
from datasets import load_dataset

# Load the dataset (it has a single "train" split)
dataset = load_dataset("ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text")

# Print the content of a specific section; section numbers are stored as strings
for example in dataset["train"]:
    if example["section"] == "100":
        print(example["content"])
```

### Filter by Chapter

```python
# Keep only the entries that belong to Chapter 6 (chapter numbers are strings)
filtered = [ex for ex in dataset["train"] if ex["chapter"] == "6"]
if filtered:
    print(f"Found {len(filtered)} entries in Chapter 6: {filtered[0]['chapter_name']}")
```

### Search by Keyword

```python
# Case-insensitive keyword search over the combined search_text field
results = [ex for ex in dataset["train"] if "unexplained" in ex["search_text"].lower()]
for result in results:
    print(f"Section {result['section']}: {result['title']}")
```

### Convert to Pandas DataFrame

```python
from datasets import load_dataset

dataset = load_dataset("ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text")

# to_pandas() requires pandas to be installed
df = dataset["train"].to_pandas()

# Count entries per chapter
chapter_stats = df.groupby("chapter_name").size()
print(chapter_stats)
```

## Dataset Splits

- **Single split**: `train` (the entire dataset is provided as one split)

## Field Descriptions

### Identifiers & Metadata

- **doc_id**: Unique identifier in format `ITA2025_[SECTION_NUMBER]_[CHUNK_INDEX]`
- **act_code**: Standardized code for the act (`ITA2025`)
- **effective_from**: Date when the act came into force (`2026-04-01`)

### Content Fields

- **content**: Primary text of the section containing full legal language and provisions
- **search_text**: Enhanced text including metadata for semantic searching and NLP
- **title**: Section heading for quick reference and categorization

### Organization Fields

- **chapter**: Chapter number (e.g., "5", "6", "7")
- **section**: Section number within chapter (e.g., "100", "102", "104")
- **chapter_name**: Descriptive name of the chapter (e.g., "AGGREGATION OF INCOME")
- **chapter_subtype**: Subcategory of chapter for hierarchical organization

### Chunking Information

- **chunk_index**: Position of this chunk within a multi-part section (0-based indexing)
- **total_chunks**: Total number of chunks the section is divided into

## Licensing & Attribution

- **Source**: Income-Tax Act, 2025 (Government of India - Ministry of Finance)
- **Public Domain**: Government legislative documents are in public domain in India
- **Attribution**: Please cite Government of India and this dataset
- **Terms**: Complies with GOI open data and public domain policies

## Data Quality

- ✅ Comprehensive coverage of all sections and chapters
- ✅ Properly formatted JSON with valid UTF-8 encoding
- ✅ Structured metadata for filtering and hierarchical retrieval
- ✅ Searchable text field optimized for semantic analysis
- ✅ Unique identifiers for precise referencing and linking
- ✅ Consistent formatting across all entries

## Known Limitations

- Text is current as of the effective date (April 1, 2026)
- Does not include subsequent amendments after April 2026
- Does not include case law, judicial interpretations, or precedents
- Supplementary rules and notifications not included
- No provisions from older acts (prior Income Tax Acts not included)

## Applications & Examples

### 1. Tax Advisory Systems

Build AI assistants that answer tax-related queries using semantic search and retrieval.

### 2. Compliance Checking

Automate verification of tax provision applicability for specific business scenarios.

### 3. Legal Document Analysis

Extract and match relevant sections in tax-related contracts and agreements.

### 4. Educational Platforms

Power interactive learning tools and tutorials for taxation studies.

### 5. Research & Analytics

Analyze tax provisions across sections and identify policy patterns.

### 6. Document Retrieval Systems

Implement semantic search over Indian tax law for precise provision discovery.

### 7. Tax Chatbots

Train conversational AI to answer taxpayer questions about specific provisions.

## Related Resources

- [Income-Tax Department (India)](https://www.incometaxindia.gov.in/)
- [Ministry of Finance - India](https://www.finance.gov.in/)
- [Income Tax Act Official Portal](https://www.incometaxindia.gov.in/indian-income-tax-act/)
- [Finance Acts & Amendments](https://www.finance.gov.in/tamil/en/web/mof)

## Citation

If you use this dataset in your research, projects, or publications, please cite:

```bibtex
@dataset{income_tax_act_2025,
  title={Income Tax Act 2025 (India) - Machine-Readable Legal Text Dataset},
  author={Government of India, Ministry of Finance},
  year={2026},
  publisher={Hugging Face Datasets},
  url={https://huggingface.co/datasets/ThanniruVenkata/Income-Tax-Act-2025-Machine-Readable-Legal-Text},
  note={Effective from April 1, 2026}
}
```

## Contributing

Found errors or have suggestions?

- Open an issue for corrections or improvements
- Submit pull requests for enhancements
- Suggest additional metadata or features
- Report any formatting or encoding issues

## Disclaimer

**Important Legal Notice**: This dataset is provided for informational and research purposes only.

- **Not Legal Advice**: This should not be relied upon as the sole source for tax compliance or legal advice
- **Professional Consultation**: Always consult qualified tax professionals, chartered accountants, and official government sources
- **Accuracy**: While efforts are made to ensure accuracy, Government of India's official sources remain authoritative
- **Changes**: Tax laws are subject to amendments; always verify current applicability
- **Liability**: Users are responsible for verifying information and consulting appropriate professionals
- **Compliance**: Ensure compliance with applicable tax laws through official channels

---

- **Dataset Version**: 1.0
- **Release Date**: April 2026
- **Last Updated**: April 20, 2026
- **Maintained by**: Data Contributors & Hugging Face Community
- **License**: Public Domain (Government of India)
- **Status**: Active and maintained
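## Example: Reassembling Chunked Sections

The `chunk_index`, `total_chunks`, and `doc_id` fields described above imply that a long section can be spread over several entries. The sketch below shows one way to join those entries back into a single text per section and to split a `doc_id` into its parts. It is a minimal illustration, not the dataset author's method: the helper names `reassemble_sections` and `parse_doc_id` are hypothetical, the sample rows are invented, joining chunks with a single space is an assumption, and the `doc_id` parsing assumes real identifiers follow the documented `ITA2025_[SECTION_NUMBER]_[CHUNK_INDEX]` pattern exactly.

```python
from collections import defaultdict


def reassemble_sections(rows):
    """Join chunked rows back into one text per section.

    Assumes each row has "section", "chunk_index", "total_chunks" and
    "content" keys, as listed in the field table. Joining chunks with a
    single space is an assumption about how the text was split.
    """
    by_section = defaultdict(list)
    for row in rows:
        by_section[row["section"]].append(row)

    full_text = {}
    for section, chunks in by_section.items():
        # Order chunks by their 0-based index before joining
        chunks.sort(key=lambda r: r["chunk_index"])
        expected = chunks[0]["total_chunks"]
        if [c["chunk_index"] for c in chunks] != list(range(expected)):
            raise ValueError(f"Section {section}: missing or duplicate chunks")
        full_text[section] = " ".join(c["content"] for c in chunks)
    return full_text


def parse_doc_id(doc_id):
    """Split 'ITA2025_<SECTION>_<CHUNK>' into (act_code, section, chunk_index)."""
    act_code, rest = doc_id.split("_", 1)
    section, chunk = rest.rsplit("_", 1)
    return act_code, section, int(chunk)


# Hypothetical rows, invented for illustration only (not taken from the dataset)
sample_rows = [
    {"section": "100", "chunk_index": 1, "total_chunks": 2, "content": "second part"},
    {"section": "100", "chunk_index": 0, "total_chunks": 2, "content": "first part"},
    {"section": "102", "chunk_index": 0, "total_chunks": 1, "content": "only part"},
]

sections = reassemble_sections(sample_rows)
print(sections["100"])  # first part second part
print(parse_doc_id("ITA2025_100_1"))  # ('ITA2025', '100', 1)
```

With the real data, passing `dataset["train"]` in place of `sample_rows` should work the same way, since iterating over a split yields one dictionary per entry. Grouping by `section` alone assumes section numbers are unique across chapters; if they are not, grouping by the pair `(chapter, section)` would be the safer choice.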
Provided by:
ThanniruVenkata