dvr76/india-synthetic-property-maintenance-tickets

Name: dvr76/india-synthetic-property-maintenance-tickets
Creator: dvr76
Published: 2026-03-21 12:44:20
License: Not specified on this page (the dataset card declares MIT)

Hugging Face | Updated: 2026-03-21 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/dvr76/india-synthetic-property-maintenance-tickets


Resource summary:

---
language:
- en
- hi
license: mit
pretty_name: Indian Synthetic Property Maintenance Ticket Triage
tags:
- maintenance
- ticket-triage
- structured-extraction
- property-management
- hinglish
- qlora
- sft
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- text-classification
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: validation
    path: validation.jsonl
  - split: test
    path: test.jsonl
dataset_info:
  features:
  - name: ticket_text
    dtype: string
  - name: is_maintenance_request
    dtype: bool
  - name: issues
    list:
    - name: category
      dtype: string
    - name: sub_category
      dtype: string
    - name: location
      dtype: string
    - name: urgency
      dtype: string
  - name: vendor_type
    dtype: string
  - name: entry_required
    dtype: bool
---

# Indian Synthetic Property Maintenance Ticket Triage

A dataset of synthetic property maintenance tickets in English and Hinglish (Hindi-English code-mixed). Each ticket is annotated with structured triage labels for supervised fine-tuning of LLMs.

## Dataset Description

This dataset contains maintenance tickets for residential properties, written in the style of tenant submissions. Each ticket is labeled with the structured fields that a triage system needs to extract: issue classification, urgency level, vendor routing, and whether entry into the unit is required.

The dataset was created to fine-tune LLMs for automated maintenance ticket triage in property management systems. The target model was Qwen3-0.6B, fine-tuned via QLoRA.

### Languages

- English
- Hinglish (Hindi-English code-mixed, written in Latin script)

### Source

Tickets were manually authored and annotated. They are intended to cover a realistic distribution of maintenance scenarios, including plumbing, electrical, structural, appliance, and pest-related issues.

## Dataset Structure

### Data Fields

| Field | Type | Description |
|---|---|---|
| `ticket_text` | string | Raw maintenance ticket text in the tenant's own words. This is the model input. |
| `is_maintenance_request` | bool | Whether the ticket describes an actual maintenance issue (`true`) or is an unrelated query, such as one about rent or amenities (`false`). |
| `issues` | list of objects | One or more issues extracted from the ticket. Each object contains the four `issues[].*` fields below. |
| `issues[].category` | string | Top-level issue type. One of: `plumbing`, `electrical`, `structural`, `appliance`, `pest_control`, `general`. |
| `issues[].sub_category` | string | Specific issue. Examples: `leaking_tap`, `pipe_leak`, `burning_smell_switchboard`, `dampness_mold`, `water_seepage`. |
| `issues[].location` | string | Location within the unit. Examples: `kitchen`, `bathroom`, `bedroom`, `hall`, `balcony`. |
| `issues[].urgency` | string | One of: `LOW`, `MEDIUM`, `HIGH`, `CRITICAL`. |
| `vendor_type` | string | Professional to dispatch (one per ticket). One of: `licensed_plumber`, `electrician`, `civil_contractor`, `general_handyman`, `pest_control_service`, `appliance_technician`. |
| `entry_required` | bool | Whether someone must physically enter the tenant's unit to inspect or repair the issue. |

### Data Splits

| Split | Purpose | Description |
|---|---|---|
| `train` | Model training | Used for supervised fine-tuning. |
| `validation` | Hyperparameter tuning | Evaluated every N steps during training to detect overfitting. |
| `test` | Final evaluation | Held out entirely. Used only after training is complete. |

### Example

```json
{
  "ticket_text": "kitchen wall near the window has some greenish patches, started small but slowly spreading over last few weeks",
  "is_maintenance_request": true,
  "issues": [
    {
      "category": "structural",
      "sub_category": "dampness_mold",
      "location": "kitchen",
      "urgency": "MEDIUM"
    }
  ],
  "vendor_type": "civil_contractor",
  "entry_required": true
}
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("dvr76/india-synthetic-property-maintenance-tickets")

# Access splits
train = dataset["train"]
val = dataset["validation"]
test = dataset["test"]

# Print first example
print(train[0])
```

## Intended Use

- Fine-tuning LLMs for structured information extraction from maintenance tickets.
- Building automated triage and dispatch systems for property management.
- Benchmarking structured extraction models on informal, multilingual text.

## Limitations

- The dataset is small (between 1K and 10K examples, per its `size_categories` tag) and covers only residential property. Models fine-tuned on it may not generalize to other maintenance domains, such as industrial or commercial property.
- Hinglish examples use Latin script only. Devanagari Hindi is not represented.
- Urgency labels are subjective and reflect the annotator's judgment. Different property managers may assign different urgency levels to the same issue.
- The dataset does not cover every possible maintenance scenario. Edge cases such as fire damage, flooding, or structural collapse are underrepresented.

## Fine-Tuned Model

A Qwen3-0.6B model fine-tuned on this dataset is available at `dvr76/ticket-triage-qwen3`.

## License

MIT
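The enumerations in the Data Fields table above can be checked mechanically, for example when auditing the JSONL files or when parsing model output. The sketch below uses a hypothetical helper, `validate_record`, which is not part of the dataset or the `datasets` library. It encodes only the value sets the table lists. It type-checks `sub_category` and `location` without restricting their values, because the card gives only examples for them. It also assumes that `vendor_type` always holds one of the listed values. The card does not say what non-maintenance tickets store there, so relax that check if the data uses `null`.

```python
# Hypothetical validator for one record; the allowed values come from
# the Data Fields table of this card.
CATEGORIES = {"plumbing", "electrical", "structural", "appliance",
              "pest_control", "general"}
URGENCIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}
VENDOR_TYPES = {"licensed_plumber", "electrician", "civil_contractor",
                "general_handyman", "pest_control_service",
                "appliance_technician"}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    if not isinstance(record.get("ticket_text"), str):
        problems.append("ticket_text must be a string")
    for key in ("is_maintenance_request", "entry_required"):
        if not isinstance(record.get(key), bool):
            problems.append(f"{key} must be a bool")
    # Assumption: every record, including non-maintenance ones, uses a
    # listed vendor_type. The card does not say; relax this if needed.
    if record.get("vendor_type") not in VENDOR_TYPES:
        problems.append(f"unknown vendor_type: {record.get('vendor_type')!r}")
    issues = record.get("issues")
    if not isinstance(issues, list):
        problems.append("issues must be a list")
        return problems
    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"issues[{i}] must be an object")
            continue
        if issue.get("category") not in CATEGORIES:
            problems.append(f"issues[{i}]: unknown category {issue.get('category')!r}")
        if issue.get("urgency") not in URGENCIES:
            problems.append(f"issues[{i}]: unknown urgency {issue.get('urgency')!r}")
        # sub_category and location are open-ended in the card: type-check only.
        for key in ("sub_category", "location"):
            if not isinstance(issue.get(key), str):
                problems.append(f"issues[{i}]: {key} must be a string")
    return problems


# The example record from this card passes every check.
example = {
    "ticket_text": "kitchen wall near the window has some greenish patches, "
                   "started small but slowly spreading over last few weeks",
    "is_maintenance_request": True,
    "issues": [{"category": "structural", "sub_category": "dampness_mold",
                "location": "kitchen", "urgency": "MEDIUM"}],
    "vendor_type": "civil_contractor",
    "entry_required": True,
}
print(validate_record(example))  # → []
```

Collecting problems in a list, rather than raising on the first one, reports every issue in a malformed record in a single pass.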
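The Dataset Description above says the data was used to fine-tune Qwen3-0.6B via QLoRA, with `ticket_text` as the model input. The card does not document the prompt template. The following is therefore only a sketch of one way to turn a record into a prompt/completion pair, where the target is the label fields serialized as JSON. Both `to_sft_pair` and the instruction wording are hypothetical. Adapt the output keys to whatever format your training framework expects.

```python
import json

# Label fields the triage model should produce (see the Data Fields table).
TARGET_FIELDS = ("is_maintenance_request", "issues", "vendor_type",
                 "entry_required")

# Hypothetical instruction text; the card does not document the prompt
# used for dvr76/ticket-triage-qwen3.
INSTRUCTION = ("Extract the maintenance triage fields from the ticket below. "
               "Answer with a single JSON object.")


def to_sft_pair(record: dict) -> dict:
    """Split a record into a prompt (ticket text) and a JSON completion (labels)."""
    prompt = f"{INSTRUCTION}\n\nTicket: {record['ticket_text']}"
    labels = {key: record[key] for key in TARGET_FIELDS}
    # ensure_ascii=False keeps any non-ASCII characters readable in the target.
    completion = json.dumps(labels, ensure_ascii=False)
    return {"prompt": prompt, "completion": completion}
```

For example, with `dataset` loaded as in the Usage section, `to_sft_pair(dataset["train"][0])` pairs the first training ticket with its labels. Keeping `ticket_text` out of the completion means the model learns to produce only the extracted fields.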
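The Limitations section above notes that some scenarios are underrepresented and that urgency labels are subjective. A quick way to check coverage in any split is to tally its labels. The hypothetical `label_counts` helper below accepts any iterable of records. That includes a plain list of dicts or a split returned by `load_dataset` in the Usage section, since iterating a `datasets.Dataset` yields one dict per row.

```python
from collections import Counter
from typing import Dict, Iterable


def label_counts(records: Iterable[dict]) -> Dict[str, Counter]:
    """Tally ticket-level and issue-level labels across a split."""
    ticket_fields = ("is_maintenance_request", "vendor_type", "entry_required")
    counts = {name: Counter() for name in ticket_fields + ("category", "urgency")}
    for record in records:
        # Ticket-level labels: one value per record.
        for name in ticket_fields:
            counts[name][record[name]] += 1
        # Issue-level labels: one value per extracted issue. A null
        # issues list is treated as empty (an assumption; the card is silent).
        for issue in record["issues"] or []:
            counts["category"][issue["category"]] += 1
            counts["urgency"][issue["urgency"]] += 1
    return counts
```

For example, `label_counts(dataset["train"])["urgency"].most_common()` lists the urgency levels in the training split from most to least frequent. Comparing these counts across `train`, `validation`, and `test` shows whether the splits have similar label distributions.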
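For the benchmarking use listed under Intended Use above, per-field exact-match accuracy is a simple starting point. The card does not define this metric; it is a hypothetical sketch with three assumptions. First, each model output is a bare JSON object with the same keys as a record. Second, unparseable output counts as a miss on every field. Third, `issues` is compared as an order-insensitive multiset of (`category`, `sub_category`, `location`, `urgency`) tuples.

```python
import json
from collections import Counter

SCALAR_FIELDS = ("is_maintenance_request", "vendor_type", "entry_required")
ISSUE_KEYS = ("category", "sub_category", "location", "urgency")


def issue_multiset(issues):
    """Order-insensitive view of an issues list, or None if it is malformed."""
    if not isinstance(issues, list) or not all(isinstance(i, dict) for i in issues):
        return None
    return Counter(tuple(issue.get(k) for k in ISSUE_KEYS) for issue in issues)


def score(predictions: list, golds: list) -> dict:
    """Per-field exact-match accuracy of raw model outputs against gold records."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must be aligned one-to-one")
    fields = ("valid_json",) + SCALAR_FIELDS + ("issues",)
    hits = dict.fromkeys(fields, 0)
    for raw, gold in zip(predictions, golds):
        try:
            pred = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output is a miss on every field
        if not isinstance(pred, dict):
            continue
        hits["valid_json"] += 1
        for field in SCALAR_FIELDS:
            hits[field] += int(pred.get(field) == gold[field])
        gold_issues = issue_multiset(gold["issues"])
        hits["issues"] += int(issue_multiset(pred.get("issues")) == gold_issues)
    total = max(len(golds), 1)  # avoid dividing by zero on an empty list
    return {field: hits[field] / total for field in fields}
```

Pass the raw generations for the `test` split together with that split's records. If your model wraps its JSON in extra text, extract the object before scoring. Exact match is strict for `urgency`, which the Limitations section describes as subjective, so treat disagreements on that field with some caution.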

Provided by:

dvr76
