
dvr76/india-synthetic-property-maintenance-tickets

Source: Hugging Face | Updated: 2026-03-21 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/dvr76/india-synthetic-property-maintenance-tickets
Description:

---
language:
- en
- hi
license: mit
pretty_name: Indian Synthetic Property Maintenance Ticket Triage
tags:
- maintenance
- ticket-triage
- structured-extraction
- property-management
- hinglish
- qlora
- sft
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- text-classification
configs:
- config_name: default
  data_files:
  - split: train
    path: train.jsonl
  - split: validation
    path: validation.jsonl
  - split: test
    path: test.jsonl
dataset_info:
  features:
  - name: ticket_text
    dtype: string
  - name: is_maintenance_request
    dtype: bool
  - name: issues
    list:
    - name: category
      dtype: string
    - name: sub_category
      dtype: string
    - name: location
      dtype: string
    - name: urgency
      dtype: string
  - name: vendor_type
    dtype: string
  - name: entry_required
    dtype: bool
---

# Indian Synthetic Property Maintenance Ticket Triage

A dataset of property maintenance tickets in English and Hinglish (Hindi-English code-mixed), annotated with structured triage labels for supervised fine-tuning of LLMs.

## Dataset Description

This dataset contains synthetic maintenance tickets for residential properties, written in the style of tickets submitted by tenants. Each ticket is labeled with the structured fields a triage system needs to extract: issue classification, urgency level, vendor routing, and entry requirements.

The dataset was created to fine-tune LLMs (specifically Qwen3-0.6B via QLoRA) for automated maintenance ticket triage in property management systems.

### Languages

- English
- Hinglish (Hindi-English code-mixed, written in Latin script)

### Source

Tickets were manually authored and annotated to cover a realistic distribution of maintenance scenarios, including plumbing, electrical, structural, appliance, and pest-related issues.

## Dataset Structure

### Data Fields

| Field | Type | Description |
|---|---|---|
| `ticket_text` | string | Raw maintenance ticket text, written as a tenant would submit it. This is the model input. |
| `is_maintenance_request` | bool | Whether the ticket describes an actual maintenance issue (`true`) or is an unrelated query such as rent or amenities (`false`). |
| `issues` | list of objects | One or more issues extracted from the ticket. Each contains the four fields below. |
| `issues[].category` | string | Top-level issue type. One of: `plumbing`, `electrical`, `structural`, `appliance`, `pest_control`, `general`. |
| `issues[].sub_category` | string | Specific issue. Examples: `leaking_tap`, `pipe_leak`, `burning_smell_switchboard`, `dampness_mold`, `water_seepage`. |
| `issues[].location` | string | Where in the unit. Examples: `kitchen`, `bathroom`, `bedroom`, `hall`, `balcony`. |
| `issues[].urgency` | string | One of: `LOW`, `MEDIUM`, `HIGH`, `CRITICAL`. |
| `vendor_type` | string | Professional to dispatch. One of: `licensed_plumber`, `electrician`, `civil_contractor`, `general_handyman`, `pest_control_service`, `appliance_technician`. |
| `entry_required` | bool | Whether physical entry into the tenant's unit is needed for inspection or repair. |

### Data Splits

| Split | Purpose | Description |
|---|---|---|
| `train` | Model training | Used for supervised fine-tuning. |
| `validation` | Hyperparameter tuning | Evaluated every N steps during training to detect overfitting. |
| `test` | Final evaluation | Held out entirely. Used only after training is complete. |
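The closed value sets in the Data Fields table make it possible to sanity-check records from any split before training. The following is a minimal Python sketch, not tooling that ships with the dataset: `validate_record` and `_allowed` are hypothetical names. Enforcing `vendor_type` and a non-empty `issues` list only when `is_maintenance_request` is true is an assumption, because the card does not document how non-maintenance tickets fill those fields. `sub_category` and `location` are only type-checked, since the card gives examples for them rather than a closed set.

```python
from typing import Any

# Closed value sets, copied from the Data Fields table of this card.
CATEGORIES = {"plumbing", "electrical", "structural", "appliance", "pest_control", "general"}
URGENCIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}
VENDOR_TYPES = {
    "licensed_plumber", "electrician", "civil_contractor",
    "general_handyman", "pest_control_service", "appliance_technician",
}


def _allowed(value: Any, choices: set[str]) -> bool:
    # Check the type first so unhashable values are reported instead of raising.
    return isinstance(value, str) and value in choices


def validate_record(record: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list the problems in one record (empty list = passed).

    Only field types and the closed value sets above are checked.
    """
    errors: list[str] = []
    if not isinstance(record.get("ticket_text"), str):
        errors.append("ticket_text must be a string")
    for key in ("is_maintenance_request", "entry_required"):
        if not isinstance(record.get(key), bool):
            errors.append(f"{key} must be a bool")

    # Assumption: vendor routing and a non-empty issue list are enforced only
    # for real maintenance requests; the card does not say how non-maintenance
    # tickets fill these fields.
    is_maintenance = record.get("is_maintenance_request") is True
    if is_maintenance and not _allowed(record.get("vendor_type"), VENDOR_TYPES):
        errors.append(f"unknown vendor_type: {record.get('vendor_type')!r}")

    issues = record.get("issues")
    if not isinstance(issues, list):
        errors.append("issues must be a list")
        return errors
    if is_maintenance and not issues:
        errors.append("a maintenance request needs at least one issue")

    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            errors.append(f"issues[{i}] must be an object")
            continue
        if not _allowed(issue.get("category"), CATEGORIES):
            errors.append(f"issues[{i}].category not allowed: {issue.get('category')!r}")
        if not _allowed(issue.get("urgency"), URGENCIES):
            errors.append(f"issues[{i}].urgency not allowed: {issue.get('urgency')!r}")
        # sub_category and location are open-ended: the card lists examples only.
        for key in ("sub_category", "location"):
            if not isinstance(issue.get(key), str):
                errors.append(f"issues[{i}].{key} must be a string")
    return errors


if __name__ == "__main__":
    # The example record shown in this card passes every check.
    example = {
        "ticket_text": "kitchen wall near the window has some greenish patches, "
                       "started small but slowly spreading over last few weeks",
        "is_maintenance_request": True,
        "issues": [{"category": "structural", "sub_category": "dampness_mold",
                    "location": "kitchen", "urgency": "MEDIUM"}],
        "vendor_type": "civil_contractor",
        "entry_required": True,
    }
    print(validate_record(example))  # []
```

Iterating over a split loaded with `datasets` yields one dictionary per row, so an expression such as `[i for i, row in enumerate(split) if validate_record(row)]` would collect the indices of suspect records in a loaded split, assuming the nested `issues` entries also come back as dictionaries.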
### Example

```json
{
  "ticket_text": "kitchen wall near the window has some greenish patches, started small but slowly spreading over last few weeks",
  "is_maintenance_request": true,
  "issues": [
    {
      "category": "structural",
      "sub_category": "dampness_mold",
      "location": "kitchen",
      "urgency": "MEDIUM"
    }
  ],
  "vendor_type": "civil_contractor",
  "entry_required": true
}
```

## Usage

```python
from datasets import load_dataset

# Loads the train, validation, and test splits declared in the card metadata
dataset = load_dataset("dvr76/india-synthetic-property-maintenance-tickets")

# Access splits
train = dataset["train"]
val = dataset["validation"]
test = dataset["test"]

# Print first example
print(train[0])
```

## Intended Use

- Fine-tuning LLMs for structured information extraction from maintenance tickets.
- Building automated triage and dispatch systems for property management.
- Benchmarking structured extraction models on informal, multilingual text.

## Limitations

- The dataset is relatively small (between 1K and 10K examples, per its size category). Models fine-tuned on it may not generalize to maintenance domains outside residential property (e.g., industrial or commercial).
- Hinglish examples use Latin script only. Devanagari Hindi is not represented.
- Urgency labels are subjective and reflect the annotator's judgment. Different property managers may assign different urgency levels to the same issue.
- The dataset does not cover every possible maintenance category. Edge cases such as fire damage, flooding, or structural collapse are underrepresented.

## Fine-Tuned Model

A Qwen3-0.6B model fine-tuned on this dataset is available at `dvr76/ticket-triage-qwen3`.

## License

MIT
Provided by: dvr76