dvr76/india-synthetic-property-maintenance-tickets
收藏Hugging Face2026-03-21 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/dvr76/india-synthetic-property-maintenance-tickets
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
- hi
license: mit
pretty_name: Indian Synthetic Property Maintenance Ticket Triage
tags:
- maintenance
- ticket-triage
- structured-extraction
- property-management
- hinglish
- qlora
- sft
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- text-classification
configs:
- config_name: default
data_files:
- split: train
path: train.jsonl
- split: validation
path: validation.jsonl
- split: test
path: test.jsonl
dataset_info:
features:
- name: ticket_text
dtype: string
- name: is_maintenance_request
dtype: bool
- name: issues
list:
- name: category
dtype: string
- name: sub_category
dtype: string
- name: location
dtype: string
- name: urgency
dtype: string
- name: vendor_type
dtype: string
- name: entry_required
dtype: bool
---
# Indian Synthetic Property Maintenance Ticket Triage
A dataset of property maintenance tickets in English and Hinglish (Hindi-English code-mixed), annotated with structured triage labels for supervised fine-tuning of LLMs.
## Dataset Description
This dataset contains tenant-submitted maintenance tickets from residential properties. Each ticket is labeled with structured fields that a triage system needs to extract: issue classification, urgency level, vendor routing, and entry requirements.
The dataset was created to fine-tune LLMs (specifically Qwen3-0.6B via QLoRA) for automated maintenance ticket triage in property management systems.
### Languages
- English
- Hinglish (Hindi-English code-mixed, written in Latin script)
### Source
Tickets were manually authored and annotated to cover a realistic distribution of maintenance scenarios including plumbing, electrical, structural, appliance, and pest-related issues.
## Dataset Structure
### Data Fields
| Field | Type | Description |
|---|---|---|
| `ticket_text` | string | Raw tenant-submitted maintenance ticket text. This is the model input. |
| `is_maintenance_request` | bool | Whether the ticket describes an actual maintenance issue (true) or is an unrelated query like rent or amenities (false). |
| `issues` | list of objects | One or more issues extracted from the ticket. Each contains the four fields below. |
| `issues[].category` | string | Top-level issue type. One of: `plumbing`, `electrical`, `structural`, `appliance`, `pest_control`, `general`. |
| `issues[].sub_category` | string | Specific issue. Examples: `leaking_tap`, `pipe_leak`, `burning_smell_switchboard`, `dampness_mold`, `water_seepage`. |
| `issues[].location` | string | Where in the unit. Examples: `kitchen`, `bathroom`, `bedroom`, `hall`, `balcony`. |
| `issues[].urgency` | string | One of: `LOW`, `MEDIUM`, `HIGH`, `CRITICAL`. |
| `vendor_type` | string | Professional to dispatch. One of: `licensed_plumber`, `electrician`, `civil_contractor`, `general_handyman`, `pest_control_service`, `appliance_technician`. |
| `entry_required` | bool | Whether physical entry into the tenant's unit is needed for inspection or repair. |
### Data Splits
| Split | Purpose | Description |
|---|---|---|
| `train` | Model training | Used for supervised fine-tuning. |
| `validation` | Hyperparameter tuning | Evaluated every N steps during training to detect overfitting. |
| `test` | Final evaluation | Held out entirely. Used only after training is complete. |
### Example
```json
{
"ticket_text": "kitchen wall near the window has some greenish patches, started small but slowly spreading over last few weeks",
"is_maintenance_request": true,
"issues": [
{
"category": "structural",
"sub_category": "dampness_mold",
"location": "kitchen",
"urgency": "MEDIUM"
}
],
"vendor_type": "civil_contractor",
"entry_required": true
}
```
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("dvr76/india-synthetic-property-maintenance-tickets")
# Access splits
train = dataset["train"]
val = dataset["validation"]
test = dataset["test"]
# Print first example
print(train[0])
```
## Intended Use
- Fine-tuning LLMs for structured information extraction from maintenance tickets.
- Building automated triage and dispatch systems for property management.
- Benchmarking structured extraction models on informal, multilingual text.
## Limitations
- The dataset is relatively small. Models fine-tuned on it may not generalize to maintenance domains outside residential property (e.g., industrial, commercial).
- Hinglish examples use Latin script only. Devanagari Hindi is not represented.
- Urgency labels are subjective and based on the annotator's judgment. Different property managers may assign different urgency levels to the same issue.
- The dataset does not cover every possible maintenance category. Edge cases like fire damage, flooding, or structural collapse are underrepresented.
## Fine-Tuned Model
A Qwen3-0.6B model fine-tuned on this dataset is available at: dvr76/ticket-triage-qwen3
## License
MIT
提供机构:
dvr76



