SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune
Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune
Resource description:
---
license: cc-by-4.0
task_categories:
- image-to-text
- question-answering
- visual-question-answering
language:
- tr
license_name: cc-by-4.0
pretty_name: Blind Assist TR - Indoor Scene Image-to-Text & QA
tags:
- blind-assist
- indoor-navigation
- image-captioning
- visual-question-answering
- turkish
- unsloth-finetune
- multimodal
---
# Blind Assist TR - Indoor Scene Image-to-Text & QA Dataset
## Dataset Description
This dataset is designed for training multimodal models to assist visually impaired people in navigating indoor environments. It contains **59,035 image-text pairs** covering **57 distinct indoor scene categories**.
Each sample consists of:
- **image**: An indoor scene photograph
- **text**: A Turkish question-answer pair about the scene (e.g., obstacle detection, floor texture, object location, navigation guidance)
### Use Cases
- **Blind assist / indoor navigation assistant** — helping visually impaired users understand their surroundings
- **Image captioning** (Turkish)
- **Visual question answering** (Turkish)
- **Unsloth multimodal fine-tuning** (suitable for the `simple_image_text` format)
## Dataset Statistics
| Statistic | Value |
|------|---|
| Total samples | 59,035 |
| Train split | 56,083 |
| Test split | 2,952 |
| Train size | 6.99 GB (6.51 GiB) |
| Test size | 325.9 MB (310.8 MiB) |
| Download size | 6.66 GB (6.20 GiB) |
| Total dataset size | 7.32 GB (6.82 GiB) |
| Image-text format | `simple_image_text` (question + answer merged into single text field) |
| Language | Turkish (tr) |
| Categories | 57 indoor scene types |
## Category List (57)
airport_inside, artstudio, auditorium, bakery, bar, bathroom, bedroom, bookstore, bowling, buffet, casino, children_room, church_inside, classroom, cloister, closet, clothingstore, computerroom, concert_hall, corridor, deli, dentaloffice, dining_room, elevator, fastfood_restaurant, florist, gameroom, garage, greenhouse, grocerystore, gym, hairsalon, hospitalroom, inside_bus, inside_subway, jewelleryshop, kindergarden, kitchen, laboratorywet, laundromat, library, livingroom, lobby, locker_room, mall, meeting_room, movietheater, museum, nursery, office, operating_room, pantry, poolinside, prisoncell, restaurant_kitchen, restaurant, shoeshop, stairscase, studiomusic, subway, toystore, trainstation, tv_studio, videostore, waitingroom, warehouse, winecellar
## Dataset Info (dataset_info.yaml)
```yaml
dataset_info:
  features:
  - name: image
    dtype: image
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 6991630802
    num_examples: 56083
  - name: test
    num_bytes: 325942342
    num_examples: 2952
  download_size: 6661977554
  dataset_size: 7317573144
config:
  config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
```
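The counts in `dataset_info` can be cross-checked with a little arithmetic. The sketch below hard-codes the values from the block above and confirms that the split sizes add up to `dataset_size`, that the example counts add up to 59,035, and that the test split is roughly 5% of the data. Reading the download and total sizes in the statistics table as binary units (GiB) is an inference from the numbers, not something the card states; `summarize` is a hypothetical helper name.

```python
# Values copied from the dataset_info block above.
SPLITS = {
    "train": {"num_bytes": 6_991_630_802, "num_examples": 56_083},
    "test": {"num_bytes": 325_942_342, "num_examples": 2_952},
}
DATASET_SIZE = 7_317_573_144
DOWNLOAD_SIZE = 6_661_977_554

GIB = 1024 ** 3  # assumption: the table's "GB" figures are binary gigabytes


def summarize(splits):
    """Return (total bytes, total examples, test fraction) for the splits."""
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    total_examples = sum(s["num_examples"] for s in splits.values())
    test_fraction = splits["test"]["num_examples"] / total_examples
    return total_bytes, total_examples, test_fraction


total_bytes, total_examples, test_fraction = summarize(SPLITS)
print("split bytes match dataset_size:", total_bytes == DATASET_SIZE)
print("total examples:", total_examples)
print(f"test fraction: {test_fraction:.2%}")
print(f"dataset size: {DATASET_SIZE / GIB:.2f} GiB")
print(f"download size: {DOWNLOAD_SIZE / GIB:.2f} GiB")
```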
## Dataset Structure
```
{
  "image": <PIL.Image>,
  "text": "Question: <Turkish question>\nAnswer: <Turkish answer>"
}
```
### Example
```json
{
  "image": "...",
  "text": "Question: Zemin üzerinde herhangi bir engel var mı?\nAnswer: Zemin açık renkli ahşap parkeden oluşuyor. Yüzey düz ve pürüzsüz görünüyor, şu an için belirgin bir engel tespit edilmiyor."
}
```
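Because the question and answer are merged into one `text` field, downstream code may need to separate them again. The following is a minimal sketch, assuming every record follows the `Question: ...\nAnswer: ...` layout shown above; `split_question_answer` is a hypothetical helper, not part of the dataset or of any library, and the sample string is a shortened version of the example.

```python
QUESTION_PREFIX = "Question: "
ANSWER_SEPARATOR = "\nAnswer: "


def split_question_answer(text):
    """Split a merged question/answer string into a (question, answer) tuple."""
    question_part, separator, answer = text.partition(ANSWER_SEPARATOR)
    if not separator or not question_part.startswith(QUESTION_PREFIX):
        # Fail loudly rather than guess when a record deviates from the layout.
        raise ValueError("text does not follow the expected Question/Answer layout")
    question = question_part[len(QUESTION_PREFIX):]
    return question.strip(), answer.strip()


sample_text = (
    "Question: Zemin üzerinde herhangi bir engel var mı?\n"
    "Answer: Zemin açık renkli ahşap parkeden oluşuyor."
)
question, answer = split_question_answer(sample_text)
print(question)
print(answer)
```

Raising on malformed records, instead of returning empty strings, makes layout deviations visible early if some samples turn out to differ from the documented format.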
## Data Collection
- Images sourced from public indoor scene datasets (ImageNet, CLEVR, etc.)
- QA pairs generated via LLM-based annotation with human review
- Images are augmented (random crops, brightness, color adjustments) for robustness
- Split: 95% train / 5% test (seed=42)
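The card does not say which tool produced the split, so the following is only an illustration of how a seeded 95/5 split yields the published counts. It uses the standard library, rounds the test size up, and will not reproduce the actual train/test membership of this dataset; `split_indices` is a hypothetical helper.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.05, seed=42):
    """Shuffle indices with a fixed seed and cut off a test portion."""
    n_test = math.ceil(n_samples * test_fraction)  # round the test size up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(59_035)
print(len(train_idx), len(test_idx))
```

With 59,035 samples, 5% is 2,951.75, which rounds up to the 2,952 test examples reported in the statistics and leaves 56,083 for training.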
## License
This dataset is licensed under **CC-BY 4.0** (Creative Commons Attribution 4.0 International).
You are free to:
- **Share** — copy and redistribute the material in any medium or format
- **Adapt** — remix, transform, and build upon the material for any purpose, even commercially
Under the following terms:
- **Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
See full license: https://creativecommons.org/licenses/by/4.0/
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune")
# Access splits
train = dataset["train"]
test = dataset["test"]
# Example
print(train[0]["image"])
print(train[0]["text"])
```
## For Unsloth Fine-tuning
This dataset is pre-formatted for the `simple_image_text` task type used in Unsloth:
```python
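idx = 0  # index of the sample to format; `train` comes from the Usage snippet
messages = [
    {"role": "user", "content": [
        {"type": "image"},  # placeholder for the sample's image
        {"type": "text", "text": train[idx]["text"]},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "..."},
    ]},
]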
```
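For supervised fine-tuning it can be useful to place the question in the user turn and the answer in the assistant turn instead of keeping them merged. The sketch below assumes the `Question: ...\nAnswer: ...` layout described under Dataset Structure and a chat format with typed content parts like the one above. `to_conversation` is a hypothetical helper rather than an Unsloth API, and the record is a made-up placeholder, not a sample from the dataset.

```python
QUESTION_PREFIX = "Question: "
ANSWER_SEPARATOR = "\nAnswer: "


def to_conversation(sample):
    """Turn one {'image', 'text'} record into a two-turn chat example."""
    question_part, separator, answer = sample["text"].partition(ANSWER_SEPARATOR)
    if not separator or not question_part.startswith(QUESTION_PREFIX):
        raise ValueError("text does not follow the expected Question/Answer layout")
    question = question_part[len(QUESTION_PREFIX):].strip()
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": sample["image"]},
                {"type": "text", "text": question},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": answer.strip()},
            ]},
        ]
    }


# Placeholder record; in practice this would be train[i] with a PIL image.
record = {"image": object(), "text": "Question: Kapı nerede?\nAnswer: Kapı sağda."}
conversation = to_conversation(record)
print(conversation["messages"][0]["content"][1]["text"])
print(conversation["messages"][1]["content"][0]["text"])
```

Whether the image is passed inline, as here, or supplied separately to the processor depends on the training setup, so the exact shape of the image part should be checked against the trainer in use.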
## Citation
If you use this dataset in your research, please cite:
```bibtex
@misc{blindassisttr2026,
title={Blind Assist TR - Indoor Scene Image-to-Text \& QA Dataset},
author={Hub, Salih},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/datasets/SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune}
}
```
## Contact
Created by [SalihHub](https://huggingface.co/SalihHub)
Provider:
SalihHub



