SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune

Name: SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune
Creator: SalihHub
Published: 2026-04-27 13:17:18
License: CC-BY 4.0

Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune

Description:

---
license: cc-by-4.0
task_categories:
- image-to-text
- question-answering
- visual-question-answering
language:
- tr
license_name: cc-by-4.0
pretty_name: Blind Assist TR - Indoor Scene Image-to-Text & QA
tags:
- blind-assist
- indoor-navigation
- image-captioning
- visual-question-answering
- turkish
- unsloth-finetune
- multimodal
---

# Blind Assist TR - Indoor Scene Image-to-Text & QA Dataset

## Dataset Description

This dataset is designed for training multimodal models that help visually impaired people navigate indoor environments. It contains **59,035 image-text pairs** covering **57 distinct indoor scene categories**.

Each sample consists of:

- **image**: An indoor scene photograph
- **text**: A Turkish question-answer pair about the scene (e.g., obstacle detection, floor texture, object location, navigation guidance)

### Use Cases

- **Blind assist / indoor navigation assistant**: helping visually impaired users understand their surroundings
- **Image captioning** (Turkish)
- **Visual question answering** (Turkish)
- **Unsloth multi-modal fine-tuning** (suitable for the `simple_image_text` format)

## Dataset Statistics

| Statistic | Value |
|------|---|
| Total samples | 59,035 |
| Train split | 56,083 |
| Test split | 2,952 |
| Train size | 6.52 GB |
| Test size | 310.7 MB |
| Download size | 6.20 GB |
| Total dataset size | 6.82 GB |
| Image-text format | `simple_image_text` (question + answer merged into a single text field) |
| Language | Turkish (tr) |
| Categories | 57 indoor scene types |

## Category List (57)

airport_inside, artstudio, auditorium, bakery, bar, bathroom, bedroom, bookstore, bowling, buffet, casino, children_room, church_inside, classroom, cloister, closet, clothingstore, computerroom, concert_hall, corridor, deli, dentaloffice, dining_room, elevator, fastfood_restaurant, florist, gameroom, garage, greenhouse, grocerystore, gym, hairsalon, hospitalroom, inside_bus, inside_subway, jewelleryshop, kindergarden, kitchen, laboratorywet, laundromat, library, livingroom, lobby, locker_room, mall, meeting_room, movietheater, museum, nursery, office, operating_room, pantry, poolinside, prisoncell, restaurant_kitchen, restaurant, shoeshop, stairscase, studiomusic, subway, toystore, trainstation, tv_studio, videostore, waitingroom, warehouse, winecellar

## Dataset Info (dataset_info.yaml)

```yaml
dataset_info:
  features:
    - name: image
      dtype: image
    - name: text
      dtype: string
  splits:
    - name: train
      num_bytes: 6991630802
      num_examples: 56083
    - name: test
      num_bytes: 325942342
      num_examples: 2952
  download_size: 6661977554
  dataset_size: 7317573144
config:
  config_name: default
  data_files:
    - split: train
      path: data/train-*
    - split: test
      path: data/test-*
```

## Dataset Structure

```
{
  "image": <PIL.Image>,
  "text": "Question: <Turkish question>\nAnswer: <Turkish answer>"
}
```

### Example

```json
{
  "image": "...",
  "text": "Question: Zemin üzerinde herhangi bir engel var mı?\nAnswer: Zemin açık renkli ahşap parkeden oluşuyor. Yüzey düz ve pürüzsüz görünüyor, şu an için belirgin bir engel tespit edilmiyor."
}
```

## Data Collection

- Images sourced from public indoor scene datasets (ImageNet, CLEVR, etc.)
- QA pairs generated via LLM-based annotation with human review
- Images are augmented (random crops, brightness, color adjustments) for robustness
- Split: 95% train / 5% test (seed=42)

## License

This dataset is licensed under **CC-BY 4.0** (Creative Commons Attribution 4.0 International).

You are free to:

- **Share**: copy and redistribute the material in any medium or format
- **Adapt**: remix, transform, and build upon the material for any purpose, even commercially

Under the following terms:

- **Attribution**: You must give appropriate credit, provide a link to the license, and indicate if changes were made.

See full license: https://creativecommons.org/licenses/by/4.0/

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune")

# Access splits
train = dataset["train"]
test = dataset["test"]

# Example
print(train[0]["image"])
print(train[0]["text"])
```

## For Unsloth Fine-tuning

This dataset is pre-formatted for the `simple_image_text` task type used in Unsloth:

```python
# `train` is the split loaded in the Usage section; `idx` selects one sample.
idx = 0

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": train[idx]["text"]}
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "..."}
    ]}
]
```

## Citation

If you use this dataset in your research, please cite:

```bibtex
@misc{blindassisttr2026,
  title={Blind Assist TR - Indoor Scene Image-to-Text \& QA Dataset},
  author={Hub, Salih},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/SalihHub/blind-assist-tr-image-to-text_and_QA_suitable_UnslothFinetune}
}
```

## Contact

Created by [SalihHub](https://huggingface.co/SalihHub)
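
## Example: Splitting the `text` Field

Because each sample merges the question and answer into one `text` string, some workflows may want the two parts separately (for example, to use the question as the prompt and the answer as the target). The following is a minimal sketch, assuming every sample follows the `Question: ...\nAnswer: ...` layout shown in the card. The helper name `split_qa` and the two constants are hypothetical and not part of the dataset or of Unsloth.

```python
from typing import Tuple

# Assumed layout markers, taken from the example in the dataset card.
QUESTION_PREFIX = "Question: "
ANSWER_MARKER = "\nAnswer: "


def split_qa(text: str) -> Tuple[str, str]:
    """Split a merged 'Question: ...\\nAnswer: ...' string into (question, answer).

    Hypothetical helper: raises ValueError if the assumed layout is not found.
    """
    if not text.startswith(QUESTION_PREFIX) or ANSWER_MARKER not in text:
        raise ValueError("text does not follow the 'Question: ...\\nAnswer: ...' layout")
    # Drop the leading prefix, then split once at the answer marker.
    question, answer = text[len(QUESTION_PREFIX):].split(ANSWER_MARKER, 1)
    return question.strip(), answer.strip()


# Sample string shortened from the example in the dataset card.
sample_text = (
    "Question: Zemin üzerinde herhangi bir engel var mı?\n"
    "Answer: Zemin açık renkli ahşap parkeden oluşuyor."
)

question, answer = split_qa(sample_text)
print(question)
print(answer)
```

Raising an error on unexpected input, rather than returning empty strings, makes any sample that deviates from the assumed layout visible before training starts.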

Provider:

SalihHub
