Paper Key: IRJ************070
Authors: Prof. Shila Pawar, Omkar Walke, Kunal Rupnar, Onkar Sable, Gaurav Sonawane
Date Published: 11 Nov 2024
Abstract
Object recognition, a key advancement in computer vision and machine learning, plays a crucial role in identifying and localizing objects within an image or video. Through object recognition, individual items can be detected and details about them, such as their dimensions, shape, and position, obtained accurately. This study introduces an affordable assistive system for obstacle recognition and environmental representation to support blind individuals using deep learning methods. The proposed recognition model was developed with the TensorFlow Object Detection API and SSDLite MobileNetV2. The pre-trained SSDLite MobileNetV2 model was trained on the COCO dataset, which contains nearly 328,000 images across 90 distinct object categories. A gradient particle swarm optimization (PSO) algorithm was applied to fine-tune the final layers of the MobileNetV2 model and their respective hyperparameters. The Google Text-to-Speech API, PyAudio, playsound, and speech recognition libraries were then incorporated to produce audio feedback for the detected items. A Raspberry Pi camera captures real-time video, and frame-by-frame object detection is performed on a Raspberry Pi 4B single-board computer. The device is embedded in a head-mounted unit intended to assist visually impaired individuals by detecting obstacles in their path, offering a more effective solution than a conventional white cane. In addition to the object detection model, a secondary computer vision model, named ambiance mode, was trained. In this mode, the final three convolutional layers of SSDLite MobileNetV2 were retrained via transfer learning on a weather-related dataset of approximately 500 images across four categories: cloudy, rainy, foggy, and sunrise.
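The audio-feedback step described above can be sketched as follows. The abstract names gTTS, playsound, and a confidence-filtered list of detected objects; the function names, the 0.5 confidence threshold, and the sentence template below are illustrative assumptions, not the paper's exact pipeline.

```python
# Illustrative sketch: turning per-frame detector output into spoken feedback.
# The (label, score) detection format, threshold value, and phrasing are
# assumptions for this example.

CONFIDENCE_THRESHOLD = 0.5  # assumed cut-off for announcing an object


def narrate(detections, threshold=CONFIDENCE_THRESHOLD):
    """Compose a short spoken sentence from (label, score) detections."""
    labels = [label for label, score in detections if score >= threshold]
    if not labels:
        return "No obstacles detected."
    return "Detected " + ", ".join(labels) + " ahead."


def speak(text, outfile="feedback.mp3"):
    """Synthesize the sentence with gTTS and play it (needs network/audio)."""
    from gtts import gTTS            # Google Text-to-Speech client
    from playsound import playsound
    gTTS(text=text, lang="en").save(outfile)
    playsound(outfile)


# Example frame: low-confidence detections are filtered out before narration.
frame_detections = [("person", 0.91), ("chair", 0.62), ("dog", 0.31)]
print(narrate(frame_detections))  # -> Detected person, chair ahead.
```

On the Raspberry Pi, `narrate` would be called once per processed frame (or on a debounce timer) so the user is not flooded with repeated announcements.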
In ambiance mode, the system provides a detailed narration of the surrounding environment, much as a sighted person might describe a landscape or a sunset to someone with a visual impairment. The performance of both the object detection and ambiance description functions was evaluated on desktop and Raspberry Pi systems. Accuracy was assessed using metrics such as mean average precision (mAP), frame rate, the confusion matrix, and the ROC curve. This affordable system is expected to be a valuable tool for enhancing the daily lives of visually impaired individuals.
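The confusion-matrix evaluation for the four weather classes can be sketched as below. The class names come from the abstract; the sample labels and the helper functions are illustrative, and in practice a library routine (e.g. scikit-learn's `confusion_matrix`) would typically be used instead.

```python
# Minimal sketch of the confusion-matrix evaluation for ambiance mode.
# Class names are from the paper; the sample predictions are made up.

CLASSES = ["cloudy", "rainy", "foggy", "sunrise"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy evaluation: one "foggy" image is misclassified as "cloudy".
y_true = ["cloudy", "rainy", "foggy", "sunrise", "rainy"]
y_pred = ["cloudy", "rainy", "cloudy", "sunrise", "rainy"]
m = confusion_matrix(y_true, y_pred)
print(accuracy(m))  # 4 of 5 correct -> 0.8
```

Per-class precision and recall (and hence ROC points at varying confidence thresholds) follow directly from the same matrix by normalizing over columns and rows respectively.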
DOI Requested