Enhancing Real-Time Video Surveillance: Face Detection and Tracking with Convolutional Neural Networks by  Monika

IRJMETS  Monika

Paper Key : IRJ************629

Author: Monika

Date Published: 05 Jul 2024

Abstract

Face detection and monitoring play a crucial role in a number of computer vision applications, including surveillance, human-computer interaction, and augmented reality. This work uses Convolutional Neural Networks (CNNs) to develop a method for real-time face detection and tracking that is both efficient and accurate. Face alignment, occlusion, and shape and illumination issues must be handled accurately for human emotion recognition from videos. CNNs, a powerful class of deep learning models, have achieved outstanding results in image and video processing applications such as object recognition and tracking. They are designed to learn and extract features automatically from raw input data, such as images and videos, through a sequence of convolutional and pooling operations. Face mask detection has numerous applications, including biometrics and real-time surveillance. Automatic face mask detection and monitoring systems are a far better option than manual surveillance by staff for managing public behavior and helping to contain the COVID-19 outbreak. Face mask detection is also useful for public surveillance aimed at preventing the use of face masks in public areas. The RILFD dataset was created by capturing real images with a camera and annotating them with two labels: with mask and without mask. In this study, machine learning models and the pre-trained deep learning models YOLOv3 and Faster R-CNN are used to detect face masks. Face mask detection is proposed using four image processing stages and customized CNN models. The method outperforms other face mask detection models and demonstrates its robustness with a 98.5% accuracy score on the RILFD dataset and on two publicly available datasets, MAFA and MOXA.

Keywords: Face detection, Face monitoring, Computer vision applications, Convolutional Neural Networks, Human emotion recognition, Face mask detection, YOLOv3, Faster R-CNN.
1. INTRODUCTION

The use of Convolutional Neural Networks (CNNs) for real-time face detection and tracking is a groundbreaking application of computer vision technology that has transformed numerous industries, from surveillance and security to entertainment and user experience. This approach harnesses the power of deep learning to automatically recognise and track human faces in real-time video streams, enabling a wide range of applications, including video analysis, augmented reality, and biometric security. Face detection and tracking are core computer vision tasks with many practical applications. Traditional approaches relied on handcrafted features and intricate algorithms to recognize and follow faces, and they struggled with variations in lighting, pose, and occlusion. The accuracy and efficiency of these tasks have improved substantially with the introduction of CNNs and their capacity to learn hierarchical features automatically from raw data. CNNs are a class of deep learning models designed specifically to process and evaluate grid-like data, such as images. They consist of several layers, including convolutional layers that extract features and fully connected layers for classification. The hierarchical structure of CNNs allows them to recognize complex patterns and features in images, which makes them well suited to tasks such as face monitoring and recognition. Real-time face detection and tracking involves detecting faces in each frame of a video feed and preserving the identity of the detected faces across frames. This requires both accuracy and speed, because video streams are typically processed in real time or close to real time. Achieving this balance between accuracy and speed is a formidable challenge for modern CNN-based architectures.
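As a rough illustration of what preserving identity across frames can mean in practice, the sketch below matches each new detection to the existing track whose box overlaps it most, measured by intersection over union (IoU). This is a generic greedy tracker, not the method of the paper; the names `iou` and `update_tracks` and the threshold `min_iou=0.3` are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def update_tracks(
    tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    min_iou: float = 0.3,  # assumed threshold, not taken from the paper
) -> Tuple[Dict[int, Box], int]:
    """Greedily assign each detection to the best-overlapping unmatched track.

    Detections that overlap no track enough start a new track with a new ID.
    Tracks with no matching detection are dropped in this simplified version.
    """
    updated: Dict[int, Box] = {}
    unmatched = set(tracks)
    for det in detections:
        best_id, best_score = None, min_iou
        for tid in sorted(unmatched):
            score = iou(tracks[tid], det)
            if score >= best_score:
                best_id, best_score = tid, score
        if best_id is None:
            best_id = next_id
            next_id += 1
        else:
            unmatched.discard(best_id)
        updated[best_id] = det
    return updated, next_id


# Frame 1: one face appears and receives ID 0.
tracks, next_id = update_tracks({}, [(10, 10, 50, 50)], 0)
# Frame 2: the same face has moved slightly and a second face appears.
tracks, next_id = update_tracks(
    tracks, [(12, 11, 52, 51), (200, 200, 240, 240)], next_id
)
print(tracks)
```

A production tracker would normally keep unmatched tracks alive for a few frames and use a motion model, which this sketch omits for brevity.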
Architectures such as YOLO and SSD (Single Shot MultiBox Detector) have become popular because they process video frames rapidly and predict bounding box coordinates and class probabilities in a single pass. Anchors, or default boxes, which are predefined bounding boxes of varying sizes and aspect ratios, aid in precise localization. The deployment of real-time face monitoring and recognition systems has far-reaching implications. In security and surveillance, it aids in identifying persons of interest from live camera feeds. In the entertainment industry, it improves user experiences by enabling augmented reality applications that superimpose digital elements on real-world faces. Developments in this field also contribute to the study of human-computer interaction, emotion recognition, and social robotics. Face recognition technology is commonly used to unlock smartphones, which the majority of people use. The technology provides an effective way to protect personal information and to ensure that sensitive information remains inaccessible to criminals even if the phone is stolen. Face recognition technology has numerous applications, including safety, security, and payments. Face recognition refers to the task of accurately recognising or authenticating a person from a digital image or video frame using the biometric pattern of their face. To authenticate a person, the system collects a unique set of biometric data points associated with the person's face. A facial recognition system uses biometrics to map facial traits from a picture or video. Facial recognition is thus a technology-based method for identifying and matching human faces, in which the system compares the extracted information to a database of known features and can assist in establishing an individual's identity [1].
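To make the notion of anchors (default boxes) used by single-pass detectors more concrete, the following sketch generates boxes of several sizes and aspect ratios around one centre point. It is a simplified illustration rather than the exact scheme of YOLO or SSD; the function name `make_anchors` and the scale and ratio values are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(
    cx: float, cy: float, scales: Sequence[float], ratios: Sequence[float]
) -> List[Box]:
    """Return one anchor per (scale, ratio) pair, centred on (cx, cy).

    Each anchor has area scale**2 and width/height equal to ratio.
    """
    anchors: List[Box] = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Two sizes and three aspect ratios give six anchors per location (values assumed).
for box in make_anchors(50.0, 50.0, scales=[32, 64], ratios=[0.5, 1.0, 2.0]):
    print(tuple(round(v, 1) for v in box))
```

In a detector, such anchors are tiled over every cell of a feature map, and the network predicts offsets relative to each anchor instead of absolute coordinates.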
The faces captured by the camera are used as input, and the Haar Cascade face detection algorithm is applied to the image. Feature values are then extracted using the Facial Landmarks Detection approach and compared to a feature database on which an SVM has previously been trained. The facial recognition result is the output of the developed software [2]. Face recognition is accomplished using a multi-task convolutional neural network. Appearance, motion, and shape characteristics are used for tracking to compensate for tracking failures caused by object occlusion or rapid object movement. The relative weights of the various characteristics in feature fusion are adjusted according to the tracking condition. This scene-dependent adjustment of feature weights can address issues such as continuous tracking, interrupted tracking, and object interaction [3]. Biometric recognition, such as fingerprint, palm, voiceprint, and retinal recognition, has been implemented in numerous attendance systems owing to the accelerated development of artificial intelligence and machine learning. For access control, biological differences between individuals are used as the basis for discrimination [4]. CNNs have been applied successfully to computer vision tasks such as image classification and facial recognition [5-6]. In the image segmentation era of artificial intelligence, deep architectures, GPU computation, and large training datasets are primarily responsible for this success. As a result, advancements have been made in face recognition [7-8]. Computers can already outperform humans in these areas [9]. CNNs are a type of artificial neural network frequently employed for recognition tasks and image classification. They are designed to identify patterns and attributes in images and are inspired by the structure of the human visual system [10].
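The scene-dependent feature weighting described above can be sketched as a weighted average of per-cue similarity scores, where the weights change with the tracking condition. The cited work does not give its weighting rule here, so the rule below (down-weighting appearance under occlusion and motion under rapid movement) and every numeric value are assumptions made purely for illustration.

```python
from typing import Dict


def weights_for_condition(occluded: bool, fast_motion: bool) -> Dict[str, float]:
    """Pick cue weights for the current tracking condition (hypothetical rule)."""
    weights = {"appearance": 1.0, "motion": 1.0, "shape": 1.0}
    if occluded:
        # Assumption: appearance is unreliable when the face is partly hidden.
        weights["appearance"] = 0.2
    if fast_motion:
        # Assumption: motion prediction is unreliable under rapid movement.
        weights["motion"] = 0.2
    return weights


def fuse_similarity(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-cue similarity scores, each in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


scores = {"appearance": 0.9, "motion": 0.6, "shape": 0.3}  # made-up example scores
print(fuse_similarity(scores, weights_for_condition(False, False)))
print(fuse_similarity(scores, weights_for_condition(True, False)))
```

The fused score would then be used to decide which detection continues which track; a learned or confidence-driven weighting could replace the fixed rule.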
Figure 1: CNN Architecture (source: Madhuri et al., 2023)

1. Convolutional Layer: The convolutional layer is the main building block of a CNN. Convolutions are applied to the input image in order to extract features. Each convolutional layer consists of multiple filters, also called kernels, which are small matrices. The filters "convolve", or slide, across the input image, computing the dot product between the filter and the image region beneath it at each position. This operation allows the network to recognize a variety of patterns and elements in the image, including edges, textures, and more intricate structures.

2. Pooling Layer: Pooling layers shrink the spatial extent of the feature maps while preserving essential details. Common pooling methods are max pooling and average pooling. Pooling makes the network less computationally expensive and more robust to translations and distortions.

3. Activation Functions: Activation functions introduce non-linearity into the network, allowing it to recognise complex input patterns. The Rectified Linear Unit (ReLU) is commonly used in CNNs, but there are other options such as sigmoid and tanh.

4. Fully Connected Layers: Fully connected layers perform higher-level reasoning and classification after features have been extracted by convolution and pooling. These layers take the flattened feature maps from the previous layers and connect every node to every node in the subsequent layer, just as in traditional artificial neural networks.

5. Output Layer: The final layer generates the network's predictions using the features learned throughout the network. The activation of the output layer depends on the specific requirements of the task. For example, classification tasks frequently employ softmax.

Face mask detection has various applications in real-world settings, such as real-time authentication and remote monitoring of people.
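The layer operations listed above can be illustrated with a minimal pure-Python sketch: a sliding dot product (convolution without padding, stride 1), ReLU, 2x2 max pooling, and softmax. The toy image and the edge-detecting kernel are assumptions chosen for the example, and real CNNs learn their kernels and rely on optimized libraries instead of nested loops.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image and take the dot product at each position.

    As in most deep learning libraries, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Replace negative values with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the maximum of each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in cols
        ]
        for i in rows
    ]


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 5x5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hand-picked vertical-edge filter (assumption)

features = max_pool(relu(conv2d(image, kernel)))
print(features)
print(softmax([2.0, 0.0]))
```

The pooled map responds only where the edge lies, which is the sense in which convolution and pooling extract and condense features before classification.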
Criminals frequently conceal the area of their faces around their mouths. According to researchers, the top of the head and the shoulders of a person can serve as a workaround for recognizing occluded faces [11]. When a large number of people must be checked remotely for face mask usage, the task becomes more challenging. The novel coronavirus (COVID-19) epidemic also necessitated the use of face masks and imposed many other restrictions. Owing to its severe consequences, rapid spread, lack of proper medication, and shortage of medical personnel, the WHO declared COVID-19 a pandemic and recommended several preventative actions, including the use of face masks [12, 13]. Wearing a mask is considered one of the best lines of defence against the deadly conditions caused by COVID-19. The general public now accepts the idea that wearing a face mask can help contain the spread of COVID-19. People all around the world were under pressure to keep their distance and take precautions to halt the spread of this contagious disease. COVID-19 continues to spread because of its evolving variants despite intensive immunization campaigns in numerous countries. Therefore, consistent use of face masks is necessary to stop its spread. It also helps to limit exposure and prevent people from coming into contact with the virus. Face masks are a requirement in many nations, and public buildings forbid admission without one. Because of the high volume of people entering public places such as airports, train stations, and retail centers, manual inspection is almost impossible. Recently, there has been increased interest in automatic face mask detection and identification. For monitoring and surveillance applications, automatic detection systems have started to be built in response to COVID-19 [14].
Since faces must first be detected in an image before determining whether or not they are wearing masks, face mask detection requires both face detection and classification. Because the community has developed numerous face detection systems, the first task has received considerable attention in the field of computer vision [15-17]. Detecting face masks on multiple datasets has required a significant amount of work. The wide variety of face mask colours, styles, and decorations, as well as the lack of a publicly accessible dataset of real-world images, limits existing research [18]. If simulated face masks are used in place of real masked-face pictures, as in simulated datasets, the models are no longer appropriate for use in real-world settings. The many different face masks and facial expressions make detection more difficult.

INTERNATIONAL RESEARCH JOURNAL OF MODERNIZATION IN ENGINEERING TECHNOLOGY AND SCIENCE
