1. Introduction
The documented annual count of road traffic fatalities amounted to 1.35 million, positioning it as the eighth leading cause of non-natural mortality worldwide across all age groups [1]. In the European Union (EU) context, although there has been a decline in recorded yearly road fatalities, the number still exceeds 40,000, with 90% of these fatalities attributed to human error. Consequently, substantial investments from international stakeholders have been directed towards supporting the advancement of autonomous vehicle (AV) technology to enhance traffic management. Moreover, the implementation of AVs is expected to aid in reducing carbon emissions, thereby contributing to the attainment of targeted objectives for carbon emission mitigation [2]. AVs, also known as self-driving vehicles, possess the transportation capabilities of traditional vehicles but exhibit advanced capabilities in perceiving their surroundings and autonomously navigating without extensive human intervention. A study conducted by Precedence Research indicates that the global market for AVs witnessed a volume of approximately 6,500 units in 2019, with a projected compound annual growth rate of 63.5% from 2020 to 2027 [3]. The development of self-driving cars can be traced back to 2009 when Google initiated a covert project that eventually became Waymo, currently a subsidiary of Alphabet, Google's parent company. In 2014, Waymo unveiled a prototype of a fully autonomous vehicle, eliminating the need for pedals and a steering wheel [4]. Notably, Waymo has achieved a significant milestone by successfully accumulating a collective mileage of over 20 million miles on public roads across 25 cities in the United States of America (USA) [5]. In the Irish context, Jaguar Land Rover (JLR) Ireland announced in 2020 its partnership with an autonomous car hub in Shannon, Ireland, where 450 kilometers of roads will be utilized to conduct testing of their next-generation AV technology [6].
In 2014, the Society of Automotive Engineers (SAE), now known as SAE International, introduced the J3016 standard titled "Levels of Driving Automation" to provide a framework for consumers. This standard outlines six distinct levels of driving automation, ranging from SAE level 0, where the driver maintains full control of the vehicle, to SAE level 5, where vehicles can autonomously handle all dynamic driving tasks without human intervention [7]. Notably, leading automobile manufacturers such as Audi (Volkswagen) and Tesla have embraced SAE level 2 automation standards in the development of their respective automation features, including Tesla's Autopilot [8] and Audi A8's Traffic Jam Pilot [9,10]. In contrast, Alphabet's Waymo has been exploring a business model since 2016 based on SAE level 4 self-driving taxi services, which are designed to operate within a limited geographic area in Arizona, USA, and generate fares [11]. Autonomous driving (AD) systems encounter a variety of shared challenges and limitations when operating in real-world scenarios. One such challenge is the ability to ensure safe driving and navigation in adverse weather conditions, as well as effectively interacting with pedestrians and other vehicles. Harsh weather conditions, including glare, snow, mist, rain, haze, and fog, can significantly impact the performance of perception-based sensors, which are crucial for accurate perception and navigation. These challenges are not limited to on-road autonomous vehicles (AVs) but also extend to other constrained AD scenarios such as parking lots, which commonly have inconsistent layouts, with markings and signage that vary from one parking lot to another. The complexity of these challenges further increases for on-road AVs due to the unpredictable conditions and behaviors exhibited by other vehicles. For instance, even the placement of a yield sign at an intersection can influence the behavior of approaching vehicles. Consequently, the inclusion of a comprehensive prediction module within AVs becomes crucial for identifying future positions and motions of all entities, thereby mitigating collision hazards [12,13]. While AD systems encounter common challenges, noticeable differences exist among them. For example, unmanned tractors operating in agriculture farms navigate in a fixed environment between crop rows, whereas on-road vehicles must navigate through dynamic environments characterized by crowded spaces and traffic flow [14]. AV systems, despite some minor variations, are characterized by their complex nature and the presence of multiple interconnected subcomponents [15]. The sensing capabilities of an autonomous vehicle (AV) play a crucial role in the overall functioning of the AD system. A diverse set of sensors is employed to enable the AV to perceive its environment, and the cooperative performance of these sensors directly influences the feasibility and safety of the vehicle [16]. The selection of an appropriate combination of sensors and their optimal configurations is a critical factor in achieving a reliable representation of the environment, mimicking the human ability to perceive and comprehend the surroundings. This aspect holds significant importance in the design and implementation of any autonomous driving (AD) system.
When selecting a group of sensors for an autonomous vehicle (AV), it is crucial to consider the advantages, disadvantages, and limitations of both smart sensors and non-smart sensors. The definition of "smart sensor" has evolved in recent years with the emergence of the Internet of Things (IoT), which refers to a network of interconnected objects capable of collecting and transmitting data wirelessly without human intervention. In the context of IoT, a smart sensor is a device that can condition input signals, process and interpret data, and make decisions without relying on a separate computer [17]. In the AV context, range sensors used for environment perception, such as cameras, LiDAR, and radars, may be considered "smart" if they provide additional information, such as target tracking and event descriptions, as part of their output. On the other hand, a "non-smart" sensor merely conditions the raw data or waveforms and requires external computing resources for processing and interpretation to extract meaningful information about the environment. A sensor is considered "smart" only when computer resources are an integral part of its physical design [18]. To enhance the overall performance of an AV system, it is beneficial to incorporate multiple sensors of different types (smart/non-smart) and modalities (visual, infrared, and radio waves), operating at various ranges and bandwidths (data rates). By combining the data from each sensor type, a fused output can be generated [17‒19]. This process of multi-sensor fusion has become essential in all AD systems as it overcomes the limitations of individual sensor types, improving the efficiency and reliability of the overall AD system.
This paper provides an in-depth overview of the current state-of-the-art techniques and algorithms utilized in image processing and sensor fusion for autonomous vehicles. It explores a range of methodologies employed for object detection, recognition, tracking, and scene understanding, focusing on computer vision and machine learning approaches. Furthermore, the paper discusses the challenges and open research areas within this field, including robustness to adverse weather conditions, real-time processing requirements, and the integration of high-dimensional sensor data. Additionally, the paper delves into the domain of localization methods in autonomous vehicles. The results highlight the significant progress that has been made in each of these subfields. However, certain limitations such as the lack of comprehensive large-scale testing, the scarcity of diverse and robust datasets, and occasional inaccuracies in specific studies pose challenges for the practical implementation of this technology in real-world scenarios. The findings of this literature review contribute to a deeper understanding of the current state and future directions of image processing and sensor fusion in autonomous vehicles. This knowledge will be valuable for researchers and practitioners involved in advancing the development of reliable autonomous driving systems.
2. Image processing
The accurate identification of surrounding objects is a fundamental aspect in the development of autonomous vehicles. In many instances, this task is accomplished through the utilization of neural networks, which are algorithms driven by artificial intelligence designed to classify data in a manner resembling human intellect. This classification process involves training the neural network using a designated training set, where inputs are provided to the network alongside their corresponding desired outputs. Over time, a correlation is established between the inputs and outputs, enabling the neural network to learn and make accurate classifications. Subsequently, the neural network's performance is evaluated using a separate dataset known as the validation set, which helps detect overfitting while the network is being trained. Finally, a testing set is withheld during the training phase and is employed to assess the accuracy of the neural network on unseen data. A commonly employed type of neural network in autonomous vehicles is the Convolutional Neural Network (CNN). This network assigns weights, which indicate the significance of various image components, to different regions within an image. By utilizing these weights, the neural network can effectively classify the image based on its distinctive features [20].
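To make the training, validation, and testing split concrete, the following minimal sketch partitions a synthetic image dataset into the three subsets; the 70/15/15 proportions and array shapes are illustrative assumptions rather than values taken from any cited study.

```python
import numpy as np

# Hypothetical image dataset: 1,000 samples of 64x64 RGB images with integer labels.
rng = np.random.default_rng(seed=0)
images = rng.random((1000, 64, 64, 3), dtype=np.float32)
labels = rng.integers(0, 10, size=1000)

# Shuffle once, then carve out 70% training, 15% validation, 15% testing.
indices = rng.permutation(len(images))
train_idx, val_idx, test_idx = indices[:700], indices[700:850], indices[850:]

x_train, y_train = images[train_idx], labels[train_idx]  # used to fit the network weights
x_val, y_val = images[val_idx], labels[val_idx]          # monitored during training to detect overfitting
x_test, y_test = images[test_idx], labels[test_idx]      # held out entirely until the final evaluation
```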
In the domain of autonomous vehicular navigation, a sound understanding of how Convolutional Neural Networks (CNNs) operate is important, since different models are employed to accomplish different tasks. The architecture of a CNN loosely emulates human visual processing, using hierarchical data features to perceive, categorize, and interpret the environment. The network begins with the input layer, which receives raw data, typically images, into the network. These images, represented as 2D pixel arrays, are evaluated by the model according to their color values and brightness. After this initial stage, the image passes into the convolutional layer, which is the core of the entire network. This layer applies small filters, commonly termed kernels, which slide across different image regions and compute dot products between their weights and the underlying pixel values. This operation is performed across the entire image, producing a new feature map. The number of convolutional layers and the types of filters deployed vary depending on the specific network configuration [21]. Following this stage, the ReLU activation function is applied, introducing nonlinearity and improving the network's ability to capture intricate patterns. While some CNNs omit the ReLU activation layer, those that include it tend to achieve higher accuracy [22]. After activation, the data passes into the pooling layer, frequently referred to as the max-pooling layer. Its principal purpose is to reduce the dimensions of the feature maps while preserving salient information. This dimensionality reduction lowers the computational overhead of running the neural network. Moreover, the downscaling makes the network more resilient to small perturbations, which in turn improves sustained accuracy. How often max-pooling layers are inserted varies with the model in question. After these stages, the image passes through fully connected layers, where the classification itself takes place. Here, the previously derived features produce the final predictions, ultimately yielding a probabilistic measure of confidence. Most neural networks require a large body of training data, through which the network is acclimatized to image classifications. Over time, discernible patterns emerge, steadily improving the network's classification performance [21].
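The layer sequence described above, convolution, ReLU activation, max-pooling, and fully connected classification, can be illustrated with a minimal PyTorch-style sketch; the layer sizes, the 64x64 input, and the ten-class output are illustrative assumptions, not the configuration of any model discussed in this review.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN: conv -> ReLU -> max-pool, twice, then fully connected classification."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # kernels slide over the image, producing feature maps
            nn.ReLU(),                                   # nonlinearity helps capture complex patterns
            nn.MaxPool2d(2),                             # halves spatial resolution while keeping salient responses
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, num_classes),        # assumes 64x64 input images
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x))
        return logits.softmax(dim=1)                     # probabilistic confidence per class

model = TinyCNN()
probabilities = model(torch.randn(1, 3, 64, 64))         # one dummy 64x64 RGB image
```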
Light detection and ranging (LIDAR) sensors play a vital role in numerous object detection techniques employed in autonomous vehicles. This is primarily attributed to their capacity for generating intricate maps of the vehicle's surroundings and their exceptional ability to detect objects even in low-light conditions and night-time. LIDAR sensors utilize laser technology to facilitate ranging by emitting light pulses that are subsequently reflected back to the sensor. The resultant data can be accurately collected, enabling the generation of three-dimensional (3D) maps. Although less advanced approaches yield two-dimensional (2D) maps, these have become progressively less prevalent due to advancements in technology [23,24]. When employing LIDAR sensors, it is common to integrate an Inertial Measurement Unit (IMU) into the system. The IMU serves to measure changes in angle, velocity, and acceleration, thereby aiding the LIDAR sensor in the acquisition of precise data. Typically, an IMU comprises three accelerometers and three gyroscopes, which, when combined with a LIDAR sensor, enable the measurement of crucial parameters such as acceleration, distance, angular rotation, and others [25]. Among the various types of LIDAR sensors utilized in autonomous vehicles, Velodyne LIDAR stands out as one of the most popular choices due to its proven effectiveness [23]. The widespread adoption of LIDAR sensors can be attributed to their exceptional accuracy and efficiency in measuring distances, as they operate at higher frequencies compared to alternative sensors such as radar. However, it is worth noting that the limitations of LIDARs in accurately measuring data under adverse weather conditions such as fog, snow, and rain have prompted the exploration of alternative sensor systems, including radars and cameras [26]. Radar sensors, also known as Radio Detection and Ranging sensors, represent another commonly utilized sensor type in autonomous vehicles. Unlike LIDAR sensors, radar sensors rely on electromagnetic waves, as opposed to lasers, to gather data by analyzing the reflections of these waves. By utilizing this collected data, radar sensors are capable of determining the position and movement of objects [27]. An advantage of radar sensors lies in their resilience to adverse weather conditions, which makes them well-suited for autonomous vehicles as they provide a viable alternative to potentially inaccurate or less effective LIDAR sensors under such circumstances. However, it is important to note that radar sensors typically operate at lower frequencies compared to LIDAR sensors, making them more challenging to employ in real-world scenarios. Moreover, radar sensors are unable to discern differences in objects or detect color, which presents difficulties in reading street signs or perceiving obstacles [26]. Notwithstanding these challenges, radar sensors find more pervasive utility in practical operational contexts. Although the panoramic 360-degree field-of-view inherent to LiDAR sensors affords a heightened precision in the acquisition of sensory data, the substantial cost differential between these two sensor modalities remains conspicuous. Presently, a LiDAR sensor commands a price point exceeding $500, engendering a substantial dampening effect on the profitability of automotive enterprises that incorporate such technology [28]. Progressive strides within radar technology have concurrently facilitated the seamless integration of radar sensors. 
Their capacity to produce finely resolved point clouds has made them comparable to LiDAR sensors in many respects [28]. However, LiDAR retains the advantage because it operates at a higher frequency, which yields superior performance. For LiDAR adoption to become more widespread, its unit cost would need to fall considerably, by roughly fifty percent. Experts contend that the growing market for autonomous vehicles will eventually drive LiDAR sensor costs down, leading to a corresponding increase in its use across real-world applications.
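As an illustration of the ranging principle these sensors rely on, the sketch below converts a pulse's round-trip time into a distance and turns range and angle returns into 3D points; it is a simplified geometric model rather than the processing pipeline of any particular LiDAR or radar product.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def range_from_time_of_flight(round_trip_seconds: float) -> float:
    """Distance to the reflecting surface: the pulse travels out and back."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

def spherical_to_cartesian(r, azimuth_rad, elevation_rad):
    """Convert a return (range, azimuth, elevation) into an (x, y, z) point."""
    x = r * np.cos(elevation_rad) * np.cos(azimuth_rad)
    y = r * np.cos(elevation_rad) * np.sin(azimuth_rad)
    z = r * np.sin(elevation_rad)
    return np.stack([x, y, z], axis=-1)

# Example: a pulse returning after 200 ns corresponds to roughly a 30 m target.
distance = range_from_time_of_flight(200e-9)
points = spherical_to_cartesian(np.array([distance]), np.array([0.1]), np.array([0.0]))
```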
On the other side, cameras represent a fundamental component in autonomous driving systems, primarily employed for object detection purposes. These cameras are often utilized in conjunction with neural networks, leveraging the power of machine learning algorithms to facilitate object recognition. Over time, through the process of training, neural networks become proficient in identifying patterns within images, enabling them to detect objects with greater ease [20]. Furthermore, image segmentation techniques are commonly applied to images during the object detection process. Image segmentation involves dividing the image into distinct "regions, " which are subsequently analyzed separately to yield more efficient results. By employing image segmentation, also known as semantic segmentation, the neural network is tasked with analyzing fewer image elements, as certain portions that are deemed insignificant can be disregarded. Consequently, the computational processing time is reduced, allowing for faster analysis [29]. Despite the considerable accuracy cameras offer, they do possess limitations that must be acknowledged. Firstly, cameras are unable to operate effectively in adverse weather conditions, impairing their functionality and reliability in challenging environmental situations. Additionally, cameras have inherent limitations in capturing a comprehensive 360-degree field of view, which can present challenges for object detection in certain scenarios. Consequently, relying solely on cameras as the exclusive sensing device for object detection in autonomous vehicles can prove challenging [26]. Hence, it is evident that each discussed detection method possesses its own set of limitations and challenges. Consequently, the integration of sensor fusion techniques, which harnesses the strengths of different sensors, enhances their overall effectiveness when compared to individual sensor systems. Sensor fusion entails the amalgamation of data from various devices, such as LIDAR sensors, radar sensors, and cameras, to construct a comprehensive model of the surrounding environment, typically represented as a point cloud. This approach proves particularly advantageous due to the unique capabilities and limitations of each sensor type, including their ability to perceive specific colors, variations in range, and disparities in data quality. Through the combination of these sensors, advancements have been achieved in the performance of autonomous vehicles [30]. The process of sensor fusion is achieved by employing software algorithms that consolidate the data from multiple sensors, generating a coherent and intelligible representation for the autonomous vehicle to utilize in its operations.
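A common building block of such fusion pipelines is projecting LiDAR points into the camera image so that range measurements can be associated with pixels. The sketch below assumes an ideal pinhole camera and an already-calibrated extrinsic transform, which is a simplification of real calibration procedures; the matrix values in the usage example are placeholders.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_from_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates of a pinhole camera.

    points_lidar: (N, 3) points in the LiDAR frame.
    T_cam_from_lidar: (4, 4) homogeneous extrinsic transform (from calibration).
    K: (3, 3) camera intrinsic matrix.
    Returns pixel coordinates and depths for points in front of the camera.
    """
    n = points_lidar.shape[0]
    homogeneous = np.hstack([points_lidar, np.ones((n, 1))])      # (N, 4)
    points_cam = (T_cam_from_lidar @ homogeneous.T).T[:, :3]      # move into the camera frame
    in_front = points_cam[:, 2] > 0.1                             # keep points ahead of the lens
    points_cam = points_cam[in_front]
    pixels_h = (K @ points_cam.T).T                               # perspective projection
    pixels = pixels_h[:, :2] / pixels_h[:, 2:3]                   # normalize by depth
    return pixels, points_cam[:, 2]

# Toy usage with an identity extrinsic and a simple intrinsic matrix.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
pixels, depths = project_lidar_to_image(np.array([[2.0, 0.5, 10.0]]), np.eye(4), K)
```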
The utilization of neural networks in autonomous vehicles has gained significant traction, leading to a proliferation of research studies employing this technique. One notable example is the work of Nabati et al. [31], who employed a radar region proposal network (RRPN) for object detection in autonomous cars. This model leverages radar sensors to collect data concerning the vehicle's surroundings and generates Regions of Interest (RoI), which represent crucial areas within the image. The utilization of radar serves as a cost-effective alternative to other techniques, such as LiDAR, while maintaining a reasonable level of accuracy. The neural network processes the image data and generates object detection predictions based on these identified regions. Although this method exhibited superior accuracy compared to Selective Search, another segmentation-based algorithm commonly used for object detection, the achieved results were still suboptimal. Therefore, it is unlikely that this method can be directly applied in real-world settings without further improvements. In addition to more complex neural network architectures, simpler neural networks have also been employed for image processing tasks in autonomous vehicles. For instance, Lewis et al. [32] utilized techniques such as region proposals, as discussed previously, along with Scale-Invariant Feature Transform (SIFT) descriptors. SIFT descriptors involve generating 3D models of the surroundings to identify specific points in an image. These points are then converted into data values and used to construct a histogram based on the extracted features. SIFT descriptors prove particularly valuable for rapidly locating essential image components. Lewis et al. [32] aimed to achieve efficient object detection using a network called SimpleNet. Although the work succeeded in achieving faster processing times (0.09 s/image) compared to counterparts like Fast R-CNN (which took over 0.3 s/image), its effectiveness was relatively limited, making it impractical for real-world applications. Neural networks have also been applied to the classification of traffic signs, as demonstrated in the study by Satilmus et al. [33]. In their research, a YOLO-CNN (You Only Look Once Convolutional Neural Network) model, which features additional layers compared to a standard neural network, was employed. The model was trained using a database generated from a ZED stereo camera setup. This setup consists of two closely placed cameras, and the views captured by each camera are compared. This arrangement mimics the human visual system, where the disparity between viewpoints, depicted in a 3D point cloud, enables researchers to perceive depth information of objects. By integrating this camera setup with other sensors such as LiDAR and IMU, the study gathered extensive data points to develop a functional framework. According to the authors' reported results, the accuracy of this model is promising, reaching 99.97% accuracy. While both of these models (YOLO-CNN and SimpleNet) utilize similar architectural principles, the YOLO-CNN model includes additional layers and employs a distinct camera setup. As a result, the SimpleNet architecture achieves faster processing times, as expected from its fewer layers, but falls short in terms of accuracy compared to the YOLO-CNN model. Thus, every model offers its own benefits as YOLO-CNN offers high accuracy while SimpleNet offers faster processing times. 
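For readers unfamiliar with SIFT, the short sketch below extracts keypoints and their 128-dimensional descriptors from a single image using OpenCV; it only illustrates the generic feature-extraction step, not the specific pipeline of Lewis et al. [32], and the image path is a placeholder.

```python
import cv2

# Load an arbitrary road-scene image in grayscale (the path is a placeholder).
image = cv2.imread("road_scene.png", cv2.IMREAD_GRAYSCALE)

# Detect scale-invariant keypoints and compute their 128-dimensional descriptors.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

# Each keypoint has an image location, scale, and orientation; the descriptors
# summarize local gradient structure and can be binned into feature histograms.
print(len(keypoints), descriptors.shape)  # e.g. (N, 128)
```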
Another application of neural networks in image processing involves the integration of camera and LiDAR sensors, as demonstrated in the work of Shen et al. [34]. The objective of this research is to predict the vehicle's speed based on data collected by the sensors and analyzed by the neural network. A network architecture with three sensors mounted on the vehicle is utilized for end-to-end driving, where the input image, typically captured by a front-facing camera, can influence the car's actions. Support Vector Machines (SVM) and Long Short-Term Memory (LSTM) are employed for these tasks. SVMs are classification algorithms that determine a separating "line" or "plane" in a graph to classify data points. LSTM, on the other hand, is a type of neural network with enhanced memory capacity, enabling it to store more sequential data, which proves valuable for larger datasets. The researchers created a dataset by driving on roads using LiDAR and camera sensors, resulting in the collection of over 150,000 frames. The efficacy of the algorithm is evaluated based on the gap rate, and overall, the results are promising. Similarly, Gao et al. [35] utilize camera and LiDAR sensors for object detection. The study highlights that while cameras perform well under certain conditions, they require additional depth information about surrounding objects. Therefore, a fusion of RGB and LiDAR sensors is employed for object detection. The objects are initially detected using the camera RGB image and the data provided by the LiDAR, and subsequently cropped to be input into the neural network. The data is then processed using AlexNet, another widely used neural network architecture, and classification is performed. Principal Component Analysis (PCA) is employed as a means to summarize and analyze the data. Additionally, backpropagation is utilized to automatically adjust the parameters of the neural network for improved accuracy. However, the obtained accuracy of the model is only 66%, indicating that it is not suitable for practical deployment. Although the AlexNet methodology differs significantly from the SVM/LSTM model, the commonality lies in the utilization of a combination of cameras and LiDAR sensors. From the results, it can be inferred that the SVM and LSTM model performs significantly better than the AlexNet model, as evidenced by the notable disparity in results. This difference could also be attributed to the significantly larger dataset of 150,000 frames utilized in the SVM and LSTM model, which likely contributes to its superior performance.
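Principal Component Analysis, used above to summarize the fused data, can be sketched in a few lines of NumPy; the feature matrix here is random stand-in data and the choice of eight components is arbitrary, not a detail of the cited study.

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.normal(size=(500, 64))            # 500 fused camera/LiDAR feature vectors (synthetic)

# Center the data, then take the top principal components via SVD.
centered = features - features.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:8]                              # first 8 principal directions
reduced = centered @ components.T                # 64-D features summarized as 8-D scores

explained = (singular_values[:8] ** 2) / (singular_values ** 2).sum()
print(reduced.shape, explained.sum())            # (500, 8) and the fraction of variance retained
```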
Another image processing technique employed in autonomous vehicles involves the utilization of a fisheye camera, as demonstrated in the study by Saez et al. [36]. A fisheye camera offers a 360-degree field of view, eliminating the need for multiple cameras within a single system. However, one challenge associated with this technique is the distortion introduced by fisheye cameras. To address this issue, Local Binary Pattern, which assigns a binary value (0 or 1) to each pixel based on its brightness level, was considered. While this approach helps mitigate distortion, it significantly compromises the image quality. Consequently, the method in the study did not employ Local Binary Pattern. Instead, the proposed model utilized zoom augmentation, which involved adjusting the camera's focal length in the dataset to eliminate distortion. Additionally, a Gaussian distribution was implemented to determine the classification. The model employed a standard type of convolutional neural network (CNN) called ERFNet. The results presented in the paper were assessed using the IoU value, which measures the overlap between ground truth and predicted values on a scale between 0 and 1. This measurement, an alternative to mean average precision (mAP) [37], was reported to be 0.556, indicating only modest accuracy. In the work conducted by Farag et al. [38], a behavior cloning approach was employed for autonomous driving. The methodology involved simulating driving scenarios using recorded vehicle behaviors and training a convolutional neural network (CNN) based on this data. The training data was obtained from a front-facing camera used during testing, while the steering commands, representing vehicle behaviors, were derived from the behaviors of experienced drivers in real-world traffic and urban road conditions. The movements of the vehicle were recorded, quantified, and utilized as the training dataset. To train the model, a variant of the Gradient Descent optimization algorithm was employed, which iteratively adjusted the network's parameters to optimize its performance. Various image augmentation techniques were applied to the input images, including color normalization, cropping, flipping, and adjustments for brightness and shadow. Additionally, supplementary datasets, such as the Udacity Supplied Data and Simulator Generated Data, were incorporated to enhance the diversity and quality of the training data. However, the results of the model were found to be highly ineffective. During testing, the vehicle consistently veered off the road boundaries, indicating poor performance. It is important to note that the neural network employed in this study lacked an expansive memory, limiting its ability to learn and improve over time. This limitation likely contributed to the model's poor performance and highlights the need for more advanced and memory-enhanced architectures in autonomous driving systems. Iftikhar et al. [39] employ a three-dimensional Convolutional Neural Network (3D CNN) to detect pedestrians in the vicinity of autonomous vehicles. In their architecture, the object detection components of the framework identify potential pedestrian candidates, which are subsequently passed through the CNN for pedestrian detection and classification. Notably, the YOLO v3 Convolutional Neural Network architecture is used, making the approach similar to related methodologies in the domain.
The 3D point cloud used is derived from LiDAR sensors and constitutes the input to the neural network. During training, various forms of data augmentation, including color-based enhancement and fuzzy enhancement, are essential. This augmentation increases the model's mean Average Precision (mAP) by up to 0.75%. The empirical investigation uses datasets provided by Waymo and KITTI. Performance evaluation reveals a precision ranging between 94% and 97% across the different datasets, which is a commendable outcome, albeit with room for improvement given the limited variance exhibited by the datasets employed. Gao et al. [40] present a methodology based on depth and ego-motion optimization to support accurate image processing in autonomous vehicles. Ego-motion refers to the movement of the camera itself, which is inherent to capturing the vehicle's surroundings. The authors treat depth estimation as a geometric problem that can often be solved with linear mathematical methods. This work is supplemented by the integration of Simultaneous Localization and Mapping (SLAM) algorithms for vehicle localization. Their approach integrates RGB images and their corresponding feature maps in parallel via a pose estimation network. To improve precision, a contour loss function strengthens depth predictions at object edges. Accuracy is improved iteratively over time by the PoseNet neural architecture, which repeatedly combines and reprocesses the RGB images and feature maps. Notably, Rectified Linear Unit (ReLU) activation is used within the network architecture. While the model copes well with noise, it exhibits pronounced computational complexity and a correspondingly high loss, which together preclude its transfer to real-world operational contexts. Liang et al. [41] adopt the ResNeSt Convolutional Neural Network model, augmented by a coordinate attention block, to improve region of interest detection within the model. The coordinate attention block allows the model to selectively attend to salient features within data or images, a capability relevant to autonomous vehicles, as it also reduces the computational demand over successive iterations. Custom augmentation techniques are formulated to amplify the model's ability to detect nearby objects. The research also involves the creation of a dedicated dataset, which serves as the basis for testing. The network closely mirrors the R-CNN (Region-based Convolutional Neural Network) architecture, with pronounced emphasis on region of interest delimitation. Augmentation techniques include illumination and brightness adjustment, alongside frequent image enhancement for optimal visual clarity. The accuracy observed within controlled settings is commendable; however, the limited robustness of the dataset prevents the model from being endorsed for practical use in authentic real-world scenarios. Table 1 summarizes the different techniques used for image processing in autonomous driving while showing their strengths and weaknesses.
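The IoU metric reported by several of the studies above can be computed directly from a predicted and a ground-truth box; the sketch below assumes axis-aligned boxes in (x1, y1, x2, y2) form, which is an illustrative convention rather than the exact evaluation code of any of these works.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Overlapping region (may be empty).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# A prediction shifted from the ground truth scores well below 1.0.
print(intersection_over_union((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```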
The results show that, in recent years, there has been significant progress in the development of image-processing techniques in the industry. Neural networks have emerged as the most efficient method for object classification, making them a popular choice in these studies. Additionally, the use of LiDAR sensors has proven to be highly effective in enabling vehicles to perceive their surroundings, as evidenced by the studies that incorporated LiDAR techniques and achieved the best results. However, several challenges have surfaced in these studies. One notable challenge is the requirement for a robust and comprehensive database. In some cases, models rely on remote databases due to limited computing power in the vehicles themselves. This constraint can impact the accuracy of the models, highlighting the need for better accessibility to computing resources to create adequately sized datasets. Moreover, another common issue encountered in the aforementioned studies is the lack of diversity in the training data. While the results may be promising within a specific setting, the models may not be suitable for practical use. This limitation arises from the fact that the image databases predominantly consist of data collected from a single setting or location. Consequently, the performance of these models might be compromised when tested in different regions or under varying environmental conditions, such as rain or snow. In other words, the developed techniques cannot be generalized and used across different locations and across different environmental conditions (such as rain, snow, etc.). Furthermore, the results of previous studies highlight the debate surrounding the efficiency and computational time in autonomous vehicles (as shown when YOLO-CNN and SimpleNet were compared) which revolves around finding the right balance between processing capabilities and real-time responsiveness. On one hand, autonomous vehicles require sophisticated algorithms and extensive computational power to analyze vast amounts of sensor data, make complex decisions, and ensure safe and efficient navigation. On the other hand, there is a need to minimize computational time to enable real-time decision-making and responsiveness, particularly in dynamic driving situations. The challenge lies in achieving a trade-off between computational complexity and real-time responsiveness. While advanced algorithms and models may deliver higher accuracy, they can be computationally demanding and may introduce processing delays. Striking the right balance is crucial to ensure that the autonomous system can make timely decisions and react to dynamic changes in the environment.
3. Image processing in autonomous parking
Numerous scholarly works have extensively examined the technological aspects of autonomous driving; however, the realm of autonomous parking presents distinctive computational challenges. Unlike the task of recognizing street signs and road markings, autonomous parking necessitates the identification of available and occupied parking spaces within a designated parking lot. Moreover, the dynamics of vehicles in parking lots significantly contrast with those on high-speed roadways, as parking maneuvers are characterized by slower speeds. Consequently, the field of autonomous parking primarily emphasizes the attainment of precise and efficient parking within designated areas. Although there are disparities in their respective applications, both autonomous driving and autonomous parking systems commonly employ neural networks as their underlying technological framework. Nevertheless, the implementation of distinct methodologies within neural networks is imperative to achieve favorable outcomes in the context of autonomous parking.
Zhu et al. [42] employ crowdsourcing as a means to gather pertinent data pertaining to available parking spaces in the vicinity, with the objective of enhancing parking efficiency through their proposed model. This approach aims to provide assistance in the process of parking vehicles within the designated area. The study incorporates the utilization of vehicular fog computing (VFC) as a fundamental framework. VFC harnesses the computational resources of the vehicle to facilitate communication, data collection, and data processing with other vehicles. To achieve this, specific components of the vehicle's computational system function as "fog nodes", which gather the requisite information and disseminate it to the cloud. This arrangement fosters an efficient system of communication among vehicles for the purpose of parking. The methodology involves the capture of frames by dash cameras, which are subsequently processed using the aforementioned approach. The evaluation of the proposed method is conducted using a metric termed the "jam factor", which provides an estimation of network quality and travel efficiency within the parking lot. A higher jam factor signifies increased congestion in the parking lot. Based on the outcomes presented in the research paper, it can be deduced that the jam factor exhibits a decline over time, thereby indicating that the model increases its effectiveness over time. In the study conducted by Shen et al. [34], an endeavor was made to anticipate the movement and trajectory of a vehicle by leveraging a CNN-LSTM architecture, similar to the model employed in the work of Park et al. [43], proposed for a distinct purpose. The model developed by Shen et al. [34] was dubbed "ParkPredict". To evaluate the efficacy of their model, the researchers curated a dataset comprising annotated images of parking lots. They employed the CARLA simulator, a widely utilized open-source program in autonomous driving research, to conduct their experiments. In order to assess the performance of ParkPredict, Shen et al. [34] juxtaposed its results against those obtained from an extended Kalman filter (EKF) model, which lacks neural network capabilities, and an LSTM model. The findings revealed that the model proposed by Shen et al. exhibited markedly superior performance in comparison to the alternative models tested. Despite the accurate results achieved by ParkPredict, the implementation of this model encountered challenges due to its reliance on a simulator. Consequently, the same authors proposed an enhanced model called ParkPredict+ in the work by Shen et al. [44]. In this updated approach, Shen et al. [44] opted to utilize a transformer neural network model instead of an LSTM-based model. This decision was based on the consistent demonstration of superior performance by transformer models over LSTM models in similar studies. Unlike the previous model that employed convolutional layers, transformer models employ a self-attention mechanism to process data. Each pixel or data point in an image is assigned a value based on its contextual relevance within the overall image. As the model continues to learn, it discerns the significance of specific pixels, enabling a more comprehensive understanding of network patterns. To gather data for training, a drone was used to capture over 3 hours of footage from a parking lot. The outcomes of ParkPredict+ exhibited significant improvement, surpassing both the EKF model and the original model in terms of performance.
The key distinction between these models lies in the usage of LSTM and EKF approaches. Based on the simulations conducted, including other image processing scenarios discussed earlier, it can be concluded that LSTM-based solutions deliver exceptional results in terms of both accuracy and efficiency.
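The self-attention mechanism that distinguishes the transformer used in ParkPredict+ from the earlier LSTM can be sketched as scaled dot-product attention over a sequence of trajectory features; the sequence length, feature width, and random weights below are arbitrary illustrative choices.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (T, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])             # relevance of each position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the sequence
    return weights @ v                                  # context-aware representation per position

rng = np.random.default_rng(0)
sequence = rng.normal(size=(20, 16))                    # e.g. 20 past trajectory steps, 16-D features
w = [rng.normal(size=(16, 16)) * 0.1 for _ in range(3)]
attended = self_attention(sequence, *w)                 # shape (20, 16)
```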
Heinen et al. [45] introduced an alternative architecture, named SEVA3D, for autonomous vehicle control during parking maneuvers. This model was developed in response to the limitations encountered in a prior iteration called SEVA2D, which generated 2D point clouds resulting in certain challenges. SEVA3D primarily relies on SONAR sensors, which emit pulses into the surrounding environment and detect their reflections upon encountering objects. The neural network algorithm employed in this model is known as the Jordan-Net model. The Jordan-Net model is a convolutional neural network in which each output is fed back as inputs, enabling the network to leverage previous outputs to predict future outputs. This approach employs backpropagation techniques to adjust network weights and minimize the loss function. The experimental results indicate a satisfactory level of performance, with an accuracy rate of approximately 90%. However, the developers emphasize the necessity of testing the model under more realistic scenarios, such as incorporating slopes within the parking lot, prior to its practical implementation. In a distinct approach, Wang et al. [46] employ quadrotors to detect parking space occupancy within parking lots. Quadrotors, functioning as drones, navigate the airspace of the parking area and employ YOLO-CNN to identify vacant parking spaces. Quadrotors offer several advantages, including an extensive camera angle that enables them to capture a comprehensive view of the entire parking lot and the ability to maneuver throughout the area. These capabilities surpass those of conventional in-car cameras. However, the transmission of data from the quadrotor to a vehicle poses challenges, and implementing this technology in real-world settings remains complex. Nonetheless, the technique exhibited a high level of accuracy. While empirical data supporting the results is not explicitly provided, the paper includes visual representations in the form of bounding boxes, which indicate the promising performance of the model. Min et al. [47] introduced Graph Neural Networks (GNNs) for the purpose of parking spot identification. While traditional Convolutional Neural Networks (CNNs) have proven to be valuable, their reliance on post-processing power often renders them challenging to implement practically. In contrast, GNNs do not require this type of post-processing power. The Graph Neural Network comprises three essential components: the graph feature encoder, the graph feature aggregation, and the entrance line discriminator. These components collaborate to integrate data and positional information obtained from LiDAR sensors. The network assumes that a parking spot is comprised of four key points, namely the corners, and utilizes this information to extract relevant data pertaining to the spot. The overall performance of the method is highly commendable, achieving a precision rate of 97%. Table 2 summarizes the different techniques used for image processing in autonomous parking while showing their strengths and weaknesses.
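The defining property of the Jordan-style network used in SEVA3D, feeding the previous output back in alongside the next input, can be sketched as follows; the layer sizes, activations, and synthetic SONAR readings are illustrative guesses rather than the published architecture.

```python
import numpy as np

def jordan_step(x, prev_output, w_in, w_back, w_out):
    """One step of a Jordan-style recurrent network: the hidden layer sees the previous output."""
    hidden = np.tanh(x @ w_in + prev_output @ w_back)
    return hidden @ w_out                               # new output, fed back at the next step

rng = np.random.default_rng(3)
w_in, w_back, w_out = (rng.normal(size=s) * 0.1 for s in [(8, 16), (2, 16), (16, 2)])

output = np.zeros(2)                                    # e.g. steering and speed commands
for sonar_reading in rng.random((10, 8)):               # ten synthetic 8-beam SONAR readings
    output = jordan_step(sonar_reading, output, w_in, w_back, w_out)
```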
The results show that autonomous parking has been approached through various methods involving the utilization of neural networks and different camera configurations. Similar to image processing in the context of autonomous driving, neural networks offer an efficient means for vehicles to analyze real-time road conditions and make prompt decisions regarding their movements. While the tasks of autonomous parking and autonomous driving differ significantly, there are common practices that can be leveraged to accomplish both objectives. Each autonomous parking study used techniques such as sensor integration and neural networks, which are similar to the techniques used in typical autonomous driving. However, challenges persist with the methods presented. One notable challenge is the high computational expense associated with almost all of these approaches, necessitating substantial processing power to execute the required tasks within mere seconds. Moreover, although the results achieved are generally promising, they must attain a near-perfect level of accuracy to be deemed suitable for practical implementation. This stringent requirement stems from the fact that consumer trust is a paramount asset for autonomous vehicle manufacturers operating within the autonomous parking and driving domain. While notable progress has been made, substantial further advancements are necessary within this field for the aforementioned methods to be adopted in practical real-world scenarios.
4. Image processing in adverse weather
One prominent obstacle confronting the capacity of autonomous vehicles to perceive objects lies in the obstruction caused by adverse weather conditions such as snow, fog, rain, and others. These conditions impede the sensors' ability to capture information about the surroundings, thereby impeding their functionality. Furthermore, a similar predicament arises when operating vehicles at night, as the vehicle's internal cameras are unable to penetrate darkness, rendering them ineffective in detecting objects on the road. Consequently, the accuracy of various autonomous driving techniques has been significantly compromised, rendering their practical implementation in real-world scenarios nearly impracticable. Subsequent studies elucidate the endeavors undertaken by researchers to address this challenge.
In an effort to mitigate the effects of adverse weather conditions on autonomous vehicle perception, Bernuth et al. [48] conducted simulations that introduce snow into the imagery processed by vehicles. The researchers utilized online datasets known as KITTI and Cityscapes, employing the OpenGL platform to input images into a simulator. Within the simulator, snow and fog were added to the images to replicate potential adverse weather scenarios during vehicle operation. To accomplish this, various textures were applied to the images to simulate the presence of snowflakes. Fog was generated using light attenuation algorithms integrated into the simulation, which altered the light intensity within the images. To accurately represent the fog's extent, the researchers calculated the disparity in RGB values between a foggy image and a clear image, incorporating this information into the simulation. The study revealed a significant correlation between severe weather conditions and diminished accuracy, with an average precision decrease of approximately 95%. Nonetheless, the results acknowledge the potential of this technique in addressing the issue at hand in the field. Lei et al. [49] adopt a distinct approach by employing semantic image segmentation for object detection in snowy driving scenarios. Semantic image segmentation involves assigning labels to individual pixels in an image based on their color, texture, and brightness characteristics. This process facilitates a more human-like visualization of the image, enabling easier interpretation by the machine. The authors combine semantic image segmentation with a Convolutional Neural Network (CNN), and they experiment with three different neural network architectures: FCN8S, PSPNet, and ICNet. Although these models share similarities in terms of architecture, they exhibit minor discrepancies in the number and types of layers employed. Among these architectures, PSPNet emerges as the most effective, as it achieves the highest Intersection over Union (IoU) percentage in the conducted studies. The study conducted by Bijelic et al. [50] focuses on enabling autonomous vehicles to operate effectively in the presence of fog. The research involves the creation of a dataset comprising foggy weather conditions, utilizing data collected from a variety of sensors including LiDAR, radar, NIR (near-infrared), and FIR (far-infrared) sensors. The vehicle's sensor setup includes a pair of RGB cameras at the front, a radar, a LiDAR, an NIR sensor, and an FIR sensor. Data for the experiments was gathered during two test drives spanning Germany, Sweden, and Finland, covering a total distance of 10,000 kilometers and capturing various foggy conditions. The study also incorporates entropy-steered fusion, a technique that merges multiple images into a single composite image. This approach allows the vehicle to perceive the most crucial information from a consolidated image, reducing the computational demands. The principle of entropy involves measuring the level of randomness in an image, which aids in discerning important and unimportant details. Leveraging this deep entropy fusion methodology, the study achieved reasonably accurate outcomes. In comparison to earlier models developed in the literature, which predominantly comprised single-stage object detection methods, the proposed model exhibited significantly enhanced accuracy levels in environmentally challenging conditions. Cai et al. [51] employ an Advanced Driver Assistance Systems (ADAS) approach for nighttime vehicle detection.
ADAS refers to a collection of technologies designed to aid autonomous driving, encompassing functionalities like lane departure warnings, collision warnings, and parking assistance. In this study, the authors utilize the previously mentioned FIR infrared camera, which enhances the vehicle's visibility during nighttime conditions. Leveraging these cameras, the system employs visual saliency to extract pertinent information from the surrounding area. Visual saliency entails identifying the distinctive features of an object that facilitate easier detection by the computer system. Subsequently, the system proceeds with vehicle candidate generation (VCG), which involves determining potential vehicles in the vicinity. Following VCG, the system undergoes vehicle candidate verification (VCV), wherein prior calculations and information are employed to validate the existence of the vehicle candidates identified in the previous steps. The system achieves a detection rate of 92.3%, indicating satisfactory results. However, higher detection performance is still needed for real-life implementation.
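Synthetic fog of the kind used in such simulations is often generated with an atmospheric-scattering style model, in which a clear pixel is attenuated with distance and blended toward ambient airlight; the sketch below is a generic version of that idea, not the exact algorithm of Bernuth et al. [48], and the attenuation coefficient and depth map are placeholders.

```python
import numpy as np

def add_fog(image, depth, attenuation=0.08, airlight=0.9):
    """Blend a clear image toward ambient airlight based on per-pixel depth (metres).

    image: (H, W, 3) array in [0, 1]; depth: (H, W) array of distances.
    """
    transmission = np.exp(-attenuation * depth)[..., None]  # how much scene light survives
    return image * transmission + airlight * (1.0 - transmission)

rng = np.random.default_rng(7)
clear = rng.random((120, 160, 3))
depth = np.linspace(5.0, 80.0, 160)[None, :].repeat(120, axis=0)  # scene gets farther toward the right edge
foggy = add_fog(clear, depth)
```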
In general, the proposed methods represent advancements in comparison to other models, developed in the literature, utilized for addressing adverse weather conditions, primarily due to their reliance on LiDAR and radar sensors instead of cameras. These sensors are less susceptible to data collection hindrances caused by fog or snow obstruction. However, despite the progress observed, these methods still lack the required level of accuracy for real-world deployment. In the realm of autonomous driving, there is little margin for error. While these works exhibit promise, none of them have reached the stage of practical applicability. To accomplish this increased accuracy, there is a need for more sensors and faster computation speeds. If more sensor data is analyzed from various points on the vehicle, the accuracy of the model will increase. Furthermore, faster computational speeds will allow for more of this data to be analyzed in real time, helping the vehicle avoid inaccurate detections. It is also worth noting that each of these studies is conducted using datasets that predominantly comprise similar types of images. For instance, if a study is conducted in one region of the United States, the majority of the images in the dataset are likely to originate from that specific region. This narrow focus may limit the neural network's accuracy in capturing variations present in different geographic areas within the United States, thereby reducing its generalizability. To combat these issues, more resources must be devoted to capturing more robust datasets, particularly in currently uncaptured areas. With more places being captured, models will be able to better predict objects in different areas, which will, in turn, increase the accuracy of these models globally. Additionally, the use of simulations in these studies is also less promising for real-world scenarios, as the simulated snowflakes and fog are calculated values determined by the simulator. In contrast, real-life fog and snow exhibit inherent variability and complexity, introducing a significant potential for errors. This is in large part due to the cost of testing these vehicles in real-world scenarios. Given that the models have not yet been validated, running them in the real world is dangerous because of the potential hazards involved. Furthermore, vehicles are expensive, and the additional cost of extra sensors and software makes it difficult to employ these models in real-world settings as often as they should be. Thus, it can be concluded that, despite the substantial progress made in addressing this issue within the field, errors persist, rendering many of these technologies impractical for real-world applications.
5. Decision-Making techniques
Liu et al. [52] provide an explanation of the various types of decision-making techniques and their applications in the context of autonomous driving. Broadly speaking, two distinct categories of decision-making techniques can be identified: classic methods and learning-based methods. Classic methods typically involve the utilization of algorithms such as Finite State Machines (FSMs), which operate based on discrete inputs and outputs contingent upon specific circumstances. These methods rely predominantly on statistical mathematical equations to determine the appropriate decision-making process for the vehicle. In contrast, learning-based methods employ advanced algorithms, including neural networks, to discern the optimal decision outcomes. These techniques leverage the power of machine learning to extract patterns and insights from data, enabling more adaptive and intelligent decision-making capabilities. With the notable advancements achieved in learning-based decision-making technologies, classic methods have become largely impractical due to the statistical advantage offered by learning-based approaches. In the realm of decision-making technologies, the inputs to the function comprise information about the vehicle's conditions and locations, while the outputs entail the strategies and behavior that the vehicle should adopt. The focus of autonomous vehicle's decision-making studies is on developing efficient and accurate algorithms that can analyze the inputs and produce optimal decisions in real-time.
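A finite state machine of the kind used in classic decision-making can be captured in a small transition table; the states and triggering events below are illustrative and are not drawn from the cited survey.

```python
# Illustrative finite state machine for a simple highway driving policy.
TRANSITIONS = {
    ("cruise", "slow_vehicle_ahead"): "follow",
    ("cruise", "obstacle_detected"): "emergency_brake",
    ("follow", "adjacent_lane_clear"): "change_lane",
    ("follow", "lead_vehicle_gone"): "cruise",
    ("change_lane", "lane_change_complete"): "cruise",
}

def next_state(state: str, event: str) -> str:
    """Return the next behaviour; unknown events keep the current state."""
    return TRANSITIONS.get((state, event), state)

state = "cruise"
for event in ["slow_vehicle_ahead", "adjacent_lane_clear", "lane_change_complete"]:
    state = next_state(state, event)
print(state)  # back to "cruise" after completing the lane change
```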
Jimenez et al. [53] present an innovative automated valet parking system (AVPS) that relies on digital maps and sensors for environmental perception. The system is based on a control architecture, wherein a main algorithm handles route tracking and incorporates subfunctions to address obstacle detection. A key objective of this model is to determine the shortest possible path for the vehicle to reach a designated point, which significantly influences the decision-making process. This methodology is primarily applicable to parking lots, as the researchers assert that GNSS models lack the required accuracy, and SLAM (Simultaneous Localization and Mapping) techniques necessitate more frequent positioning updates, rendering them less suitable for this purpose. Although this paper does not provide quantitative results, the outlined methodology exhibits promise for AVPS implementation. Ferguson et al. [54] present a distinct framework that emerged victorious in the DARPA Urban Challenge. The framework comprises four distinct blocks, each responsible for different aspects of the system. These blocks include perception, mission planning, behavioral executive, and motion planning. The perception block utilizes data from various sensors incorporated within the system to construct a comprehensive understanding of the vehicle's surroundings. This process enables the creation of a detailed representation of the environment. The mission planning block determines the desired destination for the vehicle and generates a cost-to-checkpoint value, which represents the most efficient route to reach the designated point. This step aids in optimizing the vehicle's trajectory. The motion planning block leverages the information provided by the mission planning block to generate a trajectory for the vehicle. This trajectory guides the vehicle's movement towards the desired destination. Finally, the behavioral executive block utilizes the data from the previous blocks to create subtasks for the vehicle. These subtasks facilitate the vehicle's navigation and enhance its ability to reach the destination effectively. While tangible results are not provided in the paper, the outlined methodology demonstrates promise. The framework's modular approach, encompassing perception, mission planning, behavioral executive, and motion planning, indicates a comprehensive and systematic approach to autonomous vehicle navigation. Babu et al. [55] propose an alternative model known as Model Predictive Control (MPC), which incorporates optimization models to make decisions and ensure the selection of the best possible solution. The application of optimization techniques in this context represents an innovative approach within the field. Motion planning within this model is investigated through the lens of a velocity obstacle, and the model is constructed with the expectation of potential errors. Although the technique demonstrates some success, the researchers discovered that the model was unable to compute satisfactory results within the required timeframe, thereby rendering it unsuitable for real-life scenarios.
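The cost-to-checkpoint value in the mission planning block amounts to a shortest-path computation over the road network; a minimal Dijkstra sketch over a toy graph is shown below, with node names and edge costs that are purely illustrative.

```python
import heapq

def shortest_path_cost(graph, start, goal):
    """Dijkstra's algorithm: minimum travel cost from start to goal over a weighted graph."""
    frontier = [(0.0, start)]
    best = {start: 0.0}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale entry
        for neighbour, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(frontier, (new_cost, neighbour))
    return float("inf")

# Toy road graph as adjacency lists of (neighbour, cost) pairs; the checkpoint is node "D".
road_graph = {"A": [("B", 2.0), ("C", 5.0)], "B": [("C", 1.0), ("D", 7.0)], "C": [("D", 2.0)]}
print(shortest_path_cost(road_graph, "A", "D"))  # 5.0 via A -> B -> C -> D
```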
Zhang et al. [56] present an optimization-based approach for collision avoidance in cluttered parking lots. The paper acknowledges the challenges posed by crowded parking environments and the heightened risk of collisions, particularly when autonomous vehicles interact with human-driven vehicles. Recognizing the potential for errors in such scenarios, the researchers propose a path planner built on numerical optimization rather than purely hand-crafted rules. By formulating collision avoidance as an optimization problem, the planner aims to generate efficient and collision-free trajectories for vehicles navigating through the parking lot. Through simulations, the proposed approach demonstrates promising results, indicating its potential applicability in real-world scenarios; however, further refinement and optimization of its timing are necessary to enhance its practical viability. This optimization-based path planner presents an alternative solution to collision avoidance in cluttered parking lots, paving the way for safer interactions between autonomous vehicles and human-driven vehicles in these complex environments.
Gindullina et al. [57] adopt a distinct approach by employing game theory to navigate autonomous vehicles within a parking garage. This approach utilizes mathematical models to capture the interactions among vehicles, treating each car as a "player" within the game. Over time, the vehicles analyze various possible navigation routes within the parking lot, with the objective of identifying the most efficient paths and developing a strategy based on this analysis. The accumulation of data is expected to enhance the vehicle's decision-making capabilities. However, the results obtained from this model were not promising: although the model managed to identify an effective solution, it did not identify the optimal one. Additionally, the vehicle was only trained for the specific parking lot in which the experiment was conducted, so the findings cannot be generalized to other parking lots without appropriate computational training. While the utilization of game theory presents a unique approach to autonomous vehicle navigation, the study highlights the need for further improvement and exploration to achieve more accurate and efficient results; considerations of generalizability and training across different parking environments would be essential for the practical application of this approach in real-world scenarios. Sheng et al. [58] propose a distinctive path planner that combines sampling and optimization techniques. Sampling involves employing a state-lattice planner to convert sensor signals, such as those from LiDAR or cameras, into numerical sequences, and optimization techniques then utilize the sampled data to generate effective path-planning solutions. While these two techniques demonstrate compatibility, their broader implementation is hindered by their complexity and computational demands. In this paper, the authors introduce a novel approach utilizing the hybrid A* algorithm, which addresses these challenges: the algorithm identifies viable passages for the vehicle to traverse and subsequently employs optimization techniques to determine executable solutions within the defined region of interest. The results obtained from this approach demonstrate significant promise, although the authors acknowledge the need to reduce the processing time of the algorithm to enable its practical application in real-world settings. By leveraging the hybrid A* algorithm, this research offers an effective path-planning solution that integrates sampling and optimization techniques; despite the current limitations associated with computational complexity, the study showcases favorable outcomes and highlights the authors' objective of improving the algorithm's processing efficiency for real-world implementation. Hongbo et al. [59] propose a model for autonomous parking that employs a formulaic mathematical approach. The methodology utilizes the Ackermann steering model, in which the wheels are steered so that they all rotate about a common instantaneous center during turns, to simulate the vehicle's movements within the parking lot. The controller algorithm primarily governs the front wheel steering, which in turn determines the motion of the remaining wheels through the Ackermann model and other algorithms outlined in the paper. The study presents moderately favorable results, demonstrating the effectiveness of the proposed approach. However, the lack of data diversity limits the practical applicability of the model in real-world scenarios.
The absence of a diverse dataset that encompasses various parking lot layouts, environmental conditions, and driving scenarios hampers the model's ability to handle the complexity and unpredictability encountered in real-world settings. While the formulaic mathematical approach, coupled with the Ackerman steering model, showcases promise in the context of autonomous parking, further research is required to address the limitations associated with data diversity. Expanding the dataset to encompass a wider range of scenarios would enhance the model's accuracy and reliability, thereby facilitating its practical adoption in real-world applications.
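As a minimal illustration of the steering geometry underlying approaches such as that of Hongbo et al. [59], the following sketch integrates the kinematic single-track (bicycle) model, a common simplification of Ackermann steering in which the front steering angle and the wheelbase determine the turning radius. The speed, steering angle, and wheelbase are assumed values rather than parameters of the cited controller.

```python
import math

def bicycle_step(x, y, heading, speed, steering_angle, wheelbase, dt):
    """One Euler-integration step of the kinematic single-track model.

    All wheels are assumed to rotate about a common instantaneous
    center, so the yaw rate follows from speed, wheelbase, and the
    front steering angle.
    """
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    heading += (speed / wheelbase) * math.tan(steering_angle) * dt
    return x, y, heading

# Simulate a gentle left turn at 2 m/s with a 10-degree steering angle.
state = (0.0, 0.0, 0.0)  # x [m], y [m], heading [rad]
for _ in range(50):
    state = bicycle_step(*state, speed=2.0,
                         steering_angle=math.radians(10.0),
                         wheelbase=2.7, dt=0.1)
print("final pose (x, y, heading):", state)
```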
While the algorithms discussed in the preceding papers have certain limitations compared to more modern approaches, it is important to consider the context in which they were developed and the subsequent advancements in the field of autonomous driving. These algorithms, based on mathematical formulas and rule-based systems, were developed at a time when machine learning techniques and neural networks were not as prevalent or well established. The primary advantage of more modern algorithms, such as neural networks, lies in their ability to learn from data and adapt to different scenarios: they can capture complex patterns and make informed decisions based on prior knowledge, which makes them well suited to the diverse driving conditions and environments encountered in urban and rural settings. Nevertheless, the earlier algorithms served as important foundations for autonomous driving research and development; they provided initial insights and paved the way for more sophisticated approaches. While they may not match the adaptability and learning capabilities of neural networks, they still hold value in specific contexts and contribute to the overall understanding of autonomous driving systems. A balance should therefore be struck between acknowledging the limitations of older algorithms and appreciating their contributions. As technology advances, the focus has shifted toward data-driven and learning-based techniques, which offer greater potential for real-world application and improved accuracy, and it is advisable to leverage these advances to overcome the challenges faced by the earlier models.
6.
Localization
Localization in autonomous vehicles refers to the process of accurately determining the vehicle's position. This aspect is of central importance in autonomous driving, since accurate knowledge of the vehicle's position underpins every subsequent navigation decision. Numerous techniques are employed to achieve effective localization. One such technique is sensor fusion, wherein data gathered by sensors such as LiDAR, radar, and cameras are integrated to ascertain the exact location. Other approaches encompass Simultaneous Localization and Mapping (SLAM) techniques, in which a map of the surrounding environment is constructed to facilitate vehicle localization. Inertial Navigation System (INS) models are also utilized, leveraging inertial measurement units, including accelerometers and gyroscopes, to aid in determining the autonomous vehicle's position. This section summarizes and analyzes previous techniques used for vehicle localization.

Following data acquisition from the sensor array, the localization procedure conventionally relies on a point cloud or map representation that renders the data intelligible to the machine. The resulting map typically takes the form of a three-dimensional structure encoding vehicle positions, nearby objects, and notable landmarks, which serve as reference points for algorithms tracking the vehicle's trajectory over time. As the vehicle moves, the surrounding LiDAR sensors continuously accumulate data, iteratively constructing point clouds that allow the algorithms to comprehend the prevailing vehicle state in real time. In many instances, these LiDAR sensors are fused with Global Positioning System (GPS) data and Inertial Measurement Units (IMUs), comprising gyroscopes and accelerometers; this integration compensates for the inherent limitations of LiDAR and, because the GPS and IMU directly capture the vehicle's own motion, yields a more robust real-time representation of the traveled path. After fusion, the algorithm establishes correspondences between the LiDAR-derived point cloud and the pre-established map, identifying congruent points that serve as reference beacons for pose estimation. Pose estimation, the crux of the procedure, computes the vehicle's spatial position and orientation, conventionally characterized by positional coordinates (x, y, z) and angular orientations (roll, pitch, yaw), from the assimilated sensor data. Extended Kalman Filters frequently underpin this step, providing a recursive estimation methodology. The entire process unfolds continuously during vehicle movement, and certain models incorporate advanced techniques such as loop closure detection and odometry correction to ameliorate the influence of sensor noise, measurement inaccuracies, and other anomalies [60].
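The recursive pose-estimation step described above can be sketched with a minimal Extended Kalman Filter for a planar pose (x, y, yaw). The motion model, measurement model, and noise levels below are illustrative assumptions, not a reproduction of any cited system.

```python
import numpy as np

class PoseEKF:
    """Minimal extended Kalman filter for planar pose (x, y, yaw).

    Prediction uses a unicycle motion model driven by (v, yaw_rate);
    the update fuses an absolute position fix (e.g., GNSS or a
    map-matched LiDAR position).  Noise levels are illustrative.
    """

    def __init__(self):
        self.x = np.zeros(3)                  # state: [x, y, yaw]
        self.P = np.eye(3) * 0.1              # state covariance
        self.Q = np.diag([0.05, 0.05, 0.01])  # process noise
        self.R = np.diag([0.5, 0.5])          # measurement noise

    def predict(self, v, yaw_rate, dt):
        x, y, yaw = self.x
        self.x = np.array([x + v * np.cos(yaw) * dt,
                           y + v * np.sin(yaw) * dt,
                           yaw + yaw_rate * dt])
        # Jacobian of the motion model with respect to the state.
        F = np.array([[1.0, 0.0, -v * np.sin(yaw) * dt],
                      [0.0, 1.0,  v * np.cos(yaw) * dt],
                      [0.0, 0.0,  1.0]])
        self.P = F @ self.P @ F.T + self.Q

    def update(self, position_fix):
        H = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
        innovation = np.asarray(position_fix) - H @ self.x
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(3) - K @ H) @ self.P

ekf = PoseEKF()
ekf.predict(v=5.0, yaw_rate=0.1, dt=0.1)
ekf.update(position_fix=[0.52, 0.03])
print("estimated pose:", ekf.x)
```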
In their study, Kato et al. [61] introduce Autoware, an open-source software architecture designed for autonomous driving applications. This architecture employs LiDAR sensors and cameras, along with additional models for steering and vehicle behavior control. The implementation of Autoware is based on ROS (Robot Operating System), a software framework that offers a collection of tools for diverse robotic tasks. The authors also utilize the point cloud library (PCL), which encompasses a range of algorithms for localization, aiding in LiDAR scan mapping and data representation in a three-dimensional space. To evaluate its performance, the Autoware system has undergone extensive testing in various countries, encompassing both long-distance and urban driving scenarios. The study's findings indicate that the system demonstrated satisfactory functionality. Nonetheless, further testing and refinement are necessary to enhance its overall performance. Qingqing et al. [62] also employ SLAM techniques, as discussed earlier, to achieve localization. This entails utilizing data from GPS, LiDAR, ultrasonic, and radar sensors, and employing sensor fusion techniques to generate detailed 3D point clouds. In their algorithm, the vehicle's motion is modeled by integrating data from inertial measurement units, including accelerometers, gyroscopes, and compasses. The research work further incorporates the Normal Distributions Transform (NDT) algorithm to perform the matching of the 3D point cloud. NDT involves fitting Gaussian distributions to data points, approximating the underlying data and examining data relationships. While the study yielded mostly accurate results, it is important to note that the tests were conducted without the presence of road markers. The existence of road markers in real-world scenarios introduces a potential source of error, rendering the proposed approach unsuitable for deployment in practical situations. Additionally, the model's applicability in urban environments is limited due to its heavy reliance on accurate maps. As the NDT algorithm relies on approximating certain map aspects, an imprecise or inaccurate map could have severe consequences and undermine the system's performance. Realpe et al. [63] present a fault-tolerance method aimed at analyzing and detecting algorithm faults in autonomous vehicles. Given the relative novelty of this technology and its limited exposure to real-world traffic scenarios, the study recognizes the early stage of development in this field. In this context, the paper defines a "fault" as a defect in either the hardware or software components, while an "error" is regarded as an instance where a fault manifests in a real-world environment. The researchers further define a "failure" as the occurrence of an undetected error, leading to inaccurate program execution. To mitigate the impact of sensor faults, the paper proposes a model based on a federated fusion structure, enabling researchers to monitor changes and discrepancies and provide feedback. The model incorporates a Kalman filter to predict the future location of objects. Overall, the proposed model proves effective, as it successfully reduces the percentage of false positives by an average of 7% when appropriate modifications are made to the weights. Isukapati et al. [64] propose a sensor fusion model aimed at enhancing navigation safety by reducing false negatives and improving their identification. 
The approach involves encoding each surrounding map using dedicated short-range communication, a vehicle-to-vehicle communication method. This enables the exchange of information among vehicles on the road, granting each vehicle access to the encoded data. The localization techniques employed in this study closely resemble SLAM techniques. Additionally, the Euclidean method, a geometric transformation based on distances between points, is utilized to account for orientation changes in the images. Furthermore, the cumulative distribution function, which describes the probability that a quantity falls below a given value, is employed to identify variations between functions. The paper concludes that the methodology effectively minimizes false negatives. However, it acknowledges the need for a more sophisticated infrastructure to enable the broader implementation of this technology. Nabati et al. [65] employ sensor fusion of radar and camera systems for object detection and distance estimation between vehicles. Radar sensors are utilized for their proficiency in determining distance and velocity, while cameras excel in object identification. Data collected from these sensors are processed and integrated into a point cloud, which serves as input for object detection. It is important to note that radar sensors only generate 2D point clouds, unlike LiDAR sensors that typically produce 3D point clouds. In the proposed approach, each radar point is treated as an independent detection, allowing for the generation of 3D object formations without extensive feature extraction. The 2D proposals generated by radar are then subjected to the Radar Proposal Refinement algorithm, which fuses radar and camera data for object detection. To perform the object detection task, a region proposal network (RPN) is utilized. The study compares the performance of the proposed model against an RRPN network and a Faster R-CNN network. While the results show increased precision compared to the alternative networks, they are not sufficient for the proposed approach to be employed in real-world applications at present.
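Returning to the scan-matching step used by Qingqing et al. [62], the sketch below illustrates the core idea of the Normal Distributions Transform in two dimensions: map points are binned into cells, a Gaussian is fitted per cell, and a scan point is scored against the Gaussian of the cell it falls into. The data and cell size are hypothetical, and a full NDT registration would additionally optimize the scan pose over these scores.

```python
import numpy as np

def build_ndt_cells(points, cell_size=2.0):
    """Group 2D map points into square cells and fit a Gaussian
    (mean, covariance) to each cell -- the core idea of NDT.
    A small diagonal term regularizes near-singular covariances.
    """
    cells = {}
    for p in points:
        key = tuple(np.floor(p / cell_size).astype(int))
        cells.setdefault(key, []).append(p)
    models = {}
    for key, pts in cells.items():
        pts = np.asarray(pts)
        if len(pts) < 3:
            continue  # too few points for a meaningful Gaussian
        mean = pts.mean(axis=0)
        cov = np.cov(pts.T) + 1e-3 * np.eye(2)
        models[key] = (mean, np.linalg.inv(cov))
    return models

def ndt_score(point, models, cell_size=2.0):
    """Likelihood-style score of a scan point under the Gaussian of the
    cell it falls into (higher score = better match to the map)."""
    key = tuple(np.floor(point / cell_size).astype(int))
    if key not in models:
        return 0.0
    mean, cov_inv = models[key]
    diff = point - mean
    return float(np.exp(-0.5 * diff @ cov_inv @ diff))

# Hypothetical map points along a wall, and one scan point to score.
rng = np.random.default_rng(0)
wall = np.column_stack([np.linspace(0, 10, 200),
                        0.1 * rng.standard_normal(200)])
models = build_ndt_cells(wall)
print("match score:", ndt_score(np.array([4.9, 0.05]), models))
```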
Farag et al. [66] employ the Kalman filter for sensor fusion in both localization and object detection. The study utilizes a car equipped with LiDAR, whose data are fed into a neural network for processing. The LR_ODT method is employed, which involves clustering radar and LiDAR data to detect objects using the Grid-Based Density-Based Spatial Clustering of Applications with Noise (GB-DBSCAN) algorithm. GB-DBSCAN is a density-based clustering algorithm that groups data points with similar values or densities. To handle missing or unavailable data points necessary for object detection, the Kalman Filter is utilized; it serves as a predictive algorithm capable of estimating such data points. Two different types of Kalman Filters are employed in this study: the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF). EKFs are suited to nonlinear relationships and employ Taylor series approximations to linearize the motion and measurement models. UKFs, on the other hand, utilize "sigma points" to estimate the mean and covariance of the system, generating approximations based on these sigma points. In this case, the LR_ODT model combines elements of both filters, utilizing a UKF for data fusion and an EKF for approximations. The results of the study show promise, as the system demonstrates real-time capability with satisfactory performance. Liu et al. [67] present a technique for localization and navigation in autonomous vehicles, employing an integrated approach that combines an Inertial Navigation System (INS), GPS, and adaptive Kalman filtering to mitigate environmental noise. The Inertial Navigation System serves as an autonomous navigation method, utilizing inertial measurement units (e.g., accelerometers and gyroscopes) to accurately determine the vehicle's location while in motion; this is achieved by calculating the distance traveled within a specific time frame. GPS, on the other hand, relies on satellite signals to provide location information. While using these methods in combination yields satisfactory results, there is a potential for error and noise caused by obstacles such as buildings and trees. To address this, an adaptive Kalman filter is employed to mitigate the impact of such obstacles; it is designed to minimize noise and enhance accuracy in the localization and navigation process. The proposed method demonstrates favorable performance, although some fluctuations in results were observed in the adaptive Kalman filter component of the system. Ouyang et al. [68] present a sensor fusion-based "target detector" aimed at addressing depth perception challenges in autonomous vehicles. The paper highlights that one of the major obstacles in image processing for autonomous vehicles is the machine's limited ability to perceive depth compared to humans. To tackle this issue, the authors propose a model called SaccadeFork. The SaccadeFork model utilizes a combination of cameras and LiDAR sensors to generate a point cloud representation. To enhance the quality of the point cloud, the model incorporates a bilateral filter, which effectively reduces noise and texture in the images, thereby improving readability for the machine. Additionally, Delaunay triangulation-based interpolation is employed to further densify the point cloud and mitigate noise and errors in the data. Subsequently, a convolutional neural network (CNN) is deployed to perform vehicle localization.
The results of the study show promising outcomes; however, the model exhibits some false detections of pedestrians and cyclists, along with challenges in night-time vision. These limitations indicate areas that require further improvement in the proposed model. Aldibaja et al. [69] propose a novel localization method called lateral road-mark reconstruction [70]. The paper distinguishes between two types of localization approaches: holistic-based and feature-based. Holistic-based localization relies on LiDAR sensors and 3D point clouds, while feature-based localization typically employs SLAM techniques. Previous research has focused on detecting curbs and lanes; however, since these features are not always present, the machine often detects lane lines instead, leading to various challenges. The proposed method aims to determine the amplitude of the peaks of curbs by assessing the contrast and continuity of the curbs. To achieve this, the method leverages LiDAR sensing, GNSS, LOAM, and NDT++ methods, as discussed previously. The overall performance of the methods is deemed effective, with NDT++ demonstrating the highest efficacy due to its "double sensor fusion" capability. GNSS follows in effectiveness, with LOAM ranking third. In summary, the proposed lateral road-mark reconstruction method shows promise in localization, with NDT++ exhibiting the most favorable results, followed by GNSS and LOAM. Table 3 summarizes the different techniques used for autonomous vehicle localization while showing their strengths and weaknesses.
Thus, the results show that numerous localization methods have demonstrated satisfactory performance in various scenarios. LiDAR and radar sensors are commonly utilized in these methods due to their ability to accurately detect road movements even in the presence of obstacles. Point cloud representations, which graphically depict the 3D positions of LiDAR points in relation to objects, play a crucial role in visualizing vehicles and their surroundings. Clustering techniques applied to the point cloud enable the identification of objects within specific regions. Additionally, Kalman filters are employed to estimate the locations of undetected objects based on sensor data. However, it should be noted that while these models exhibit relatively good success rates, they often lack generalizability to real-world environments. Many models are developed and tested within specific regions, resulting in limited dataset diversity and training. Furthermore, significant inaccuracies still persist in these models, which is a critical concern given that even slight inaccuracies can have severe consequences such as car crashes. Moreover, transferring localization information to the vehicle's autonomous driving systems in a timely manner poses challenges due to the high computational power requirements, which not all models can meet. Consequently, the usability of these models in real-life applications is restricted. In summary, while localization methods have shown promise, limitations persist in terms of their applicability to diverse real-world scenarios, the presence of inaccuracies, and the computational demands associated with information transfer to autonomous driving systems. Addressing these challenges is crucial to enhance the effectiveness and safety of localization techniques in practical settings.
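The clustering step mentioned above, used for instance in the GB-DBSCAN stage of [66], can be illustrated with standard DBSCAN from scikit-learn on a synthetic bird's-eye-view point set; the grid-based variant of [66] adds a spatial index to speed up neighborhood queries, which is omitted here, and the data below are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical bird's-eye-view LiDAR returns (x, y) in metres:
# two vehicles plus scattered ground clutter.
rng = np.random.default_rng(1)
car_a = rng.normal(loc=[10.0, 2.0], scale=0.4, size=(60, 2))
car_b = rng.normal(loc=[25.0, -3.0], scale=0.4, size=(60, 2))
clutter = rng.uniform(low=[0, -10], high=[40, 10], size=(30, 2))
points = np.vstack([car_a, car_b, clutter])

# Density-based clustering: points within eps of each other, with at
# least min_samples neighbours, are grouped into one object; sparse
# returns are labelled -1 (noise).
labels = DBSCAN(eps=1.0, min_samples=8).fit_predict(points)

for label in sorted(set(labels)):
    cluster = points[labels == label]
    name = "noise" if label == -1 else f"object {label}"
    print(f"{name}: {len(cluster)} points, "
          f"centroid {cluster.mean(axis=0).round(2)}")
```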
7.
Vehicle tracking and trajectory prediction
In spite of the notable advancements in the domain of autonomous vehicle localization, the capacity to effectively track and predict the trajectories of dynamic objects remains a subject demanding substantial research attention. Dynamic objects, encompassing moving entities in the vicinity of a vehicle, notably including other vehicles, pose a distinct challenge as they cannot be merely detected but necessitate predictive assessment. Given the imperative of vehicular safety and the aspiration for real-world deployment of autonomous vehicles, their capability to not solely detect but also circumvent these dynamic objects is of paramount import. Across various endeavors, the utilization of Model Predictive Control (MPC) has emerged as a prevalent strategy for trajectory tracking. MPC encompasses the determination of control actions that minimize a cost function for a constrained dynamic system within a bounded advancing time horizon, striving for optimal operational efficacy. Sequentially, at each discrete time instance, MPC evaluates the vehicle's "state" and subsequently computes successive actions guiding the vehicle's evolution from the current juncture to the subsequent. The particular choice of MPC formulation is contingent upon the distinct nature of the cost function, adaptable to divergent situational contexts.
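The receding-horizon pattern just described can be sketched for a one-dimensional double integrator: at each step an acceleration sequence is optimized over a short horizon against a quadratic cost, and only the first action is applied before re-optimizing. The dynamics, cost weights, and bounds are illustrative assumptions, not a specific controller from the works discussed below.

```python
import numpy as np
from scipy.optimize import minimize

def mpc_step(pos, vel, reference, horizon=10, dt=0.1, a_max=3.0):
    """One receding-horizon step for a 1-D double integrator.

    Finds the acceleration sequence over the horizon that minimizes a
    quadratic cost on position error and control effort, subject to an
    acceleration bound, then returns only the first acceleration.
    """
    def rollout_cost(accels):
        p, v, cost = pos, vel, 0.0
        for a in accels:
            v += a * dt
            p += v * dt
            cost += (p - reference) ** 2 + 0.1 * a ** 2
        return cost

    result = minimize(rollout_cost, x0=np.zeros(horizon),
                      bounds=[(-a_max, a_max)] * horizon)
    return result.x[0]

# Track a target 20 m ahead from standstill, re-optimizing every step.
pos, vel = 0.0, 0.0
for _ in range(5):
    a = mpc_step(pos, vel, reference=20.0)
    vel += a * 0.1
    pos += vel * 0.1
    print(f"a = {a:5.2f} m/s^2, pos = {pos:5.2f} m, vel = {vel:4.2f} m/s")
```

The particular cost function and constraints distinguish the MPC variants surveyed next; the receding-horizon loop itself is common to all of them.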
Park et al. [265] employ a Linear Quadratic Regulator (LQR) to facilitate experimental drift control in an autonomous vehicle context. The LQR is designed to minimize a cost function, here expressed as a quadratic form, and consequently assumes control over the system dynamics; over successive iterations, the controller progressively minimizes this cost and refines the control actions, rendering the model operationally viable. This intervention targets the vehicular drift maneuver, a significant component in autonomous vehicle safety, with the path tracking controller manipulating slip angles to follow the drift trajectory. The model's efficacy is evaluated through MATLAB-based simulations, wherein a feedforward input, established on pre-defined curvature and velocity paths, computes the steering angle via the path tracking controller. Concurrently, a feedback element leverages a Lyapunov analysis to minimize lookahead error, assuring model stability over time. While the model's performance exhibits a diminishing error trend over time, its pragmatic viability remains circumscribed due to a multitude of unaccounted variables. Pang et al. [266] introduce a time-varying MPC approach for autonomous vehicle trajectory tracking. This methodology aims to ameliorate computational intricacies inherent to conventional MPC methods while concurrently enhancing accuracy and performance. By virtue of its capacity to discretize both linear and nonlinear control systems, the time-varying MPC expedites computations, bolstering the model's operational speed. The approach pivots around the anticipation of and response to temporal evolution, thereby conferring predictive acumen in a context marked by fluctuating conditions; this ability to forecast contingencies is particularly salient in the unpredictable context of vehicular operation. The research includes the formulation of a kinematic model that encapsulates diverse vehicular complexities such as acceleration, drifting, and turning. Empirical validation involves the employment of dual controllers, whose performance is quantified through three performance indicators: the positional coordinates (x, y) and the angular orientation. Notably, this methodology outperforms traditional MPC techniques, manifesting superior performance across a range of tests. Borrelli et al. [267] present a comparable approach to autonomous vehicle tracking through the adoption of an Offset-Free MPC technique. Designed to address longitudinal and lateral coupling constraints arising from tire friction, this approach, akin to time-varying MPC, endows the model with adaptability to shifting environmental conditions. A tripartite composition, encompassing the solver, reference generator, and Kalman filter, facilitates computation of the steering angle and acceleration demand; this combination enables the derivation of the steering wheel angle, driving torque, and braking pressure commands. The model evinces reduced error relative to comparable counterparts, albeit accompanied by marginally extended computational time. Cheng et al. [268] posit a Linear Matrix Inequality (LMI)-based MPC architecture. LMI-based MPC strategies employ intricate mathematical frameworks to formulate optimal control actions; notably, these methods lack the capacity to robustly detect and adapt to environmental changes.
The proposed approach aligns with the preceding MPC models in foundational assumptions, and similarly incorporates the Lyapunov function. While evincing commendable efficacy, this model remains susceptible to challenges stemming from slippery or adversarial road conditions, impeding its real-world applicability. The exigency of addressing these issues becomes pivotal for the practical deployment of this model. In a departure from the conventional MPC paradigm, Williams et al. [269] introduce an Information-Theoretic MPC (IT-MPC) model. IT-MPC involves adapting a "control strategy" to anticipate forthcoming events in the vehicle's environment, incorporating predictive prowess augmented by information theory. This model uniquely optimizes the collection and utilization of information, an attribute resonant with the non-linear nature of many autonomous vehicle systems. Owing to this aptitude, IT-MPC is well-suited for collision avoidance and maneuvering within complex contexts. Validation through dirt-road track simulation demonstrates promising results, underscoring the model's potential effectiveness.
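For the linear quadratic regulator underlying Park et al. [265], a minimal sketch is the discrete-time LQR for a lateral-error model: the discrete algebraic Riccati equation yields the feedback gain that minimizes an infinite-horizon quadratic cost. The system matrices and weights below are assumed for illustration and are not taken from [265].

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Discrete double-integrator lateral-error model: state = [lateral error,
# error rate], input = steering-induced lateral acceleration (assumed).
dt = 0.05
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2],
              [dt]])
Q = np.diag([10.0, 1.0])   # penalize lateral error and its rate
R = np.array([[1.0]])      # penalize control effort

# Solve the discrete algebraic Riccati equation and form the LQR gain
# for the feedback law u = -K x.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

x = np.array([1.0, 0.0])   # start 1 m off the reference path
for step in range(5):
    u = -K @ x
    x = A @ x + (B @ u).ravel()
    print(f"step {step}: lateral error = {x[0]:.3f} m")
```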
Petrovskaya et al. [270] introduce a model tailored to object tracking within urban settings. The test vehicle in this study was equipped with an Applanix navigation system, with the vehicle's geometry approximated as a rectangle to minimize the variability inherent to object detection. Predominantly reliant on laser range finder sensors, exemplified by the IBEO Alasca, the researchers encountered a central challenge in the utilization of raw data. Due to varying vantage points for each sensor reading, substantial sensor noise emerged as a key concern necessitating mitigation. Experimental validation encompassed three distinct scenarios, yielding satisfactory accuracy levels spanning from 97% to 99% true positive detections. Likewise, Galceran et al. [271] also develop a tracking module targeting occluded dynamic objects surrounding vehicles. Conventional frameworks grapple with the challenge of occluded object detection, which engenders inherent hazards for drivers. The study's approach integrates simulators to emulate occluded objects and employs a neural network alongside an innovative algorithmic paradigm, notably the hybrid Gaussian mixture model (hGMM), to detect and analyze these obscured entities. Once an object is tracked, the model assesses the extent of occlusion and establishes associations with other objects, marking a novel contribution. Promising outcomes underline the efficacy of this endeavor. Wang et al. [272] pivot towards pedestrian tracking in the vicinity of autonomous vehicles, a crucial aspect engendering vehicular safety. This pursuit capitalizes on 3D LiDAR sensors, pivotal in ascertaining depth and range information essential for avoiding pedestrian-related incidents. The model leverages these sensors to construct a point cloud representing the surroundings, subsequently deploying a support vector machine (SVM) for pedestrian recognition and classification. This stands in contrast to prevalent pedestrian recognition methodologies often centered on cameras, albeit the latter's deficiency in furnishing comprehensive depth and range data. This void is adeptly addressed by the intrinsic attributes of LiDAR sensors. The fusion of GPS, IMU, and Distance Measurement Unit (DMI) augments real-time vehicular localization. SVM, in conjunction with the radial basis function (RBF) kernel, is harnessed for pedestrian identification. Empirical validation yields a true positive rate exceeding 99%, albeit accompanied by a false positive rate of approximately 92%, thereby demonstrating promising performance. However, the researchers acknowledge the need for expanded work and datasets to actualize real-world deployment. Moreover, the extensible framework presents utility beyond pedestrian identification, potentially encompassing the classification of bicycles and other vehicles. Table 4 summarizes the different techniques used for vehicle tracking and trajectory prediction while showing their strengths and weaknesses.
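The SVM-with-RBF-kernel classification stage described for Wang et al. [272] can be sketched with scikit-learn on synthetic cluster features; the features, labels, and data below are hypothetical stand-ins for the richer point-cloud descriptors used in the original work.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Hypothetical features extracted from LiDAR clusters:
# [height (m), width (m), point count].
rng = np.random.default_rng(2)
pedestrians = np.column_stack([rng.normal(1.7, 0.1, 200),
                               rng.normal(0.5, 0.1, 200),
                               rng.normal(120, 20, 200)])
vehicles = np.column_stack([rng.normal(1.5, 0.2, 200),
                            rng.normal(1.9, 0.3, 200),
                            rng.normal(600, 80, 200)])
X = np.vstack([pedestrians, vehicles])
y = np.array([1] * 200 + [0] * 200)   # 1 = pedestrian, 0 = other

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# RBF-kernel SVM: the kernel implicitly maps clusters into a space
# where a maximum-margin boundary separates pedestrians from others.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```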
In conclusion, the domain of autonomous vehicle tracking has witnessed the creation of diverse models, frequently centered around the paradigm of model predictive control (MPC) and its variants. Despite the encouraging outcomes exhibited by these works, persistent challenges endure. Notably, the dearth of comprehensive and robust datasets coupled with a limited scope of testing scenarios restricts the broader operational viability of these models. The absence of diverse real-world evaluations hampers the capacity to extrapolate findings to a wider array of settings and operational conditions. A recurrent impediment involves the computational complexity inherent to many tracking models. The resource-intensive nature of these models mandates the utilization of high-performance processors, resulting in substantial costs. Given the predominant commercial orientation of real-world autonomous vehicles, manufacturers exhibit hesitancy in adopting these models at a broad scale due to the considerable financial burden associated with procuring such sophisticated processors. For these autonomous vehicle tracking models to effectively scale, the critical factors of comprehensive testing, diverse operational scenarios, and cost-efficient processing solutions must be concurrently addressed. Extended and rigorous validation under varying conditions will engender confidence in the model's reliability. Additionally, the development and integration of more cost-effective processors will facilitate their widespread adoption, aligning with the commercial realities of the autonomous vehicle industry. Collectively, these advancements are essential to bridge the gap between promising theoretical constructs and their pragmatic real-world application.
8.
Potential cause for concern – Security of vision technology
Despite the notable strides made in the realm of autonomous vehicle image processing, a significant concern looms over the security implications stemming from the constant imaging and sensory engagement of these vehicles. The proliferation of cyber-attacks and hacking incidents has emerged as a disconcertingly recurrent phenomenon within this domain. Such attacks possess the potential to yield catastrophic outcomes. For instance, malevolent actors could manipulate the neural network that underpins the vehicle's functioning, precipitating a substantial reduction in accuracy and thereby engendering perilous road conditions. A graver possibility involves hackers attaining remote control over the vehicle, accentuating the potential hazards to unprecedented levels. Furthermore, the data garnered through the sensory apparatus of autonomous vehicles can encompass sensitive information. The vulnerability to hacker intrusion introduces a considerable dimension of privacy concern, exacerbating the already intricate security landscape. The ensuing section delineates studies that dissect the latent vulnerabilities inherent to autonomous vehicles, evaluate the risk of prospective cyber-attacks, and proffer viable solutions and recommendations to counter this pressing issue.
Cui et al. [276] undertake a comprehensive scrutiny of the security vulnerabilities that pervade the landscape of autonomous vehicles, offering a cogent examination of the consequential implications stemming from the interconnectedness inherent to these vehicular systems. This interconnectivity, primarily facilitated through Vehicular Ad Hoc Networks (VANETs), brings forth a myriad of challenges that extend beyond the confines of an individual vehicle, potentially cascading across an entire network of connected vehicles. As autonomous vehicles often necessitate network connectivity for purposes such as crowdsourcing and network-enabled neural networks, the fragility of this connectivity acquires heightened salience, given that security breaches can unleash ripple effects that compromise the integrity and operation of multiple vehicles. VANETs, constituting an integral facet of the broader Intelligent Transport System (ITS), fundamentally orchestrate the seamless exchange of information amongst vehicles, striving to optimize the dissemination of vital data among a network of vehicles. This networked information flow serves as a foundation for various functions, including real-time navigation updates, traffic management, and enhanced road safety. However, the effectiveness of these functionalities rests upon the assurance of secure and trustworthy communication channels, a principle that is progressively threatened in an environment vulnerable to cyber intrusions. An overarching issue that emerges within the purview of autonomous vehicle security is the imperative of safeguarding authenticity, availability, data integrity, and confidentiality. Authenticity pertains to verifying the legitimate source of information or commands, whereas availability underscores the uninterrupted accessibility of vehicular functionalities. Data integrity underscores the unadulterated nature of transmitted data, while confidentiality pertains to preserving the privacy of sensitive information. These components are inextricably linked to the overarching notion of vehicular safety, encapsulating the foundational tenets that uphold the sound operation of autonomous vehicles in an interconnected landscape. A fundamental concern emanates from the potential exploitation of VANETs by malicious actors, where malevolent interventions can compromise the sanctity of vehicular communication. The authors delineate instances wherein attackers manipulate source identities, consequently deceiving network nodes into treating spurious messages as authentic. Another vector involves the falsification of entities, leading the network to erroneously endorse the legitimacy of certain parties, thus enabling unauthorized access. The multifaceted array of security attacks encompasses various permutations, warranting an array of countermeasures. Mitigating these security risks necessitates the fusion of software-based detection mechanisms, anti-spoofing methods, and robust anti-virus firewalls. These defenses are imperative to fortify the system's resilience against adversarial actions that seek to exploit inherent vulnerabilities. Nevertheless, a lamentable state of affairs prevails in the present landscape, characterized by a scarcity of robust countermeasures capable of effectively curtailing these mounting threats. This paucity of effective remedies underscores the need for comprehensive research initiatives aimed at devising innovative strategies to fortify the security posture of autonomous vehicles. Ferdowski et al. 
[277] proffer a novel approach, manifesting in a deep reinforcement learning model meticulously designed to detect potential threats to the security framework. Central to their approach is the conceptualization of the security challenge as a noncooperative game that unfolds between the assailants and the autonomous vehicle. This game-theoretic perspective offers a sophisticated lens through which the dynamics of adversarial interactions can be dissected and subsequently addressed. The analytical bedrock of this approach rests upon the Nash equilibrium, a well-established principle within game theory, which furnishes a standardized framework for determining equilibrium solutions in noncooperative games. Within this paradigm, the researchers propose a dual-tier framework hinging on Long Short-Term Memory (LSTM) models, a class of recurrent neural networks renowned for their efficacy in modeling sequential data. These models serve as vehicles for extracting and assimilating pertinent features and dependencies from the autonomous vehicle's data streams. The assimilated knowledge is subsequently leveraged to counteract the impact of data injection attacks directed at sensor data integrity. The authors demonstrate the applicability of their approach through empirical evaluation, presenting evidence of the model's efficacy in attenuating the adverse consequences of adversarial interventions. Shifting focus to sensor security, Xu et al. [278] delve into the intricate domain of ultrasonic sensors, highlighting both their salient role as primary detectors of obstacles and the intricate ethical conundrums they entail. The study navigates through the complex landscape of sensor security, encompassing multifaceted attacks spanning physical signal level attacks, sensor hardware level attacks, and digital level attacks. These attacks manifest varying levels of sophistication and potency, targeting different layers of the vehicular sensory apparatus. Additionally, the authors introduce two additional categories: spoofing attacks and jamming attacks. Spoofing attacks entail the transmission of seemingly legitimate yet malicious signals, while jamming attacks involve overpowering authentic signals with harmful ones. Both forms of attack punctuate the dire consequences that arise from a compromised sensor infrastructure. In addressing these multifarious threats, the paper underscores the critical importance of attacker localization, which is pivotal in minimizing the impact of the attack. Localization is a formidable challenge, as it necessitates the precise identification of the point of origin for adversarial interventions, often exacerbated by the inherently dynamic nature of vehicular environments. Furthermore, the authors advocate for the adoption of Physical Shift Authentication (PSA) as a potential panacea for authenticating signals. This method hinges upon the manipulation of wave parameters to confer authenticity to signals, rendering them resistant to manipulation or intrusion. Additionally, the study introduces an algorithmic framework that harnesses the Electronic Control Unit (ECU) as a cornerstone for attack localization. The crux of this scheme is predicated on harnessing the repetitive information streams generated by diverse sensors to detect and rectify anomalies induced by adversarial interventions. In summation, Cui et al. 
[276] provide a comprehensive dissection of security vulnerabilities in the context of autonomous vehicles, underscoring the amplified impact of interconnectedness and advocating for a multifaceted approach to enhance security posture. Ferdowski et al. [277] introduce a pioneering methodology leveraging deep reinforcement learning to detect threats, while Xu et al. [278] probe the intricacies of ultrasonic sensor security, elucidating potential threats and advancing viable solutions. Collectively, these studies illuminate the multifarious dimensions of security in autonomous vehicles and underscore the pressing need for comprehensive research efforts to fortify their operational integrity.
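As a rough illustration of how an LSTM can flag injected sensor data in the spirit of Ferdowski et al. [277], the sketch below trains a one-step-ahead predictor on a clean synthetic signal and flags incoming readings with large prediction residuals. It is a simplified stand-in under assumed data, architecture, and thresholds, not the game-theoretic model of the original paper.

```python
import torch
import torch.nn as nn

class SensorPredictor(nn.Module):
    """LSTM that predicts the next sensor reading from a short history.

    Large prediction residuals on incoming data can be flagged as
    possible injected or spoofed readings.
    """
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # predict the next value

# Train on a clean, synthetic speed signal (sine wave plus noise).
torch.manual_seed(0)
t = torch.linspace(0, 20, 400)
signal = torch.sin(t) + 0.05 * torch.randn_like(t)
windows = signal.unfold(0, 16, 1)         # sliding windows of length 16
inputs = windows[:, :-1].unsqueeze(-1)    # 15-step history
targets = windows[:, -1:].clone()         # next value to predict

model = SensorPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

# Flag readings whose prediction residual exceeds a simple threshold.
with torch.no_grad():
    residual = (model(inputs) - targets).abs().squeeze()
threshold = residual.mean() + 3 * residual.std()
print("suspicious samples:", int((residual > threshold).sum()))
```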
The question of security in autonomous vehicles is of substantial magnitude, particularly in light of the proliferation of these vehicles in the commercial domain. This challenge predominantly emanates from the deficiency of robust protective mechanisms inherent to most autonomous vehicles. As elucidated in the scholarly literature, a potential avenue for resolution lies in augmenting the sensor arrays integrated within these vehicles, thereby introducing greater redundancy in the available information. Such redundancy would dilute the influence of malicious signals, consequently mitigating their deleterious impact. Concurrently, the implementation of a centralized control system emerges as an imperative requisite, as the enhancement of security protocols in these vehicular paradigms warrants unequivocal prioritization.
9.
Conclusions
This study provides an extensive overview of the challenges and opportunities in image processing and sensor fusion for autonomous vehicles. The findings highlight the remarkable progress that has been made in various subfields, including object detection, recognition, tracking, scene understanding, localization, autonomous parking, and addressing adverse weather conditions. However, several limitations and open research areas have been identified, which need to be addressed to facilitate the practical implementation of these technologies in real-world scenarios. One key challenge identified in the reviewed studies is the requirement for robust and comprehensive datasets. The limited computing power of autonomous vehicles often necessitates reliance on remote databases, which can impact the accuracy of models. To overcome this challenge, improved accessibility to computing resources is needed to create adequately sized datasets. Moreover, the lack of diversity in training data poses a significant limitation. While promising results are achieved within specific settings, models may lack generalizability across different locations and environmental conditions. Therefore, efforts should be directed towards developing diverse and robust datasets to enhance the performance and applicability of image processing and sensor fusion techniques. Over time, such datasets are expected to become more prevalent and more widely used across these applications, allowing autonomous driving models to cover a broader and more intricate spectrum of operating scenarios and, in turn, easing the autonomous driving task. Another critical aspect highlighted in the reviewed studies is the trade-off between computational complexity and real-time responsiveness. Autonomous vehicles require sophisticated algorithms and extensive computational power for analyzing vast amounts of sensor data and making complex decisions. However, the need for real-time responsiveness in dynamic driving situations necessitates minimizing computational time. Striking the right balance between processing capabilities and real-time responsiveness is essential to ensure timely decision-making and adaptability to dynamic changes in the environment. In addition, the results also emphasize the challenges and advancements in autonomous parking. While notable progress has been made in utilizing neural networks and different camera configurations for autonomous parking, high computational expenses and the need for near-perfect accuracy pose significant challenges for practical implementation. Further advancements are necessary to ensure the reliability and trustworthiness of autonomous parking systems, given the inconsistency of current parking lot setups, such as markings and signage that vary from one lot to another. These innovations are expected to evolve progressively, as semiconductor companies such as NVIDIA and AMD are currently developing microchips tailored to the requirements of autonomous vehicle developers.
As the availability of these microchips expands over time, their accessibility to manufacturers will increase accordingly, facilitating their integration into a broader range of real-world contexts.
Furthermore, addressing adverse weather conditions is another important aspect discussed in the paper. While the incorporation of LiDAR and radar sensors has proven effective in overcoming challenges posed by adverse weather, the current methods still lack the required level of accuracy for real-world deployment. Errors persist due to limitations in dataset diversity, inaccuracies in models, and the inherent variability and complexity of real-life weather conditions. Thus, future research should focus on improving the robustness and accuracy of algorithms to make them more suitable for real-world applications. Lastly, the review acknowledges the contributions of earlier algorithms based on mathematical formulas and rule-based systems. While these algorithms may have limitations compared to more modern approaches, they served as important foundations for the development of autonomous driving systems. As technology advances, the focus has shifted towards more data-driven and learning-based techniques, such as neural networks, which offer greater adaptability and accuracy. Nevertheless, striking a balance between acknowledging the limitations of older algorithms and leveraging their contributions in specific contexts is crucial for the overall advancement of the field.
Despite the noteworthy strides in autonomous vehicle image processing and localization, it is evident that substantial further progress is required to facilitate the broader deployment of this technology. The prevailing challenges in the realm of autonomous vehicles are frequently rooted in resource insufficiency rather than technological deficiencies. Although the efficacy of LiDAR sensors in facilitating comprehensive environmental perception for autonomous vehicles has been substantiated, the prevalent commercial preference for radar sensors persists due to their cost-effectiveness. Consequently, numerous existing models and localization methodologies manifest a degree of accuracy that falls short of their potential, thereby constraining the attainment of Level 5 autonomous driving capabilities. The trajectory toward overcoming this limitation is inextricably linked to advancements in LiDAR sensor manufacturing, a development that is poised to precipitate a reduction in the cost associated with this technology. Such a cost reduction is anticipated to engender heightened accessibility for manufacturers of commercial autonomous vehicles, subsequently fostering an environment conducive to the integration of more accurate and sophisticated models, thereby propelling the journey toward Level 5 autonomy. This challenge is further exacerbated by the conundrum of computational complexity. The computational demands of many prevailing models necessitate the utilization of potent microprocessors, a resource that is often characterized by a high economic outlay. The prevailing global dearth of microchips only amplifies this predicament, rendering the acquisition of these chips financially onerous for vehicle manufacturers. This, in turn, instills a hesitance to construct models reliant on these resource-intensive processors. Consequently, the efficacy of numerous models is hindered, falling short of their maximal potential. In summation, while advancements in autonomous vehicle technology have been notable, a persistent requirement for enhanced resources and cost-effective technologies underscores the need for concerted efforts to surmount these limitations. This multifaceted challenge demands not only the refinement of sensing technologies but also the mitigation of resource-related impediments, ultimately culminating in a transformative shift toward more potent and accurate autonomous driving paradigms.
Moreover, the paucity of diverse and comprehensive datasets constitutes a significant impediment to the broader efficacy of autonomous vehicles beyond specific contextual confines. An analysis of the literature elucidates a prevailing tendency towards datasets predominantly characterized by uniform urban landscapes, with a substantial subset emanating from a single dataset. While the utilization of analogous datasets may be advantageous for in-depth model assessment, its efficacy wanes concerning the cultivation of a versatile operational domain. Models engendered under such circumstances are predisposed to ineffectiveness when exposed to scenarios divergent from those encapsulated within the training dataset. This deficiency predominantly emanates from the scarcity of datasets representative of disparate geographic and environmental contexts. In light of these considerations, a prudent recommendation emerges: the allocation of augmented resources towards the systematic compilation of datasets encompassing diverse geographic and contextual terrains. Such an initiative is anticipated to furnish autonomous vehicles with an elevated capacity for seamless functionality across a broader spectrum of scenarios, catalyzing an evolution toward more universally adept autonomous vehicular systems. Lastly, the paramount concern pertaining to security persists as a salient preoccupation for the stakeholders within the autonomous vehicle domain. Presently, the nascence of extensive commercialization within this sector tempers the immediacy of the hacker threat. However, a forward-looking perspective underscores the inescapable exigency for robust cybersecurity measures within autonomous vehicles; a failure to implement these measures could conceivably usher in perilous consequences. Consequently, a pressing imperative emerges, necessitating the integration of formidable security protocols within the autonomous vehicular framework to forestall potential breaches, ensuring the safeguarding of both occupants and bystanders alike.
In summary, this literature review provides valuable insights into the current state and future directions of image processing and sensor fusion in autonomous vehicles. The identified challenges and open research areas pave the way for further advancements, such as the development of diverse and robust datasets, striking a balance between computational complexity and real-time responsiveness, improving accuracy in adverse weather conditions, and enhancing the usability and reliability of localization methods. By addressing these challenges, researchers and practitioners can advance the development of reliable and efficient autonomous driving systems, ultimately contributing to the realization of safe and widespread autonomous transportation.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Conflict of interest
The authors have no conflicts of interest to disclose.