
1 Introduction

Nowadays, there are still many road accidents every year, with annual road traffic deaths reaching 1.35 million in 2018 [2, 4]. The development of autonomous vehicles aims to improve traffic safety and driver comfort by reducing car accidents, easing the flow of traffic, reducing pollution and assisting the driver in the various driving tasks. However, we can expect to see a transition from cars with little or no automation (level 1 or 2 [36]) to conditionally (or highly) automated cars (level 3 to 4 [36]) on the road before we see fully automated cars (level 5 [36]). In conditionally automated cars, the automated system and the driver share the control of the car, with only one of them in charge of the driving task, depending on the situation. Specifically, the car keeps control until it encounters a situation it cannot handle, or until the driver wants to take back control. When the automated system detects a situation it cannot handle, it triggers a take-over request (TOR) to let the driver know that a transition of control is necessary, often within seconds. This take-over is a critical action, which could lead to accidents if it is not communicated and executed correctly and in a timely manner, especially when the driver is out of the loop [9].

Since many parameters influence take-over quality and rapidity, AI and Machine Learning are valuable assets for exploiting the richness of the available data. In fact, an AI-based interface could analyze the available contextual information in real time and trigger the best TOR. To do so, the AI model can choose among a set of available modalities to convey the TOR appropriate to each situation (contextual TOR).

Modality selection should be done carefully; the choice of modalities was shown to have an impact on both the quality and the rapidity of the take-over [7, 16]. For instance, some modalities can improve the rapidity of the take-over at the cost of deteriorating its quality [17]. The current state of the art reveals no “perfect” set of modalities, implying that the best set of modalities for a TOR depends on the situation. Nevertheless, several researchers have shown that multimodal TORs are often more effective than unimodal ones [30, 33]. Here again, Machine Learning can be used to select the best set of modalities in order to optimize TOR rapidity and quality.

Currently available semi-autonomous systems provide only a single TOR, independently of the reason and context that triggered it. Previous studies have suggested the possibility of a take-over assistant [41], but its TOR design is driven by user preferences rather than by the root cause of the disengagement. In contrast, the proposed AI-Companion would monitor the different contextual factors highlighted in the literature to adapt the TOR modalities, in order to optimize both take-over quality and rapidity. These factors are the psychophysiological state of the driver [22] and the environment, both inside [2] and outside [18] the car.

The psychophysiological state of the driver is a key input for our approach. Such a state is encoded into a prior Machine Learning model. The goal of this model is to adapt and individualize in real time the driver-car interaction to the driver’s current state and fitness to drive. To do so, several physiological signals such as electrocardiogram, electrodermal activity and respiration are investigated. In particular, the model classifies the driver’s condition with regard to four driver states: alertness, attention, affective state and situational awareness. The outcomes of this model are then combined to create a global indicator of the driver’s psychophysiological state. The AI-Companion uses this indicator to adapt its interaction with the driver.

The proposed AI-Companion will be used at different levels of attention [5], with a particular focus on peripheral interaction to support the driver’s supervision and situation awareness (SA). As a future step, we plan to develop a multimodal and full-body interaction model combining haptic, visual and vocal interaction. According to the driver’s state (stressed, tired, etc.), it will be possible to vary the level of information to be transmitted as well as the type and number of modalities to be used in the interaction with the driver.

2 Related Work

2.1 Companions in Car

Researchers have already studied the effects on the driver of having an e-companion in the car, for example to raise trust in autonomous systems [43] or to personalize the TOR [41]. Kugurakova et al. [23] showed that an anthropomorphic artificial social agent with simulated emotions was indeed possible. Lugano [26] conducted a review of virtual assistants in self-driving cars, showing that the idea of an e-companion is also gaining attention in the industry.

Despite growing interest in the industry as well as in the scientific community, to the best of our knowledge there has been no study of an AI-Companion in the car designed to manage TOR and raise SA.

2.2 TOR Modalities

In the last few years, take-over and TOR have been widely studied. TOR modalities were shown to have an impact on take-over, opening the way to research on specific modalities and combinations of them. In particular, three categories of modalities were identified and studied intensively:

  • Visual: although shown to be the least effective modality when used alone [33], it is still a primary and instinctive way to convey the TOR. Most systems on the market use the visual modality by default to convey information to the driver, usually when no urgent action is required from the driver. In more urgent situations, visual warnings usually come with an auditory warning. This is corroborated by [33], who showed that the perception of urgency was greater for multimodal warnings. Visual information can be provided to the driver in a range of ways, from a simple logo to a more articulated message, or a combination of them.

  • Auditory: like the visual modality, auditory signals are widely used in existing systems, making the auditory modality familiar to the driver. Different auditory TORs have been studied, from various abstract sounds to speech alerts.

  • Haptic: although less used in currently available car models, the haptic modality is widely studied as a means to convey information to the driver. It can range from many different settings of haptic seats [31] to shape-changing steering wheels [8].

2.3 Psychophysiological Model of the Driver

It is already known that the driver is the main cause of road accidents. Several factors can change the psychophysiological state of the driver and impact their ability to drive, up to the point of causing an accident. These factors can come from different sources, such as fatigue or drowsiness due to a monotonous drive [46], a loss of attention when being distracted from the main driving task [3, 20], or an increase of stress due to a dangerous situation on the road. This still applies in the context of conditionally automated vehicles: the driver may feel drowsy while monitoring the vehicle behavior for a long time, or may experience an increase in cognitive load while performing a secondary task. To the best of our knowledge, a global model able to detect changes in these states using physiological signals of the driver does not exist. A model of driver behavior for the assessment of the driver’s state was developed in the framework of the HAVEit project [34]. This model aimed at detecting driver drowsiness and distraction in manual driving, using driving data and features from video recordings of the driver, but no physiological signals. However, a lot of research has been done so far to detect these changes in the driver’s psychophysiological state independently. Related work aimed at classifying driver fatigue/drowsiness and mental load using physiological signals is presented below.

Fatigue and Drowsiness.

Inducement of drowsiness is usually done in driving simulators for ethical reasons. Drivers are typically asked to drive for a long time on a monotonous highway with low traffic density, sometimes in a night-time environment. It has been shown that fatigue can be induced successfully in most participants during an experiment [35]. To verify that fatigue has been successfully induced, other sources of data are used, such as driving data, questionnaires (Karolinska Sleepiness Scale [38]) or facial features. Awais et al. [4] showed that alert and fatigued states can be distinguished using electroencephalogram (EEG) and Heart Rate Variability (HRV) features, with drowsy events detected using facial features. The authors achieved 70% accuracy with HRV features, 76% with EEG features and 80% with both. The results achieved by Patel et al. [29] show that alert and fatigued states can be distinguished using HRV features: they achieved an accuracy of 90% in the classification of these two states on a dataset of 12 drivers. However, physiological sensors may be invasive when laboratory equipment with electrodes is used. To be able to use this source of data in real-world applications, the challenge is to obtain acceptable performance using wearable and embedded sensors. Lee et al. [24] used wearable sensors to record the electrocardiogram (ECG) and photoplethysmogram of drivers to measure and classify drivers’ drowsiness. They showed that 70% of the data were correctly classified using recurrence plots and a CNN-based classifier. This shows that there is some potential to use physiological signals to detect drowsiness in real-world settings, and that better results can be achieved.
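To make the HRV features mentioned above concrete, the short sketch below computes two standard time-domain indicators (SDNN and RMSSD) from R-peak timestamps. It is a minimal illustration in plain NumPy, not the exact feature set used in the cited studies.

```python
import numpy as np

def hrv_time_features(r_peak_times_s):
    """Time-domain HRV features from R-peak timestamps given in seconds."""
    rr = np.diff(r_peak_times_s) * 1000.0                # RR intervals in milliseconds
    return {
        "mean_hr_bpm": 60000.0 / np.mean(rr),            # mean heart rate
        "sdnn_ms": np.std(rr, ddof=1),                   # overall variability (SDNN)
        "rmssd_ms": np.sqrt(np.mean(np.diff(rr) ** 2)),  # beat-to-beat variability (RMSSD)
    }
```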

Distraction and Mental Overload.

In the same way as the inducement of drowsiness, the effects of distraction and mental overload on drivers are usually investigated during experiments conducted in driving simulators. Different sources of distraction can be distinguished, such as visual, auditory, cognitive and biomechanical distraction [32]. Throughout the experiment, drivers are asked to perform secondary tasks while driving manually. The accomplishment of these secondary tasks can increase the cognitive load of drivers, and this can be measured with physiological signals. Mehler et al. [27] experimentally increased the cognitive load of drivers by administering a cognitive task with increasing difficulty. They observed a significant increase of mean heart rate, skin conductance level and respiratory rate when the cognitive load of drivers was higher. This is consistent with the findings of Ferreira et al. [15], who achieved 84% accuracy in classifying people who performed a cognitive task on a computer. Various psychophysiological signals were used for the classification, such as EEG, ECG, EDA and respiration. In the context of manual driving, Solovey et al. [14] also achieved 75% accuracy in classifying the state of the driver using physiological and driving data from the same driver. Participants had to perform an N-Back task with auditory stimuli and oral responses. When performing inter-subject classification, they achieved at least 80% accuracy with HRV features only. This suggests that ECG should be used to detect changes in the cognitive load of drivers with our AI-Companion.

2.4 Interactions and Situation Awareness

A Brief Definition of Situation Awareness.

The concept of SA originated in the field of aviation but has begun to develop in the automotive domain. Endsley’s model and definition are the best known and most used in the literature [13]: “Situation awareness is the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future”. Thus, the concept is divided into three hierarchical levels:

  • Perception (Level 1 SA): The driver perceives information from the vehicle instrumentation, its behavior, other people in the car, other people around the car (vehicles, pedestrians, cyclists…), traffic, etc. Perception does not involve the interpretation of the data.

  • Comprehension (Level 2 SA): Comprehension is essential to understand and integrate the significance of the perceived elements. It reflects the expertise of the driver.

  • Projection (Level 3 SA): Projection is the ability to forecast the future state of the elements in the environment. It is the driver’s ability to solve conflicts and plan a course of action.

To summarize, the driver has to maintain knowledge of the navigation, the environment and interactions, spatial orientation and the vehicle status. However, SA cannot be assessed or measured directly because it is subjective and depends on a precise element at a specific time.

Interaction in Car to Increase Situation Awareness.

As we plan to develop a multimodal and full-body interaction model combining one or several interactions to support driver supervision, it is important to analyze what already exists in the area of in-car interaction to increase SA. To this end, we focused our study on the different types of modalities used, where these interactions are located in the vehicle and what type of information they transmit.

The most commonly used modalities are, in decreasing order of frequency: visual (ambient lights [12, 25], logos [40], text [28]), haptic (vibrations in different parts of the car [19, 39]), auditory (chimes [42] or speech [37]) and audio-visual [45]. There is also some emerging research in the field of olfactory interactions with the driver [6]. At the moment, there are few multimodal HVIs.

Concerning their location, the parts of the vehicle mainly used are the central console [25], the HUD/windshield [45], the steering wheel [39], the dashboard [40] and the seat backrest [44]. These parts are mostly used individually.

The information transmitted is also very varied. In most cases, interactions convey information about the status of the autonomous system [28], navigation, lane markings, obstacles [40], ADAS maneuvers [18], etc. The information is scattered, and there is neither a taxonomy nor a set of typical critical situations.

In addition, the evaluation criteria for these interactions vary greatly between all articles (driving performance data [28], quality of TOR [21], reaction time [40], task accuracy [39], gaze direction [45], usefulness, acceptability and trust [40], questionnaire for workload and situation awareness [45]).

3 Goal and Methodologies

We aim at creating an innovative driver companion fueled by AI, able to enhance the driver’s SA and trigger adaptive TOR while taking into account the driver’s psychophysiological state and the environment. In this context, we identified three major points that need to be studied and developed:

  1. Making a take-over quality prediction Machine Learning model;

  2. Monitoring and classifying the driver psychophysiological state;

  3. Developing ad-hoc HMI to enhance the driver SA.

The following sections present our specific subgoals for each of these three objectives.

3.1 Adaptive TOR

Our goal regarding the TOR model is to create an adaptive TOR, meaning our AI-Companion should be able to choose the modalities of a TOR on the fly, in order to maximize the quality and rapidity of a take-over. To do so, our agent will use three sources of data known to impact the take-over: the driver psychophysiological data, the driver psychophysiological state and the environment. An experiment will allow us to train a Machine Learning model to predict the quality and rapidity of a take-over based on these sources of data.

Two metrics were identified to quantify the take-over: rapidity and quality. For the rapidity of the take-over, the Reaction Time (RT) is a commonly used feature, and it has the advantage of being fairly simple to compute and understand [18]. Concerning the quality of the take-over, multiple studies (for example [10]) use the Maximum Steering Wheel Angle (MaxSWA) as an indicator of quality. A lower value of MaxSWA indicates a better take-over, with less noise in the take-over process (fewer extra movements). This is especially true in situations where the correct action is to stop the car, as opposed to situations where the obstacle can be avoided by changing lanes. In both situations, a lower MaxSWA still implies an earlier and more precise take-over.

A loss function combining both RT and MaxSWA needs to be developed in order to evaluate both at once. This problem can be seen as a Multi-Objective Optimization problem.
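As a minimal sketch of such a scalarized loss, assuming both metrics are min-max normalized and that the weight and bounds below are purely illustrative (they would have to be fitted to experimental data), one could write:

```python
def takeover_loss(rt_s, max_swa_deg, alpha=0.5,
                  rt_bounds=(0.5, 10.0), swa_bounds=(0.0, 90.0)):
    """Weighted scalarization of take-over rapidity (RT) and quality (MaxSWA).

    Both metrics are normalized to [0, 1]; lower loss is better. alpha trades
    rapidity against quality; the bounds are illustrative assumptions only.
    """
    def norm(x, lo, hi):
        return min(max((x - lo) / (hi - lo), 0.0), 1.0)

    return alpha * norm(rt_s, *rt_bounds) + (1 - alpha) * norm(max_swa_deg, *swa_bounds)

# Hypothetical usage: rank candidate TOR modality sets by their predicted (RT, MaxSWA)
predicted = {"visual": (4.2, 35.0), "visual+auditory+haptic": (2.8, 20.0)}
best_modalities = min(predicted, key=lambda m: takeover_loss(*predicted[m]))
```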

3.2 Monitoring the Driver Psychophysiological State

To train a Machine Learning model able to classify different states of the driver from physiological signals, data need to be collected. To obtain physiological data corresponding to the four states described before (alertness, attention, affective state and situational awareness), we developed a static driving simulator. The goal is to manipulate the driver’s state by means of experimental conditions in a controlled setup. To validate the success of the manipulation of the driver’s state, questionnaires are administered to participants, usually in a repeated-measures design (before vs. after the experiment). Various questionnaires are used depending on the state that is manipulated, such as the NASA-TLX for the subjective level of cognitive load. Throughout the experiment, physiological signals of the driver are recorded. We chose to record the ECG, electrodermal activity (EDA) and respiration of drivers, because these signals can be recorded using embedded and wearable sensors in the cockpit of a vehicle. In addition, we want to differentiate ourselves from what already exists on the market, especially systems using a camera for facial expressions, such as the Affectiva Automotive AI [1] driver monitoring system.

Several steps are necessary to train such a model. The first step is pre-processing: since raw physiological signals can be noisy, the signals need to be filtered and outliers removed. Then, the data are segmented into time windows of varying length. For each time window, we can compute physiological indicators such as the tonic EDA level, time- and frequency-based HRV features computed from the ECG, or the respiratory rate. Then comes the step of feature generation, whose goal is to create features from both the indicators defined before and the cleaned signals. Once the features are generated, the classification can be performed. The driver’s condition is classified with regard to four states: alertness, attention, affective state and situational awareness. This is done using machine learning classifiers such as Support Vector Machines or K-Nearest Neighbors. An output is given by the model for each one of the four states. As an example, the alertness state of the driver can be classified as alert, drowsy or sleepy. In addition, an indicator that depicts the global psychophysiological state of the driver is computed.
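The sketch below illustrates this pipeline with scikit-learn. The windowing parameters and the placeholder feature extractors are our own simplifications, not the exact processing used in our experiments.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def segment(signal, fs, window_s=60, step_s=30):
    """Cut a 1-D signal into (possibly overlapping) windows of window_s seconds."""
    win, step = int(window_s * fs), int(step_s * fs)
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

def window_features(ecg_win, eda_win, resp_win, fs):
    """Toy feature vector standing in for HRV, tonic EDA and respiratory-rate features.

    fs (sampling rate) is kept in the signature for real extractors that need it.
    """
    return np.array([
        np.std(ecg_win),   # placeholder for HRV features derived from detected R-peaks
        np.mean(eda_win),  # rough proxy for the tonic EDA level
        np.std(resp_win),  # placeholder for respiratory-rate features
    ])

# X: one feature vector per window, y: the manipulated state label (e.g. low vs. high load)
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```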

During the whole training process, several parameters can be tested and tweaked, such as the length of the time window used for segmentation, the type of classifier (and its hyperparameters) or the number of output classes for each state. The parameters that give the best performance will be chosen.
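Such a comparison can be organized as a cross-validated grid search; the classifiers and hyperparameter values below are illustrative assumptions, and the window length would be varied when building the feature matrix itself.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Candidate classifiers and hyperparameter grids (values are illustrative).
candidates = {
    "svm": (SVC(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 11]}),
}

def compare_classifiers(X, y, cv=5):
    """Cross-validated comparison of classifier types and hyperparameters."""
    results = {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        search.fit(X, y)
        results[name] = (search.best_score_, search.best_params_)
    return results
```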

Once the model is trained and good results are achieved, the last step will be to test the model in real time. We will record physiological data from drivers, process the data and classify the state of the driver in real time, in order to convey the results of the classification to the other modules of the AI-Companion and adapt the interaction in the car.
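A possible shape for this real-time step is sketched below, reusing the `window_features` helper from the pipeline sketch above; `read_sensor_buffer` and `publish_state` are hypothetical interfaces standing in for the sensor drivers and the other AI-Companion modules.

```python
import time

WINDOW_S = 60  # window length chosen during offline training (illustrative value)
PERIOD_S = 5   # how often the driver-state estimate is refreshed

def realtime_state_loop(model, read_sensor_buffer, publish_state):
    """Periodically classify the driver state from the most recent signal window."""
    while True:
        ecg, eda, resp, fs = read_sensor_buffer(seconds=WINDOW_S)  # hypothetical sensor API
        features = window_features(ecg, eda, resp, fs).reshape(1, -1)
        state = model.predict(features)[0]
        publish_state(state)  # e.g. forwarded to the TOR and SA-raising HMI modules
        time.sleep(PERIOD_S)
```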

3.3 Raising Driver Situation Awareness

The objective of this part is to assist the driver in their supervisory task while performing a non-driving related task. The main goals are to reduce their cognitive workload and keep them in the control loop by increasing their situational awareness.

For this reason, we use different modalities (haptic, visual, auditory), individually or in combination, and different areas of the vehicle (seat, external device and conversational agent). This model will transmit environmental information to the driver through these different modalities according to the situation and the driver’s state. The final step will be to take the driver into account, ideally including their gaze, in order to adapt the location, the modalities used and the level of information to transmit.

To do this, separate monomodal concepts have been developed and are currently under test:

  • A haptic seat to transmit information about nearby obstacles all around the car using the entire seat.

  • An Android application displaying driving-related information while allowing the driver to perform a secondary task in another split-screen application.

  • An ambient light display transmitting the general severity of the situation, and a conversational agent the driver can interact with in order to obtain more details about the current situation.

The objective is to search for the best combination of these different modalities in order to increase the driver’s SA.

4 Current State of Our AI-Companion

Following the related research, our previous results (as detailed in the previous sections) and the results of a workshop on explainable AI, the following design for our AI-Companion was created (see Fig. 1). The next sections will present the current state of the AI-Companion with regard to each of its fundamental parts: the take-over prediction module, the psychophysiological model of the driver and the SA-raising HMI.

Fig. 1. Design of our AI-Companion. P. stands for “Psychophysiological”.

4.1 Current Progress on Take-Over Prediction

Figure 2 shows which elements of the AI-Companion are concerned by the take-over prediction module. Here is the progress made for each of them:

Fig. 2. The “Take-over prediction module” of our AI-Companion (highlighted)

  • Environmental data: A study identified the most important environmental “states” that could lead to a TOR [11]. From this study, we have a clear view of which major environmental data should be monitored (for example the luminosity and the weather), within the limits of what is available through current sensors and technology.

  • Driver P. State and Data: The literature showed that the driver’s state and data have an impact on take-over, and the experiment described in the section on the driver psychophysiological model (Sect. 4.2) confirmed this.

  • Take-over Request HMI: A review highlighted the current trends and research about TOR HMI, allowing us to identify which modalities would be needed for a TOR (visual, haptic and auditory). This step defined the capabilities of our AI-Companion with regard to TOR design.

  • Take-over prediction: An experiment to predict take-over quality and rapidity is ongoing and should produce results soon.

4.2 Current Progress on Driver Psychophysiological Model

Figure 3 highlights the elements of our AI-Companion concerned by the psychophysiological model of the driver. The following progress was made on this part:

Fig. 3. The “Psychophysiological module” of our AI-Companion (highlighted)

  • Driver Psychophysiological Data: One experiment has been conducted in order to build the dataset. 90 participants were enrolled to acquire the data. Half of them had to perform an oral cognitive secondary task while the car was driving in conditional automation for 20 min, while the other half had no secondary task. Six take-over requests were triggered for each driver. In these situations, where they had to take over control, the ECG, EDA and respiration rate of participants were collected.

  • Driver Psychophysiological Model: The collected data have been used to train the model for classifying the cognitive load and the situation awareness of drivers. Currently, several parameters are being tested, such as the length of the time window used for segmentation and the type of classifier.

  • Driver Psychophysiological State: First results show that 97% accuracy is achieved in classifying drivers who performed the secondary task for 20 min. The data recorded during take-over situations, used to classify drivers’ situational awareness, are currently being analyzed.

4.3 Current Progress on Situation Awareness Raising HMI

Figure 4 shows the elements concerned by the SA-raising HMI part of our AI-Companion. The progress made on these elements is listed below:

Fig. 4. The “SA raising HMI module” of our AI-Companion (highlighted)

  • Environmental data: Data are directly extracted from the simulator. The monitored data are taken from previous work [11]. For now, we mainly monitor information about the weather, lane marking degradation, obstacles all around the car, road shape, general severity of the situation, etc.

  • Awareness raising HMI: Currently, we have a set of modalities spread throughout the vehicle and tested separately.

    • Haptic seat pan and backrest: This is currently under experiment, but the first results seem promising.

    • Ambient lights and conversational agent (also under experiment): We indicate the severity of the situation to the driver with a series of LEDs placed around a tablet or a smartphone. This visual modality is associated with a conversational agent. The driver will be able to interact with the conversational agent to obtain more details on the current situation and/or the condition of the vehicle (e.g., if the LEDs turn red, meaning that a critical situation is occurring, the driver can ask for more information on what is happening).

    • Split-screen application: A mobile application has been developed for the purpose of raising driver SA by providing extra information about the car, with promising results.

  • Driver Psychophysiological State: not yet taken into account, but will be added in future experiments.

5 Future Research

For our next steps, we have three main goals:

  1. Validation of the AI-Companion as a whole. Up to now, the AI-Companion has been validated part by part. The next step is to group the three modules and design a validation scenario to demonstrate the effectiveness of the integrated solution.

  2. HMI integration. After this validation, alternative feedback loops could appear and become the subject of experiments. We are especially considering sharing the outputs of the two HMIs with each other before interacting with the driver, giving our AI-Companion more control over its two HMIs instead of controlling them separately.

  3. Real-time framework. Lastly, and from a more technical perspective, we want to demonstrate that all the data processing needed for the inference step can be done in real time and on the edge with reasonably limited hardware.

6 Conclusion

In this paper we have presented the design of an AI-Companion for highly autonomous cars. The Companion’s objective is twofold: raising the driver’s SA and designing TORs on the fly in take-over situations. The realization of the Companion is still ongoing, but first results are promising. Machine Learning solutions allow analyzing multiple sources of data and detecting changes in the user’s state. This information is used to adaptively select the best modality for interacting with the user.

We hope that this paper will resonate with the scientific community and the industry, allowing us all to work together toward safer driving.