
1 Introduction

Two important requirements of today's cars are a high level of safety and connectivity with the outside world. Meeting them involves advanced technologies built on a computing infrastructure composed of numerous electronic components, named Electronic Control Units (ECUs), embedded inside the vehicle. These ECUs are in charge of processing data sensed through embedded sensors and transforming them into commands for the actuators. For this purpose, ECUs share communication buses, which carry periodic and event-based messages that allow the ECUs to monitor the vehicle state through the control and supervision of sensor and actuator states. The communication bus most used in the automotive domain is the Controller Area Network (CAN, ISO 11898), which connects together many ECUs.

Recently, the CAN protocol has become the center of multiple cyber-security issues [2, 4]. In this context, Hoppe et al. [7] were the first researchers to point out the weaknesses of the CAN bus. These findings were further investigated and confirmed by Koscher et al. [11] and Checkoway et al. [2], who performed frame replay and frame injection attacks on a real vehicle. In these attacks, the attacker physically connects to the CAN network and replays or injects messages on the CAN bus. Miller and Valasek [15] showed that physical access to the communication bus was not necessary and showcased an attack granting remote control over a vehicle. In their experiments, the attacker remotely takes control of a legitimate ECU and uses that ECU to send legitimate messages.

To protect against these attacks, multiple solutions have been proposed:

  • Protecting the message payload can be a good approach against an attacker that has physical access to the communication bus. Nilsson et al. [18] proposed to send message authentication codes over consecutive CAN frames to authenticate the messages. Hartkopp et al. [6] proposed to use Cipher-based Message Authentication Codes (CMAC) as a symmetric authentication measure between the sender and the receiver. These types of solutions allow the receiving ECU to verify the integrity and/or the authenticity of the messages and to filter out forged (unauthentic) information sent by the attacker.

  • A second family of protection solutions is known as in-vehicle network Intrusion Detection and Prevention Systems. The role of these systems is to monitor the in-vehicle network for suspicious behavior, like frame injection and replay attacks, and either physically kill suspicious frames by causing a frame error or filter them out. Examples of such detection mechanisms are presented, for instance, in the work of Taylor et al. [20] and the work of Marchetti et al. [14]. In general, state-of-the-art detection mechanisms can be categorized into two main classes: rule-based detection mechanisms and statistical detection mechanisms. We investigate these types of solutions in more detail in Sect. 2.1.

  • Another type of protection solution, specific to the CAN bus, focuses on protecting the identifier. These solutions are useful against reverse engineering, replay, and injection attacks by an attacker that has physical access to the CAN network. For instance, Humayed et al. [8] presented a solution that can change a message identifier when an attack is detected, thereby stopping the targeted attack. Han et al. [5, 12] proposed an identifier randomization function for the same purpose.

In the sequel we focus on in-vehicle intrusion detection techniques. State-of-the-art rule-based intrusion detection uses mechanisms known as identifier filtering, identifier timing analysis, and syntax checking. Some systems also examine the payload content and implement what are known as deep packet inspection techniques.

Contributions. In this paper, we tackle the problem of deep packet inspection of in-vehicle networks from a practical viewpoint. For an attacker that gains control over an ECU, we consider that her capacity evolves from simply injecting an extra message on the communication bus to being capable of modifying the content (payload) of a legitimate message. This evolution renders the classical detection mechanisms, based on identifier timing and syntax checks, practically obsolete. In order to detect these kinds of attacks, a novel detection mechanism is developed. We formulate the problem in a way that allows learning the normal behaviour of the system in terms of message payload content. Bad behaviour and bad payload content are flagged with outlier detection techniques. The method thus described can be adopted not only as an intrusion detection mechanism, but also as an online-monitoring failure detection and sensor rationality check safety mechanism as described by the "Road vehicles – Functional safety" standard ISO 26262 [9]. We validate the model in practice with real CAN traces collected from drive tests. We show that the approach is able to learn the nominal behavior with high accuracy and low false positives for three different driving behaviors separately. Then we show that it is also able to learn a unified nominal behavior, with high accuracy and low false positives, that can accommodate different driving behaviors. Finally, we run an attack campaign in order to test the robustness of the detection rules, and demonstrate their ability to detect attacks with a low false negative rate.

Outline. The remainder of the paper is structured as follows. Section 2 gives some background on CAN intrusion detection mechanisms, machine learning techniques and the related work. Section 3 gives details about data collection and feature engineering. Section 4 presents practical validation results on real CAN traces. Section 5 concludes.

2 Background

2.1 Intrusion Detection Systems over CAN

Detecting intrusions on the in-vehicle communication buses is important as it can prevent attacks from spreading to other ECUs. It can be considered the last line of defense after protecting ECU interfaces from the outside world. Many mechanisms have been proposed to detect possible intrusions on the CAN bus. Figure 1 gives a high-level overview of these mechanisms.

Fig. 1. High level synthesis of detection mechanisms applied to the CAN frame

Using the frame identifier, an intrusion detection system can establish a list of allowed and forbidden identifiers, based on which it can decide which frames to filter. This technique is best known as identifier filtering or identifier white-listing [15, 16]. Such a white list can also depend on the context of the vehicle: for instance, the intrusion detection system may allow certain identifiers when the vehicle is in the parking state and reject them when the vehicle is moving. This technique is used in particular to enforce the diagnostic security policy by allowing diagnostic messages only in certain vehicle states.

Another detection mechanism that uses identifiers is timing analysis [3, 7, 16]. It is a very popular technique that works well with periodic messages. It consists in setting an acceptance time-window for each periodic message. If the same message is received outside of its acceptance time-window, the system shall consider it an intrusion and shall filter it out.

Besides the identifier of the messages, the data length code (DLC) can also be exploited to detect bad behaviour [16]. In fact, each manufacturer sets up a proprietary protocol over the CAN standard. This protocol consists in creating a mapping, shared across all ECUs, between identifiers and payload information (sensor values for instance), also called signals. This mapping defines a syntax that can be checked based on the payload length of each message. Messages that violate this syntax (i.e., messages sent with the wrong DLC) are then flagged as intrusions. A minimal sketch combining these classical checks follows.
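As an illustration, the following is a minimal Python sketch of the three classical rule-based checks (identifier white-listing, timing analysis, and DLC syntax check). The policy table, identifiers, periods, and tolerances are hypothetical and only serve to make the mechanisms concrete; a real deployment would derive them from the manufacturer's proprietary protocol.

```python
# Hypothetical per-identifier policy: allowed IDs, expected DLC, nominal period.
# Values are illustrative, not taken from any real manufacturer protocol.
POLICY = {
    0x1F0: {"dlc": 8, "period_s": 0.010, "tolerance_s": 0.003},
    0x2A4: {"dlc": 4, "period_s": 0.020, "tolerance_s": 0.005},
}

last_seen = {}  # identifier -> timestamp of the last accepted frame


def classical_checks(can_id, dlc, timestamp):
    """Return the list of classical rules violated by one received frame."""
    rule = POLICY.get(can_id)
    if rule is None:                          # identifier white-listing
        return ["forbidden identifier"]
    violations = []
    if dlc != rule["dlc"]:                    # syntax (DLC) check
        violations.append("wrong DLC")
    prev = last_seen.get(can_id)
    if prev is not None:                      # timing analysis
        delta = timestamp - prev
        if abs(delta - rule["period_s"]) > rule["tolerance_s"]:
            violations.append("outside acceptance time-window")
    last_seen[can_id] = timestamp
    return violations
```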

In this paper we distinguish between two attacker models (Fig. 2). Figure 2a shows an attacker model with direct physical access to the CAN bus. Since modification of a message on the fly is rather difficult (the message being protected by a CRC mechanism), this attacker instead injects extra messages on the CAN bus. These messages will violate the proprietary communication protocol defined on top of CAN, for instance by modifying the syntax of the message or its periodicity. Such anomalies are caught by the classical detection mechanisms described previously. In contrast, an advanced attacker who has indirect, and even remote, access to a legitimate ECU (Fig. 2b) might aim at modifying sensor information and commands directly in the payload without disrupting the defined protocol, and will thus not be detected by the above-mentioned classical detection mechanisms. Consequently, we need to build mechanisms able to detect bad behaviour inside the payload. These mechanisms are referred to as deep packet inspection, which encompasses most safety checks, for instance duplicated signals, process counters, and checksums. In this paper, we focus on the deep packet inspection type of detection, as this detection mechanism is well adapted to the sophisticated attacker model. Supervised machine learning techniques are used in order to build a nominal behaviour based on received signals; then outlier detection flags deviations from the previously built behavioral model.

Fig. 2. Attacker models. (a) State-of-the-art model. (b) Model investigated in this paper

2.2 Machine Learning Algorithms and Their Application

In practice there are multiple application domains where machine learning algorithms excel in prediction tasks. They are generally used to study correlation between different inputs (also called features), to approximate an output function and/or to discover interesting data structures. For these reasons we decided to explore the use of machine learning techniques in the context of vehicle cyber-physical attacks and intrusion detection.

Machine learning algorithms can be divided into two main categories depending on the learning strategy:

  1. Supervised learning: a machine learning algorithm is said to use a supervised learning strategy when the training set includes both the input data and the expected output data. The algorithm learns a mapping function by minimizing a pre-defined cost function. The trained algorithm is then tested on examples that were not included in the training set. It is said to generalize well if its performance on the test set is comparable to its performance on the training set.

  2. Unsupervised learning: a machine learning algorithm is said to use an unsupervised learning strategy if the training set only includes the input data but not the expected output. In that case, the machine learning algorithm tries to discover interesting data structures.

Machine learning techniques have been used previously in the context of deep packet inspection for intrusion detection. Kang et al. [10] train a deep neural network to classify normal versus attack packets using probability-based feature vectors of packet payload bits. Training data were generated by the Open Car Test-bed and Network Experiments (OCTANE) packet generator [1]. Both normal and attacked packets were necessary in order to train the algorithm. Loukas et al. [13] use sensor input features along with a recurrent neural network (RNN) to detect attacks on vehicles. Their detection mechanism also consists in learning to classify whether the vehicle is under attack, with training data that include both attacked and normal packets. An important limitation of the work of Kang et al. [10] and Loukas et al. [13] is that the intrusion detection system is trained to recognize specific attacks: an important effort is devoted to generating attacked packets so that the detection module can learn the attack profile. Taylor et al. [21] use long short-term memory networks to detect attacks on the CAN bus. Their approach is applied to the identifier field and learns to predict the next packet identifier on the CAN bus; highly surprising bits are then flagged as anomalous. This method draws its strength from repetitive periodic sequences, which is why it is applied to the identifier field; this is hardly the case for payload information that holds sensor values. Narayanan et al. [17] propose to build a Hidden Markov Model of the normal behaviour of the car based on sensor values (or signals). Their work shows that it is possible to detect data manipulation attacks like speed discontinuities. Narayanan et al. focus on signal changes, i.e., gradients of signals, rather than signal values. As a result, the built model can serve to detect signal-jump types of anomalies but cannot be used for prevention. Besides, their work does not evaluate the True Positive Rate and False Positive Rate of the detection principle.

An important limitation of the previous approaches is that, during training, data representing both attacked and non-attacked states are needed to learn to recognize attacks. In order to produce this kind of data, one needs to select and perform multiple attacks on the vehicle; it is thus challenging to generate the data for a large range of attacks. Besides, the intrusion detection system learns only to recognize the performed attacks included in the training set. Another downside is that the approach only predicts whether the vehicle is under attack, and does not deliver more detailed information useful for investigating the cause of the attack.

In order to overcome these limitations, we propose a different formulation of the problem. Instead of predicting whether the vehicle is under attack based on payload inputs, we break down the payload information into signals according to the manufacturer's proprietary protocol and we train a machine learning algorithm to predict the next signal value based on other signals. The idea is then to compare the predicted signal and the received signal. Under the assumption that the predictor is accurate enough, we adopt the following as a security metric: if the difference between the prediction and the received value is large enough, then, with high probability, the vehicle is being attacked and the predicted signal points to the potential cause of the attack.

Input signals are sensor values sent from one ECU to the other ones. They can either be real-valued or categorical signals:

  • An example of a real-valued signal is the speed of the vehicle (Fig. 3a). It is sent over 2 bytes of payload information, so the received value is an integer between 0 and 65535. A multiplication by 0.01 is necessary to recover the actual sensor measurement, making the speed range [0, 655.35] km/h.

  • An example of a categorical signal is the brake lights command signal (Fig. 3b). It is sent over 1 bit of payload data. The received value is a binary information (0/1) indicating whether to activate the brake lights (1) or not (0). A minimal decoding sketch for both examples is given after Fig. 3.

Fig. 3. Example of real-valued and categorical signals
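As a minimal sketch, the two example signals could be decoded from the raw payload as follows. The byte and bit positions used here are hypothetical, since the actual mapping is part of the manufacturer's proprietary protocol.

```python
def decode_speed(payload: bytes) -> float:
    """Speed: 2 bytes, integer in [0, 65535], factor 0.01 -> [0, 655.35] km/h.
    The byte offset (0) is hypothetical."""
    raw = int.from_bytes(payload[0:2], byteorder="big")
    return raw * 0.01


def decode_brake_lights(payload: bytes) -> int:
    """Brake lights command: 1 bit, 1 = activate, 0 = do not activate.
    The byte/bit position is hypothetical."""
    return (payload[2] >> 7) & 0x1


frame = bytes([0x12, 0x34, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00])
print(decode_speed(frame))         # 46.6 km/h
print(decode_brake_lights(frame))  # 1
```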

2.3 Problem Formulation

In what follows we formulate our problem as a supervised machine learning problem. Let \(\mathcal {D} = \{ (x_i,y_i) \}_{i \in [1,N]}\) be the set of input-output pairs, where \(\mathcal {D}\) is the collected data set and N is the number of observed examples. Each training input \(x_i\), \(i\in [1,N]\), is a d-dimensional vector of components representing signal values/states \((s_i^{(1)},s_i^{(2)},\ldots ,s_i^{(d)})\). These are called features and are stored in an \((N \times d)\) matrix X (Fig. 4). The outputs \(y_i\), \(i\in [1,N]\), are stored in a 1-dimensional vector y and represent the target signal that we want to predict. The target can be either real-valued (in which case we talk about regression) or categorical (in which case we talk about classification), depending on the signal type.

Supervised machine learning assumes the existence of some unknown function \(f\) that maps the inputs to the outputs, as in (1):

$$\begin{aligned} f(x) = y, \quad \forall (x,y) \in \mathcal {D}.\end{aligned}$$
(1)

The goal of the learning process is to estimate the function \(f\) given a labeled training set, and then to make predictions on unseen data \(x_u\) using the estimated function \(\hat{y}=\hat{f}(x_u)\). We denote by \(p(y|x_u,\mathcal {D}_{train})\) the probability distribution over possible labels, conditional on the input vector \(x_u\) and the training data set \(\mathcal {D}_{train}\). When approximating the function \(f\), we use a machine learning model \(M_{\theta }\), where M is the model and \(\theta \) denotes its parameters. The probability distribution over possible labels then becomes also conditioned on the chosen model, \(p(y=\hat{y} | x_u,\mathcal {D}_{train},M_{\theta })\).

When using regression parametric models, we assume that the estimated function used for the prediction introduces a residual error \(\epsilon \) between the predictions and the ground truth:

$$\begin{aligned} y = \hat{y} + \epsilon .\end{aligned}$$
(2)

We make the assumption that the residual error term \(\epsilon \) follows a Gaussian distribution, \(\epsilon \sim \mathcal {N}(\mu ,\sigma ^2)\). More explicitly, we assume that the probability distribution over possible labels is as follows:

$$\begin{aligned} p(y | x_u,\mathcal {D}_{train},M_{\theta }) = \mathcal {N}(\mu _{\theta }(x_u), \sigma ^2).\end{aligned}$$
(3)

In order to estimate the model parameters \(\theta \), we use the maximum likelihood estimator, which maximizes \(p(\mathcal {D}_{train} | \theta ) =\prod _{i=1}^N p(y_i|x_i,\theta ) \). This is equivalent to finding the model parameters \(\hat{\theta }\) that minimize the negative log-likelihood, which reduces to the sum of squared residual errors \(\sum _{i=1}^N (y_i - \hat{y}_i)^2 = \sum _{i=1}^N \epsilon _i^2\):

$$\begin{aligned} \hat{\theta } = \underset{\theta }{\text {argmin}} \sum _{i=1}^N (y_i - \hat{f}_{\theta }(x_i))^2. \end{aligned}$$
(4)
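For completeness, this equivalence follows directly from the Gaussian assumption (3): up to additive terms that do not depend on \(\theta \),

$$\begin{aligned} -\log p(\mathcal {D}_{train} | \theta ) = \frac{N}{2}\log (2\pi \sigma ^2) + \frac{1}{2\sigma ^2}\sum _{i=1}^N \bigl (y_i - \mu _{\theta }(x_i)\bigr )^2, \end{aligned}$$

so maximizing the likelihood over \(\theta \) amounts to minimizing the sum of squared residuals, which gives (4).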

Once the optimal parameters \(\hat{\theta }\) are estimated, the prediction model outputs a signal estimate \(\hat{y}_u = \hat{f}_{\hat{\theta }}(x_u)\) for an unseen input vector \(x_u\). The received signal value y is then compared to the estimated signal value, and an alert is raised if the two are not similar:

$$\begin{aligned} Alert = 1 \ \iff \ |\hat{y} - y| \ge t_p. \end{aligned}$$
(5)

When using classification parametric models, where the output is one out of C classes, we model the probability over possible labels with a categorical distribution. Let \(y_{ij} = I(y_i =j)\) be the one-hot encoding of \(y_i\):

$$\begin{aligned} p(y|x_u,\mathcal {D}_{train},M_{\theta }) = \prod _{j=1}^C \mu _{\theta ,j}(x_u)^{I(y = j)}.\end{aligned}$$
(6)

In order to estimate the model parameters \(\theta \), we use the maximum likelihood estimator, which maximizes \(p(\mathcal {D}_{train} | \theta )=\prod _{i=1}^N p(y_i|x_i,\theta ) = \prod _{i=1}^N \prod _{j=1}^C \mu _{\theta ,j}(x_i)^{I(y_i = j)}\). This is equivalent to minimizing the negative log-likelihood, which is the cross-entropy function:

$$\begin{aligned} \hat{\theta } = \underset{\theta }{\text {argmin}} \left( - \sum _{i=1}^N \sum _{j=1}^C y_{ij}\log (\mu _{\theta ,j}(x_i)) \right) . \end{aligned}$$
(7)

Once we have the optimal model parameters \(\hat{\theta }\), for each unseen input vector \(x_u\) we predict the class with the highest probability: \(\hat{y}_u =\underset{j\in [1,C]}{\text {argmax}} \ \mu _{\hat{\theta },j}(x_u) \).

As in the regression case, the prediction model outputs a signal estimate \(\hat{y}_u = \hat{f}_{\hat{\theta }}(x_u)\) for an unseen input vector \(x_u\). The received signal value y is then compared to the estimated signal value, and an alert is raised if the two differ:

$$\begin{aligned} Alert = 1 \ \iff \ \hat{y} \ne y. \end{aligned}$$
(8)
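The following is a minimal sketch of the two detection rules (5) and (8), with scikit-learn estimators standing in for the model \(M_{\theta }\). The synthetic data, the choice of decision trees, and the threshold value are illustrative assumptions; Sect. 4 evaluates the actual models and thresholds.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic stand-in data; in practice X and y come from parsed CAN traces.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(1000, 5))
y_real = 0.8 * X[:, 0] + rng.normal(0, 1, size=1000)  # real-valued target
y_cat = (X[:, 1] > 50).astype(int)                    # categorical target

reg = DecisionTreeRegressor(max_depth=40).fit(X, y_real)
clf = DecisionTreeClassifier(max_depth=40).fit(X, y_cat)


def alert_regression(x_u, y_received, t_p=5.0):
    """Eq. (5): alert iff |y_hat - y| >= t_p."""
    y_hat = reg.predict(x_u.reshape(1, -1))[0]
    return abs(y_hat - y_received) >= t_p


def alert_classification(x_u, y_received):
    """Eq. (8): alert iff y_hat != y."""
    y_hat = clf.predict(x_u.reshape(1, -1))[0]
    return y_hat != y_received
```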

3 Data Collection and Feature Engineering

3.1 Data Collection

In order to provide training vectors, the best way is to collect data directly from a real vehicle. For this purpose we prepared a CAN acquisition device, composed of a Raspberry Pi with an additional CAN-Bus hardware module, running a Linux kernel with SocketCAN drivers. We equipped a vehicle with the acquisition device connected directly to different CAN buses in order to have direct access to all sensor information, although not all of it will be used during training. We collected CAN traces from one vehicle for three different drivers, driving on different circuits for about 90 min each. Circuits covered multiple driving conditions including city driving, vehicle parking, highway driving, etc. During the data collection, drivers were asked to drive normally but also to perform rare but legitimate scenarios like activating cruise control, activating lane keep assist, activating emergency braking, etc. For safety reasons, no attacks were performed during the data collection step.
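As an illustration of the acquisition setup, a logger on the Raspberry Pi could look like the following sketch, assuming the python-can library on top of the SocketCAN drivers; the channel name and log format are assumptions.

```python
import can  # python-can, on top of the Linux SocketCAN drivers

# Hypothetical logger: dump raw frames with timestamps for offline processing.
bus = can.interface.Bus(channel="can0", bustype="socketcan")
with open("drive_test.log", "w") as log:
    for msg in bus:  # iterating over the bus yields received frames
        log.write(f"{msg.timestamp:.6f} {msg.arbitration_id:03X} "
                  f"{msg.data.hex()}\n")
```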

3.2 Feature Engineering

After raw data acquisition, the second step consists in preparing the data for processing. The goal is to select and arrange the features in a form that will be useful during the training step. Each CAN identifier sent over the CAN bus has a payload that is composed of one or multiple signals. A signal is a piece of information (sensor value, ECU state, counter, checksum, ...) that can occupy one or multiple bits or bytes depending on the nature of the information. Extraction of signals requires the knowledge of the proprietary protocol of the car manufacturer.

The first selection criterion is relevance to the vehicle state. Signals included in the payload for safety reasons, like checksums, process counters, and duplicated signals, are checked by safety functions, and problems with those signals, if any, are handled by appropriate safety mechanisms. They are therefore not relevant for this task and are not selected. Typically, we are interested in physical sensor values like speed, acceleration, RPM, etc. The set of those signals defines the state of the vehicle and constitutes the input features relevant for learning the normal behaviour and evolution of the car states.

The second selection criterion is relevance with respect to the target signal. The dimensionality of the training vectors equals the number of selected signals, and machine learning algorithms generally do not work well with high-dimensional inputs: as the input dimension grows, performance deteriorates due to the curse of dimensionality. As a result, we select only signals with a high correlation with the target signal. For instance, the engine oil temperature has no influence on the vehicle speed and would not be selected when building a predictor for the speed signal; on the other hand, the acceleration of the vehicle is highly correlated with its speed and will be selected as an input to predict the speed. This selection criterion guarantees that the signals that best explain the target signal are used for prediction.

Signals are arranged in the form of a matrix where columns represent signals and rows represent the evolution of signal values over time. For each received CAN message that holds a selected signal, a new row is added to the matrix in which all signals keep their previous values/states except the one that has just been received. Figure 4 gives more details about how to construct the features matrix, and a construction sketch follows the figure.

Fig. 4. Parsing the log file and building the training data.
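A minimal sketch of this construction is given below, assuming the log has already been parsed into (signal name, value) pairs in reception order; the signal names and initial states are illustrative.

```python
import numpy as np

SIGNALS = ["speed", "acceleration", "rpm", "torque", "gear", "brake"]


def build_feature_matrix(parsed_log):
    """Build the (N x d) matrix of Fig. 4: one row per received message
    holding a selected signal, all other signals keeping their previous
    values/states (forward fill)."""
    current = {s: 0.0 for s in SIGNALS}  # initial states are illustrative
    rows = []
    for name, value in parsed_log:
        if name not in current:
            continue  # signal not selected as a feature
        current[name] = value
        rows.append([current[s] for s in SIGNALS])
    return np.array(rows)
```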

4 Experimental Validation and Discussion

In order to validate the approach, we conduct experiments to predict two target signals, one of each type (categorical and real-valued), using five selected input signals. To this end, a total of six signals are extracted; for each target signal, the remaining five are used as input features.

  • Speed is a real-valued signal sent by the Electronic Stability Program (ESP) and generated by an embedded speed sensor.

  • Acceleration is a real-valued signal sent by the Electronic Stability Program (ESP) and generated by an acceleration sensor.

  • Engine rotational speed, expressed in revolutions per minute (RPM), is a real-valued signal sent by the Engine Control Module (ECM).

  • Torque is a real-valued signal sent by the Engine Control Module (ECM) that contains the engine torque.

  • Gearbox position is a categorical signal sent by the Electronic Shifter Module (ESM) that indicates the gear lever position.

  • Brake lights command is a categorical signal sent by the Electronic Stability Program (ESP) module to control the brake lights.

Experimental validation is conducted in two steps. First, we train and evaluate the detection rules using collected data, without performing any attacks. This step gives us the True Negative rate, which we define hereafter as the accuracy (Acc) of the supervised learning algorithm, formally introduced in Sect. 4.1; the False Positive rate is then derived from the accuracy and equals \(1-Acc\). Second, we conduct an attack campaign and measure how many of the performed attacks are detected. This step gives us the True Positive rate and the False Negative rate. Table 1 defines the metrics that will be used in the sequel.

Table 1. Detection metrics

4.1 Validation Metrics

Regression Metrics for Real-Valued Signals: The accuracy (denoted Acc) of a machine learning prediction algorithm is generally measured using the coefficient of determination \(R^2\), a statistical measure of how well the regression predictions approximate the observed target values. The closer it is to 1, the more accurate the prediction; an \(R^2\) of 1 indicates that the regression predictions perfectly fit the data. We express the prediction accuracy as follows:

$$\begin{aligned} Acc = R^2 = 1 - \frac{\sum (\hat{y}_i - y_i)^2}{\sum (y_i - E(y_i))^ 2} = 1 - \frac{\sigma _{\epsilon }^2 + \mu _{\epsilon }^2}{\sigma _{y}^2}, \end{aligned}$$
(9)

where \(\sum (\hat{y}_i - y_i)^2\) is the residual sum of squares, \(\sum (y_i - E(y_i))^ 2\) is the total sum of squares, \(\sigma _{\epsilon }^2\) and \(\mu _{\epsilon }\) are respectively the variance and the mean of the error term, and \(\sigma _{y}^2\) is the variance of the target signal y.

Intuitively, comparing the quality of the predictors can be based on the mean and variance of the prediction error \(\epsilon \). Ideally the error has to be centered around zero (unbiased predictor) with the smallest possible variance.

To define an intrusion detection system based on the predictor, we need to define an acceptable deviation of the prediction that can be tolerated. Beyond this acceptable deviation, the received signal is considered too far from the prediction and an alarm should be raised. This acceptable deviation, or detection threshold \(t_p\), determines the false positives statistically generated by the predictor (red bars in Fig. 6). More formally, we define a false prediction as follows:

$$\begin{aligned} FP_{t_p}(y,\hat{y}) = \left\{ \begin{array}{ll} 1 &{} \text {if } |y-\hat{y}| \ge t_p, \\ 0 &{} \text {if } |y-\hat{y}| < t_p. \end{array} \right. \end{aligned}$$
(10)

Tweaking the parameter \(t_p\) increases or decreases the false positive probability of the intrusion detection rule built on this predictor. The accuracy measure with respect to \(t_p\) becomes \(Acc_{t_p} = P(|\epsilon |<t_p)\) (Fig. 5); an estimation sketch is given after Fig. 6.

Fig. 5. Prediction principle

Fig. 6. Gaussian shaped prediction error (Color figure online)
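In practice, \(Acc_{t_p}\) can be estimated empirically from the prediction errors measured on attack-free validation data, and \(t_p\) can conversely be chosen to meet a target false positive rate. A minimal sketch, under the assumption that such errors are available:

```python
import numpy as np


def accuracy_at_threshold(errors, t_p):
    """Empirical estimate of Acc_{t_p} = P(|eps| < t_p)."""
    return np.mean(np.abs(errors) < t_p)


def threshold_for_fp_rate(errors, target_fp=0.01):
    """Smallest t_p whose statistical false positive rate is <= target_fp:
    the (1 - target_fp) quantile of |eps| on attack-free data."""
    return np.quantile(np.abs(errors), 1.0 - target_fp)
```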

Classification Metrics for Categorical Signals: The default accuracy metric used in machine learning classification tasks is the correct classification ratio:

$$\begin{aligned} Acc = \frac{\#\text { correct predictions }}{\#\text { use-cases }} \end{aligned}$$
(11)

Unlike regression, for classification it is straightforward to define a false prediction, which in this case is simply a mis-classification. More formally, we define the mis-classification function as follows:

$$\begin{aligned} MC(s,p)= \left\{ \begin{array}{ll} 1 &{} \text {if } class(s) \ne class(p),\\ 0 &{} \text {if } class(s) = class(p). \end{array} \right. \end{aligned}$$
(12)

4.2 Predicting a Real-Valued Signal: Speed

For regression problems, we chose to validate the approach described in the previous sections on a signal that is important from a safety standpoint. The speed information is sent by the Electronic Stability Program over the CAN bus to be used by the other ECUs in other functions. Besides being displayed for user information, it is used to compute the effort to be applied on the brakes when emergency braking is activated, to decide when to activate airbags in case of an accident, to decide whether the car doors should be open or closed, to decide whether or not to accept diagnostics commands, and in many other functions. In the performed experiments, the goal is to compare different machine learning algorithms, as each algorithm has a different way of capturing dependencies between the input features and the target signal. We used a data set of \(10^6\) input vectors from each drive test, split into a training set and a test set with 0.7 and 0.3 size ratios respectively. All experiments are done with the scikit-learn library [19].

In the first experiment, we train and evaluate detection rules for each driver separately. We used four types of machine learning algorithms: k-nearest neighbors (KNN), decision tree, neural network with logistic perceptrons, and neural network with rectified linear unit (ReLU) perceptrons. For each type of algorithm, we used different tuning parameters that progressively give it the ability to capture more complex dependencies, at the cost of increasing the complexity of the learning algorithm; for instance, increasing the depth of a decision tree or increasing the number of neurons and layers of a neural network. A sketch of this comparison is given below. Table 2 reports the evaluation metrics of the tested algorithms.
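A sketch of such a comparison with scikit-learn is shown below. The synthetic data and the exact parameter grids are illustrative assumptions; Table 2 reports the results obtained on the real drive-test data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in; in practice X and y are built as in Sect. 3.
rng = np.random.default_rng(1)
X = rng.uniform(size=(5000, 5))
y = X @ rng.uniform(size=5)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)

candidates = {
    "knn (k=5)": KNeighborsRegressor(n_neighbors=5),
    "tree (depth=20)": DecisionTreeRegressor(max_depth=20),
    "tree (depth=40)": DecisionTreeRegressor(max_depth=40),
    "logistic mlp (80 neurons)": MLPRegressor(hidden_layer_sizes=(80,),
                                              activation="logistic"),
    "relu mlp (10 layers)": MLPRegressor(hidden_layer_sizes=(20,) * 10,
                                         activation="relu"),
}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))  # score() is R^2 for regressors
```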

First, we note that the results of KNN are merely provided as a baseline. Using KNN is advantageous as it gives a very precise local approximation for a dense and uniformly distributed training set. It is nevertheless not practical in the context of embedded systems, as it needs all the training data in memory in order to make a prediction. Second, each algorithm performs approximately similarly on the three drivers. Third, for a given algorithm, we note that as we increase the complexity (tuning parameters) of the learning algorithm, the accuracy improves: the rule becomes progressively able to capture more dependencies. As a result, it becomes necessary to weigh the added complexity against the gain in accuracy. For the decision tree algorithm, changing the tree depth from 20 to 40 does not significantly improve the accuracy. Similarly, increasing the number of neurons in the logistic neural network up to 80 neurons, and increasing the number of layers in the ReLU neural network up to 10 layers, does not have a significant effect on the accuracy for any of the three drivers. We conclude that as the complexity of the algorithm increases, its ability to capture more dependencies also increases, but reaches a certain limit beyond which it is no longer advantageous to increase the complexity. Overall, and for all three drivers, the best results were reported for the decision tree algorithm tuned with a depth parameter equal to 40.

Table 2. Prediction accuracy of detection rules for \(t_p= \pm \)5 km/h, trained and tested with data captures from three different drive tests

4.3 Predicting a Categorical Signal: Brake Lights Command

For the classification problem, we choose to validate the approach on the brake-lights-command categorical signal. In order for the accuracy metric to make sense, the test data should be balanced, i.e., the number of test vectors should be roughly the same for each class. Results are reported in Table 3.

A test procedure similar to that of the speed signal was used for the brake-lights-command signal. We notice small differences in the accuracy of the same rule when comparing different drivers: practically all the tested rules perform better on the first and second drivers than on the third. An explanation might be that the third drive test contained singular use-cases that did not appear frequently enough, so the rules were not trained well enough to recognize them. An easy way to overcome this limitation is to collect more data for these specific use-cases. We also notice that the decision tree algorithm tuned with a depth parameter equal to 40 reported the best performance for all three drivers. A training sketch with a balanced test set follows Table 3.

Table 3. Prediction accuracy of detection rules for the brake-lights-command signal
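The following sketch shows this procedure for a categorical target, including the balancing of the test set required for the accuracy metric (11) to make sense. The synthetic data and the class imbalance are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in; in practice X and y come from the parsed CAN traces.
rng = np.random.default_rng(2)
X = rng.uniform(size=(5000, 5))
y = (X[:, 0] > 0.7).astype(int)  # imbalanced classes, like brake commands

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y)

# Balance the test set: keep the same number of vectors for each class.
idx0, idx1 = np.where(y_te == 0)[0], np.where(y_te == 1)[0]
n = min(len(idx0), len(idx1))
keep = np.concatenate([idx0[:n], idx1[:n]])

clf = DecisionTreeClassifier(max_depth=40).fit(X_tr, y_tr)
print("balanced test accuracy:", clf.score(X_te[keep], y_te[keep]))
```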

4.4 Unification of Detection Rule

In the previous section, we reported the accuracy of predictors trained and evaluated for each driver separately. The resulting detection rules could be influenced by the driving behaviour of each driver. In this section, we investigate the possibility of building one single detection rule that can accommodate all three drivers. According to the previous results, the decision tree algorithm outperforms the other algorithms for both predicted signals; we therefore use it to build the detection rules in this section. In order to train the algorithm, we combine the data sets collected during the three drive tests and split the resulting data set into training and test sets with a 0.7/0.3 ratio. We report the accuracy on the combined test set as well as on the three data sets separately, for the speed signal in Table 4 and for the brake-lights-command in Table 5. Results show that, for both signals, the resulting detection rules have a high accuracy level on the combined data set as well as on data from each individual driver. This shows that it is possible to build a single detection rule that can accommodate the three drivers.

Table 4. Prediction accuracy of the unified detection rules for the speed
Table 5. Prediction accuracy of the unified detection rules for the brake-lights-command

4.5 Evaluation Against Attacks

In order to evaluate the effectiveness of the detection rule, we conduct a test campaign against simulated attacks. Since we claim that our model can detect an attacker that has full control over one of the ECUs (Fig. 2b), the simulated attacks consist in replacing the data content of the messages with attacked content. The attacker is thus performing a man-in-the-middle attack between the signal generator (sensor) and the receiver ECU on which we install the intrusion detection system.

Attacks Against Real-Valued Signal: For the speed signal monitoring, we perform three types of attacks (a simulation sketch follows the list):

  • Random speed injection: in this attack, the attacker substitutes the real sensor value with a random value.

  • Speed offset injection: in this attack, the attacker adds an offset value to the real speed sensor value.

  • Speed denial of service (signal drop): in this attack, the attacker interrupts the sending of the frame, causing the speed signal to freeze at the last sent value.
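A minimal sketch of the three simulated attacks, applied offline to a recorded speed trace; the attack parameters (range, offset, attack start index) are illustrative assumptions.

```python
import numpy as np


def random_injection(speed, low=0.0, high=250.0, seed=0):
    """Substitute every real sensor value with a random speed value."""
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=len(speed))


def offset_injection(speed, offset=40.0):
    """Add a constant offset (here +40 km/h) to the real sensor value."""
    return np.asarray(speed, dtype=float) + offset


def signal_drop(speed, t_attack):
    """Freeze the signal at the last value sent before the attack."""
    attacked = np.asarray(speed, dtype=float).copy()
    attacked[t_attack:] = attacked[t_attack - 1]
    return attacked
```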

Fig. 7. Alerts raised by the decision tree (depth = 40) detection rule tested on three different attacks on the speed signal. On top are the ground truth and attacked signals: the blue signal represents the ground truth sensor value, the red signal is the attack signal. On the bottom are the alerts raised by the detection rule when receiving the attack signal. (Color figure online)

Figure 7 shows the attack use-cases on the speed signal. Note that the detection rule is set to raise an alert as long as the received speed value (injected by the attacker) is outside the acceptance interval of ±5 km/h around the predicted speed value. Thus, we consider that an attack is happening if the injected speed signal is outside of this acceptance interval. We can see from the alerts raised by the detection rule that:

  • For the random speed injection attack: as long as the injected speed value is outside the acceptance window, alerts are raised. The alert is not raised when the injected speed value is close to the ground truth value. We obtained a false negative rate of 0.13% when performing this attack.

  • For the speed offset attack, we can see that the alert is raised as soon as the attack starts. In fact, since the speed offset of the attack is set to +40 km/h, the received signal is always outside the acceptance window. The detection in this case is almost perfect: we obtained a false negative rate of \(5.8 \times 10^{-5}\)%.

  • For the denial of service attack, the same reasoning applies. The injected speed is frozen at around 20 km/h, which means that most of the time the alarm is raised, as the received speed is outside the acceptance window. But as soon as the ground truth speed value approaches the injected value, the alarm turns off. We obtained a false negative rate of 0.19% for this attack.

Attacks Against Categorical Signal: For the brake-lights-command signal monitoring, we perform three types of attacks:

  • Random command injection: in this attack, the attacker substitutes the real command with a random (0/1) command.

  • Inverse command injection: in this attack, the attacker inverts the real command.

  • Denial of service (force to 0): in this attack, the attacker always sends the 0 command value.

Fig. 8. Alerts raised by the decision tree (depth = 40) detection rule tested on three different attacks on the brake-lights-command signal. On top is the ground truth command, in the middle is the attack command, and on the bottom are the alerts raised by the detection rule when receiving the attack signal.

Figure 8 shows the attack use-cases on the brake-lights-command signal. Note that the detection rule is set to raise an alert as long as the received command value (injected by the attacker) differs from the predicted command. Thus, we consider that an attack is happening if the injected command signal differs from the real brake-lights-command signal. We can see from the alerts raised by the detection rule that:

  • For the random command injection: as long as the injected command differs from the ground truth command, alerts are raised. The alert is not raised when the injected and ground truth commands are the same. We obtained a false negative rate of 0.98%.

  • For the inverse command attack, we can see that the alert is raised as soon as the attack starts. In fact, since the injected command is always the opposite of the ground truth command, the predicted signal is always different from the received signal. Thus, the attack is detected from the start, and we obtained a false negative rate of 1.67%.

  • For the denial of service attack, the injected command is set to 0. The ground truth brake-lights-command has occurrences of about 70% and 30% for 0 and 1 respectively; thus, we consider that there is an attack only 30% of the time. As before, alerts were raised when the injected command differed from the ground truth command. We obtained a false negative rate of 0.4%.

5 Conclusion

In this article we introduced a novel in-vehicle intrusion detection system capable of detecting an attacker with full control over an ECU. This intrusion detection system is based on detection rules built with supervised machine learning techniques. The rules learn the nominal behavior of the system and make predictions for individual signal values; alarms are raised when the predicted signal value is not similar to the received value. We showed the effectiveness of the detection rules first for separate drivers, then for a small set of drivers, and finally against examples of attacks. The advantage of the proposed method over previous work is that it only needs collected data to learn the nominal behavior, and does not need examples of attacks in order to recognize them. Moreover, it gives the ability to target individual signals (for instance, the most safety-critical ones). Since the detection rules are actually signal predictors, the approach could theoretically be used for prevention as well. One may consider a false positive rate of 1% not low enough given the high number of frames on the communication buses. As a remedy, we can account for successive alerts: in order to effectively influence the behavior of the car, the attacker needs to send successive attack frames, so an isolated detection alert can be ignored in favor of successive alerts. This technique can tremendously reduce the number of false positives.