Data-Driven Detection of Anomalies and Cascading Failures in Traffic Networks

Traffic networks are one of the most critical infrastructures for any community. The increasing integration of smart and connected sensors in traffic networks provides researchers with unique opportunities to study the dynamics of this critical community infrastructure. Our focus in this paper is on the failure dynamics of traffic networks. By failure, we mean the hindrance of the normal operation of a traffic network due to cyber anomalies or physical incidents that cause cascaded congestion throughout the network. We are specifically interested in analyzing the cascade effects of traffic congestion caused by physical incidents, focusing on developing mechanisms to isolate and identify the source of a congestion. To analyze failure propagation, it is crucial to develop (a) monitors that can identify an anomaly and (b) a model to capture the dynamics of anomaly propagation. In this paper, we use real traffic data from Nashville, TN to demonstrate a novel anomaly detector and a Timed Failure Propagation Graph based diagnostics mechanism. Our novelty lies in the ability to capture the spatial information and the interconnections of the traffic network as well as the use of recurrent neural network architectures to learn and predict the operation of a graph edge as a function of its immediate peers, including both incoming and outgoing branches. Our results show that our LSTM-based traffic-speed predictors attain an average mean squared error of 6.55 × 10^−4 on predicting normalized traffic speed, while Gaussian Process Regression based predictors attain a much higher average mean squared error of 1.78 × 10^−2. We are also able to detect anomalies with high precision and recall, resulting in an AUC (Area Under Curve) of 0.8507 for the precision-recall curve. To study physical traffic incidents, we augment the real data with simulated data generated using SUMO, a traffic simulator. Finally, we analyze the cascading effect of congestion propagation by formulating the problem as a Timed Failure Propagation Graph, which allows us to identify the source of a failure/congestion accurately.

Sanchita Basak et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


INTRODUCTION
Since the emergence of smart cities, a major focus has been on Intelligent Transportation Systems. These systems provide researchers with unique opportunities to study the dynamics of road traffic. In this paper, we study the failure dynamics of traffic networks, focusing on the detection and diagnostics of traffic anomalies based on traffic-prediction models. Traffic predictions can be performed using two different approaches: model-driven and data-driven [Barros, Araujo, and Rossetti (2015)]. In model-driven approaches, we have a physical model that represents the network topology, incorporating information about intersections, road segments, signals, geographical coordinates of Traffic Message Channel (TMC), etc. In data-driven approaches, various forms of traffic measurements, such as speed and congestion factor, are needed for training; these can be obtained from sensors, such as induction-loop detectors placed in the road network.
Our aim here is to combine model-driven and data-driven approaches to build an effective traffic prediction architecture. We use the physical model of the network to generate a directed graph that captures the spatial interconnections within the network. The temporal dependencies of the flow patterns are captured by training recurrent neural network architectures using significant amounts of sensor data. Thus, combining the model-driven and data-driven approaches, we can assess the evolution of the traffic state of the entire road network. We demonstrate our approach using real traffic data from Nashville, TN, USA obtained via the HERE API [HERE Developer (2019)]. In particular, we study the efficacy of building traffic-speed predictors using two different approaches, Long-Short Term Memory Networks (LSTMs) and Gaussian Process Regression (GPR). For both approaches, we model the speed of each road segment in the network as a function of its neighboring road segments, and build specialized traffic predictors for each edge of the entire network.
We develop the traffic speed prediction model with two objectives in mind: 1) detection of anomalous sensor readings and 2) a model capturing the dynamics of cascading congestion propagation. The disruptive events causing anomalous sensor readings can be due to malicious sensor attacks involving data manipulation as well as real physical incidents creating congestion. For sensor anomaly and attack detection, we introduced additive and deductive anomalies into the real-time traffic data and showed the ability of the trained traffic predictors to identify the attacks using statistical control charts. We also analyzed the precision and recall of this anomaly detection scheme.
Next, we analyze the cascading effect of congestion in a traffic network, where congestion/perturbations created locally at a targeted road segment can propagate backwards like a wave, affecting a larger part of the network and leading to chained congestion. To analyze such effects in a large-scale traffic network, we use the SUMO (Simulation of Urban MObility) ["SUMO" (2019)] traffic simulator to run real-time traffic simulations and to monitor and analyze traffic patterns under the influence of congestion. We trained traffic predictors with data collected from SUMO under normal operating conditions and showed that the pre-trained models effectively predicted the real-time cascading effect of congestion spreading out to the neighboring road segments. Once a persistent congestion is noted in a road segment, we identify the root cause of the cascaded congestion by finding the target road where the congestion started using Timed Failure Propagation Graphs (TFPG) [Abdelwahed, Karsai, Mahadevan, and Ofsthun (2009)].
Contributions Our contributions in this paper are:
• Building efficient LSTM based traffic predictors in a unique way, by modelling each road segment in a large-scale traffic network as a function of its neighboring roads, and comparing their performance with that of Recurrent Neural Network and Gaussian Process Regression. We achieved an accurate prediction model with an average loss of 6.55 × 10^−4 on normalized speed values.
• These traffic predictors, combined with the statistical control chart CUSUM, are able to detect anomalies in sensor readings with high precision and recall, achieving an AUC of 0.8507 for the precision-recall curve.
• We formulated the traffic congestion propagation as a Timed Failure Propagation Graph to identify the root cause of failure in the network.
Outline The rest of the paper is organized as follows. Section 2 sets up the research problem that we solve. We provide an outline of the research approach and compare it to related work in Section 3. Next, we describe our main contributions. Section 4 presents the models that we use for traffic speed prediction. We discuss our approach to anomaly detection and its comparison with classical Gaussian Process Regression based anomaly detection in Section 5. Then, we discuss the cascade analysis approach and root cause isolation in Section 6. Section 7 concludes the paper and discusses future research directions.

PROBLEM STATEMENT
We are interested in developing data-driven detectors to identify the following disruptive events: (a) sensor attacks, that is, cyber-attacks against smart sensors by a networked adversary that may change the measurement values arbitrarily, and (b) physical incidents, such as motor vehicle accidents, that occur randomly and may cause a cascade of traffic disruptions throughout the road network by creating chained traffic congestion. In such cases, identification of the root cause of an event can help eliminate the cascaded propagation of congestion. To help set up our problem, we first present a number of definitions, which include the transportation network as a graph and certain operators on the graph that we use later in the paper.

Definitions
Definition 1 (Transportation Network Graph) A graph representing our system model is defined as G = (V, E), where V is the set of nodes and E is the set of road segments connecting the nodes. In the graph, let v_i ∈ V denote a node and e_ij = (v_i, v_j) ∈ E represent an edge.
Definition 2 (in, out) The in operator, in : V → 2^E, gives all the edges for which node v is the destination. The out operator, out : V → 2^E, gives all the edges for which node v is the source.
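As a minimal illustration, the in and out operators can be realized directly over an edge list; the node and edge names below are hypothetical, not taken from the paper's network.

```python
# Illustrative sketch of Definitions 1-2: a directed road-network graph as an
# edge list, with in/out operators returning subsets of E (elements of 2^E).
edges = [("v1", "v2"), ("v3", "v2"), ("v2", "v4")]

def in_edges(v):
    """in : V -> 2^E, all edges whose destination is v."""
    return {e for e in edges if e[1] == v}

def out_edges(v):
    """out : V -> 2^E, all edges whose source is v."""
    return {e for e in edges if e[0] == v}
```

For this toy graph, in_edges("v2") contains two edges and out_edges("v2") contains one.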
Definition 3 (in degree, out degree) The in degree of a node v is the number of road segments incoming to the node and can be calculated as |in(v)|, whereas the out degree of a node v is the number of road segments outgoing from the node and is calculated as |out(v)|.

Existing Work on Traffic Prediction
Polson and Sokolov (2017) developed a deep learning architecture that combined a linear model, fitted using l1 regularization, with a sequence of tanh layers. The first layer identified spatiotemporal relations among predictors, and the subsequent layers modelled nonlinear relationships.
The study provided a twofold analysis of short-term traffic forecasts from deep learning and demonstrated that deep learning provides a significant advancement over linear models. A good review of deep learning technologies used in forecasting can be found in Sengupta, Basak, Saikia, et al. (2019). Prior work on traffic forecasting has also been carried out with multi-agent-based approaches: Hu, Gao, Yao, and Xie (2014) used Particle Swarm Optimization for traffic flow prediction, and some recent swarm-based algorithms listed in [Sengupta, Basak, and Peters (2018)] can also be used for this purpose.
For short-term traffic volume forecasting, Zhao, Chen, Wu, Chen, and Liu (2017) proposed a cascaded LSTM network that combines the interactions within the road network in both the time and spatial domains. They showed that the proposed LSTM approach for traffic volume prediction is robust and had the minimum MRE (Mean Relative Error) compared to other models such as the ARIMA (Autoregressive Integrated Moving Average) model, SVM (Support Vector Machine), and SAE (Stacked Auto-encoder). LSTM and RNN architectures have also outperformed other techniques in numerous applications, such as language learning [Gers and Schmidhuber (2001)], connected handwriting recognition [Graves and Schmidhuber (2009)], and remaining useful life prediction of hard disks [Basak, Sengupta, and Dubey (2018)].
In comparison, our approach models each road segment in the network as a function of its neighboring roads and uses that relationship for prediction. When we compared our performance with that of RNN and Gaussian Process Regression, we achieved a better prediction model with an average loss of 6.55 × 10^−4 calculated on normalized speed.

Existing Work on Traffic Anomaly Prediction
Zygouras et al. (2015) presented an approach to identify anomalous sensors and resolve whether irregular measurements are due to faulty sensors or unusual traffic. The proposed method was implemented using the Lambda Architecture, which combined a batch processing framework (Hadoop) and a distributed stream processing system (Storm) for efficiently processing both historical and real-time data. The authors also developed a crowdsourcing system to extract answers from the human crowd based on the MapReduce paradigm. The study recognized anomalous SCATS (Sydney Coordinated Adaptive Traffic System) sensors in Dublin with three methods: Pearson's correlation, cross-correlation, and a multivariate ARIMA model. The three outlier detection techniques identified complementary sets of faulty sensors. The study gave a detailed experimental evaluation to show that the proposed approach effectively resolved the source of irregular measurements in real time.
Lu, Varaiya, Horowitz, and Palen (2008) provided a systematic study of previous loop fault detection and data correction methods, along with a systematic classification of possible faults and their causes at different levels. According to the study, existing work on loop fault detection and data correction/imputation may be divided into three levels, which lead to different viewpoints for loop fault detection and data correction: (a) macroscopic (TMC/PeMS level); (b) mesoscopic (a stretch of freeway); and (c) microscopic (control cabinet level). These three levels of approaches are complementary, although they study the problem from different aspects using different levels of data.
In this work, we used the statistical control chart CUSUM to identify malicious sensor attacks with high precision and recall, achieving an AUC of 0.8507 for the precision-recall curve.

Existing Work on Cascading Failures
Daqing, Yinan, Rui, and Havlin (2014) analyzed the propagation of cascading failures in traffic networks, and a later work (2017) proposed a data-driven approach, CasInf, to study the cascading patterns of traffic propagation by maximizing the likelihood function from the available data. The latter treated it as a submodular function maximization problem, providing near-optimal performance guarantees.
In this paper, beyond analyzing the cascading effect of traffic congestion on the neighboring road segments of the network, we show that the source of congestion can be isolated by formulating the congestion propagation problem as a Timed Failure Propagation Graph.

TRAFFIC SPEED PREDICTION MODEL
For the Nashville dataset, we have 3,724 unique TMCs. For each TMC, we collected speed values for a total of 6000 timesteps, where each timestep corresponds to a 10-minute interval.
First, a matrix of dimension (total number of timesteps × number of unique TMC IDs), i.e., 6000 × 3724, is formed, holding the traffic speed of every TMC at every timestep. Some of the TMCs do not have recorded speed values. To interpolate the missing speed value of a particular TMC, we consider the speed values of all its neighbouring TMCs at the preceding and succeeding timesteps using data imputation.
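The imputation step can be sketched as follows; averaging the available neighbor readings is an assumption here, since the paper does not spell out the exact interpolation function.

```python
# Hypothetical sketch of the imputation step: a missing speed reading for a
# TMC is filled in from the speeds of its neighbouring TMCs at the preceding
# and succeeding timesteps. Averaging is an assumed choice of combiner.

def impute(speeds, t, tmc, neighbors):
    """speeds: dict mapping (timestep, tmc) -> speed or None.
    Returns an imputed value for (t, tmc), or None if no data is available."""
    candidates = []
    for n in neighbors[tmc]:
        for step in (t - 1, t + 1):            # preceding and succeeding steps
            v = speeds.get((step, n))
            if v is not None:
                candidates.append(v)
    return sum(candidates) / len(candidates) if candidates else None
```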
Since we consider the speed of the neighbors for predicting the speed of a TMC we must ensure that we normalize the speeds (see definition 11). The normalized speeds are defined to be in between 0 and 1 and help ensure data ranges are balanced between the road segments. This is required for building a good predictive model.
Definition 11 (Normalized speed) The normalized speed of a TMC (definition 4) is the ratio of its current speed to the average of its speeds at times when the jam factor (definition 5) is zero.
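A sketch of this normalization (the helper name is illustrative):

```python
# Sketch of Definition 11: normalized speed = current speed divided by the
# mean free-flow speed, i.e., the mean of readings taken when the jam
# factor is zero.

def normalized_speed(current_speed, speeds, jam_factors):
    """speeds and jam_factors are aligned per-timestep histories for one TMC."""
    free_flow = [s for s, jf in zip(speeds, jam_factors) if jf == 0]
    return current_speed / (sum(free_flow) / len(free_flow))
```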
For each TMC, N^1_in(TMC) and N^1_out(TMC) give the set of its immediate incoming and outgoing neighbors, respectively. For each TMC, the normalized speed values of each of its neighbors (both incoming and outgoing) are treated as input features, whereas the normalized speed of the target TMC is treated as the label. We applied both Recurrent Neural Networks and Long Short-Term Memory networks to build the traffic predictors for each TMC in the traffic network.
The number of past timesteps used to predict the current timestep has been chosen to minimize the loss. We varied the number of timesteps from 5 to 20. From the experimental results, we observed that RNN provides a stable outcome with ten timesteps, whereas LSTM gives better results with 15 timesteps. Table 1 shows the average loss on test data, calculated over normalized speeds, for different numbers of timesteps for RNN and LSTM. RNN and LSTM take the input as a three-dimensional matrix of dimension (samples × timesteps × features), where the number of features is equal to the total number of neighbouring TMCs. As the sample labels for a particular TMC are the normalized traffic speed values of that TMC, the network learns to predict the speed of the target TMC at any timestep given the preceding timesteps of inputs from its neighbors. The sample matrices are split randomly into a training set and a test set (70% training and 30% testing).
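The construction of the (samples × timesteps × features) input can be sketched as follows; the function name is illustrative, and nested lists stand in for the tensor.

```python
# Sketch of the sliding-window input construction: the label for a target
# TMC at time t is its normalized speed, and the input is the window of the
# previous `timesteps` speeds of all of its neighbours (the features).

def make_windows(neighbor_series, target_series, timesteps):
    """neighbor_series: one speed list per neighbour, all of equal length.
    Returns (X, y) with X of shape (samples, timesteps, features)."""
    n = len(target_series)
    X, y = [], []
    for t in range(timesteps, n):
        window = [[series[k] for series in neighbor_series]   # features at step k
                  for k in range(t - timesteps, t)]           # timesteps rows
        X.append(window)
        y.append(target_series[t])                            # label at time t
    return X, y
```

With this layout, ten timesteps per window were used for RNN and 15 for LSTM in the experiments above.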

Prediction Using Recurrent Neural Network
For the Recurrent Neural Network (RNN) prediction model, we varied the number of neurons (from 40 to 200) in the input and hidden layers. We ran the models with different numbers of neurons for the first 100 TMCs. From the average losses, we found that RNN works best with 80 neurons. Figure 1 shows the average losses produced by RNN and LSTM for different numbers of neurons. The average losses of RNN show a downward trend from 40 to 80 neurons; afterwards, as the number of neurons increases, the average loss also increases. Figure 2 shows the predicted and actual speed values of the first TMC for the first 400 timesteps. The loss of this prediction is 3.388 × 10^−5.

Prediction Using Long Short-Term Memory
For the LSTM model, we predicted normalized speed values for different numbers of neurons (40 to 200). The average losses show a downward trend with an increasing number of neurons. According to our experiments, 180 neurons in both the input and hidden layers produce the least average loss. The loss function is defined in terms of mean squared error. Figure 1 shows the average loss produced by RNN and LSTM with a varying number of neurons. It is visible from the figure that RNN converges with 80 neurons while LSTM needs 180 neurons. So, in our LSTM model, we used 180 neurons.
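The paper uses off-the-shelf recurrent layers; as a sketch of what a single LSTM cell computes per timestep (the standard gated formulation, with illustrative weight shapes, not the authors' exact implementation):

```python
import math

# Minimal sketch of one LSTM cell step. Gates i/f/o and candidate g are
# computed from the input x and previous hidden state h; the cell state c
# carries long-term memory. Weights are supplied per gate as (W, U, b).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w):
    """x: input features; h, c: previous hidden and cell states (lists);
    w: dict with w[name] = (W, U, b) for name in "i", "f", "o", "g"."""
    def gate(name, squash):
        W, U, b = w[name]
        return [squash(sum(Wk * xk for Wk, xk in zip(W[k], x)) +
                       sum(Uk * hk for Uk, hk in zip(U[k], h)) + b[k])
                for k in range(len(h))]
    i = gate("i", sigmoid)       # input gate
    f = gate("f", sigmoid)       # forget gate
    o = gate("o", sigmoid)       # output gate
    g = gate("g", math.tanh)     # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new
```

The forget gate is what lets the cell retain information across the 15-step windows used here, which plain RNNs struggle with.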

Comparison Between RNN and LSTM
To compare which model produces the better result, we ran both models with their optimal numbers of neurons and timesteps. Based on our experiments, the optimal number of neurons for RNN and LSTM is 80 and 180, respectively; RNN works best with 10 timesteps and LSTM with 15 timesteps. We ran both models for the first 100 TMCs to see which delivers the best result. Figure 4 shows the losses for the first 100 TMCs; it is visible in the figure that LSTM produces the smaller loss in most cases. The average loss from RNN is 7.04 × 10^−4, and the average loss from LSTM is 6.55 × 10^−4. So, LSTM works best for this dataset.

Prediction Using Gaussian Process Regression
Besides neural networks, we also used Gaussian Process Regression [Rasmussen and Williams (2005)], a Bayesian approach for modelling functional relationships, to build traffic predictors. The underlying assumption is that the prior distribution of the regression function is a multivariate Gaussian distribution. By calculating the covariance matrix for the labeled data and the covariance vector between the labeled and new test data points, and taking the measurement noise into account, the prediction for the test data points can be obtained [Ghafouri, Laszka, Dubey, and Koutsoukos (2017)]. In this work, we used the Radial Basis Function (RBF) as the kernel. Figure 5 compares the root mean square losses of the predictions produced by LSTM and Gaussian Process Regression for the first 100 TMCs. The average loss from Gaussian Process Regression is 0.0178, whereas LSTM produces an average loss of 6.55 × 10^−4, showing that LSTM works best for this traffic speed prediction problem.
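As a sketch, the GPR posterior mean with an RBF kernel can be computed in closed form, m(x*) = k(X, x*)^T (K + σ²I)^{-1} y; this is the textbook formula, not the authors' implementation, and scalar inputs are used for brevity.

```python
import math

# Sketch of GPR prediction with an RBF kernel: the posterior mean at a test
# point is k(X, x*)^T (K + noise*I)^{-1} y over the labeled data (X, y).

def rbf(a, b, length_scale=1.0):
    return math.exp(-0.5 * (a - b) ** 2 / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A is small, well-posed)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def gpr_predict(x_train, y_train, x_star, noise=1e-4):
    K = [[rbf(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(x_train)] for i, xi in enumerate(x_train)]
    alpha = solve(K, y_train)                      # (K + noise*I)^{-1} y
    return sum(rbf(xi, x_star) * ai for xi, ai in zip(x_train, alpha))
```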

DETECTION OF ANOMALIES
The goal here is to identify anomalous sensor readings in traffic networks. Anomalous sensor readings can arise from sensor attacks, in which a networked adversary artificially injects faults into the data stream associated with a sensor. It is therefore important to build effective anomaly detectors, so that we can mitigate the effects by replacing erroneous or missing data with predictions based on correct values from other sensors through data imputation.
Each TMC ID is associated with a sensor s_i whose value is predicted from the set of sensors (s_j ∈ S, j ≠ i) placed in the neighboring (incoming and outgoing) road segments. Anomalous sensor readings can be detected by calculating the difference between the prediction and the real-time sensor measurement; the time series of this difference can be used to identify anomalies. Anomalies in the sensor data can be introduced in two ways: additive or deductive. In the case of additive anomalies, the sensor readings are increased arbitrarily compared to normal operating conditions; conversely, for deductive anomalies, the sensor readings are decreased. We must inject anomalies artificially into the real data because we need ground-truth labels for anomalies in order to validate the detection approach, and we do not have any labels corresponding to anomalous readings in the real data. Figure 6 shows an example of the differences between the predicted and actual real-time sensor measurements during an additive sensor attack.

Figure 6. Introducing additive anomaly into sensor readings of a TMC

In this work, we use the Cumulative Sum Control chart (CUSUM) [Page (1954)], a statistical control chart for tracking the variation of time-series data. The algorithm identifies the timestamps at which an anomaly started and ended, the amplitude of the change, and an alarm (the timestamp at which the anomaly was detected).
By choosing a threshold, we can control the number of false positives and negatives, i.e., we can modulate the sensitivity of the algorithm for anomaly detection. The upper (usum_s^t) and lower (lsum_s^t) cumulative sums over the residual x_s^t (the difference between the prediction and the measurement of sensor s at timestamp t) follow the standard two-sided CUSUM formulation [Page (1954)]:

usum_s^t = max(0, usum_s^(t−1) + x_s^t − κ_s)
lsum_s^t = min(0, lsum_s^(t−1) + x_s^t + κ_s)

where κ_s is a slack parameter. The CUSUM criterion detects a sample x_s^t of sensor s to be anomalous at timestamp t if (usum_s^t > η_s) or (lsum_s^t < −η_s), where η_s is the detection threshold for sensor s.

Figure 7. Detection of anomaly through CUSUM algorithm

Figure 7 shows the detection of the anomaly for the case described in Figure 6. We introduced an additive sensor attack in the time window (80, 100), and the difference between the speed predicted by the LSTM and the sensor data subjected to the attack was fed to CUSUM, which triggered alarms at the 80th and 100th instants, identifying the actual time of attack. This anomaly identification can be carried out online, as we continuously feed in the difference between the prediction and the sensor measurements. This validates that, using traffic predictors combined with the change detection algorithm CUSUM, online identification of anomalies is possible.
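The two-sided CUSUM detector can be sketched as follows; κ and η are tuned per sensor, and the reset-after-alarm behaviour is an assumed convention.

```python
# Sketch of two-sided CUSUM over the residual x_t (prediction minus
# measurement). kappa is the slack parameter and eta the detection
# threshold; both would be tuned per sensor.

def cusum(residuals, kappa, eta):
    usum, lsum, alarms = 0.0, 0.0, []
    for t, x in enumerate(residuals):
        usum = max(0.0, usum + x - kappa)    # upper cumulative sum
        lsum = min(0.0, lsum + x + kappa)    # lower cumulative sum
        if usum > eta or lsum < -eta:
            alarms.append(t)                 # anomaly flagged at timestep t
            usum, lsum = 0.0, 0.0            # reset after an alarm (assumption)
    return alarms
```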
To compare the efficiency of the anomaly detection scheme combining LSTM-based traffic predictors with CUSUM against the scheme combining Gaussian Process Regression based traffic predictors with CUSUM, we show the precision-recall curves for both approaches, obtained by varying the anomaly detection thresholds in the same way. Series of randomly generated additive and deductive anomalies were introduced into the sensor data, and both approaches were applied to the same altered data to identify the anomalies. Figure 8 shows the precision-recall curves of LSTM and Gaussian Process Regression, illustrating their comparative efficiency in identifying anomalies. The Area Under Curve (AUC) for Gaussian Process Regression is 0.4070, whereas the AUC for the LSTM-based approach is 0.8507, showing its superiority in identifying anomalies, all other conditions remaining equal. We expected LSTM to perform better in anomaly detection because we had already seen in Figure 5 that it predicts traffic speed more accurately. It is to be noted that anomalies in sensor data can also be due to physical incidents. However, the presence of a physical incident can be deduced using Timed Failure Propagation Graphs, which indicate a sequence of anomalies. This is described in detail in Section 6.

CASCADING EFFECT OF TRAFFIC CONGESTION
In a large-scale interconnected system such as a traffic network, congestion in one (or some) parts can lead to congestion in other, connected parts as well. In this paper, our goal is to identify the pattern of how congestion originating from one road segment propagates backwards to the incoming branches of the road segment, creating a cascading effect of traffic congestion.
To study the spread of road congestion, we used SUMO, which is a microscopic traffic simulator. SUMO allows us to introduce congestion by manipulating a running simulation and to measure road traffic using simulated traffic sensors. All of the experiments in this section are based on SUMO simulations. We simulated congestion scenarios on a part of Nashville's road network, which we downloaded from OpenStreetMap ["OpenStreetMap" (2019)]. Figure 9 shows the part of the road network that we used in our simulations. For our experiments, we introduced congestion at road segment R1. Figure 10 depicts an instance of congestion simulation, where the vehicles at the target road R1 completely stop due to some incident. The figure shows how the effect of the congestion propagates backwards to affect all the incoming road segments of R1. Following the congestion at R1, the observed speed at its first-hop neighbors R2 and R3 drops immediately, whereas the speed at its second-hop neighbors R4 and R5 drops one minute later. Vehicle speed at the third-hop neighbor R7 drops following the speed drop at R5.

Congestion Simulation
We trained traffic speed predictors for each road segment using the data collected from SUMO. For training the predictors, we modeled the speed of each road segment as a function of its neighboring road segments, all working under normal operating conditions, so that each predictor learns how the speed at the target road segment depends on its neighbors. Then, we tested whether they can predict the speed at a road segment based on the speed at its neighboring road segments under the influence of congestion.

Figure 9. Part of the Nashville traffic network showing the source of congestion and the direction of traffic flow

Figure 10. Congestion instance: vehicles at target road R1 completely stop due to some incident.

Effect of Physical Incidents
In Section 5, we discussed anomaly detection when an anomaly was introduced at a particular road segment while the neighboring road segments were working under normal operating conditions. The traffic predictors therefore predicted the speed of the target road based on the speeds of the normally operating neighbors. As a result, the prediction for the target road produced normal speed values, which deviated from the anomalous sensor readings, showing a large difference between the actual and predicted speed.
In the case of physical congestion in a road segment, the traffic speed of the target road segment experiences a sudden decrease while its neighbors are still operating under congestion-free conditions. So the predicted speed for that road segment is off by some margin from the actual speed at that time, since the prediction is based on the speeds of the neighbors, which are still operating normally. Under this condition, our LSTM-based traffic predictors should raise an alarm due to the large deviation between actual and predicted speed. However, as time progresses and congestion propagates to the neighboring roads, the traffic predictor for the target road starts producing predictions close to the actual decreased speed, as the neighbors also become congested. Once the difference between the actual and predicted results goes down, the alarm turns off (this is because the LSTM predicts based on recent history). Figures 11 and 12 show that at the time the congestion started there is a large difference between the actual and predicted speed, and that the difference then decreases as time progresses. We observe this sequence of alarms (as they turn on) for each road as a time series to hypothesize the source of the physical incident.

Timed Failure Propagation Graph of Traffic Network
We can identify the source of congestion efficiently using a Timed Failure Propagation Graph (TFPG) [Abdelwahed et al. (2009)]. TFPGs capture the causality and temporal pattern of failure propagation in complex systems. A TFPG is a labeled directed graph whose nodes are either failure modes or discrepancies. Discrepancies are the failure effects, some of which may be observable. Edges in a TFPG represent the causality of the fault propagation, and edge labels capture the operating modes in which the failure effect can propagate over the edge, as well as a time interval by which the failure effect could be delayed (see Figure 14). Figure 13 shows a TFPG model capturing the propagation of congestion among the edges of the network described in Figure 9. To create a TFPG model for the traffic network, we start with a directed graph of the traffic network, where each road segment in the network corresponds to a discrepancy node in the TFPG. The direction of the edges between the TFPG nodes is opposite to the direction of traffic flow in the traffic network, since congestion propagates in the opposite direction of traffic flow. The TFPG comprises a nonempty set of discrepancy nodes (DN). Each edge e_TFPG in the TFPG model represents the direction of congestion propagation between two road segments, with an approximate minimum e_TFPG[tmin] to maximum e_TFPG[tmax] time bound. The times for congestion propagation are subject to some fluctuations depending on the specific time of day and other external factors. These time bounds are obtained from the simulation, which we set up by creating congestion scenarios at each edge of the network and calculating the time bounds within which the congestion propagates from one DN to another.
All the discrepancy nodes in the TFPG are OR type, as they are activated when the congestion propagates from any of their parent nodes within the specified time bound. Certain discrepancy nodes are consistently monitored, i.e., we have traffic predictors for these discrepancy nodes; note that monitoring all the discrepancy nodes in a large-scale traffic network is computationally expensive. Selecting which nodes of a graph to monitor, under a constraint on the maximum number of allowed monitors, can be treated as an optimization problem. Davis, Gera, Lazzaro, Lim, and Rye (2016) discussed a hill-climbing algorithm that places the first monitor at an initial seed node and subsequent monitors at the highest-degree neighbors. Wijegunawardana, Ojha, Gera, and Soundarajan (2017) discussed strategies of monitor placement based on graph topology and the colors of nodes; besides some well-known monitor placement strategies such as smart random sampling, red score, and most red neighbors, the authors proposed a learning-based monitor assignment strategy. As there are numerous well-established methodologies for this problem, we do not discuss it any further.

Diagnosis
In a traffic network, congestion created at a source road segment propagates to its incoming neighbors. Hence, if the root cause of an observed congestion at a certain road segment is to be found, the root must lie within its k-hop outgoing neighbors in the traffic network. Note that the direction of traffic flow in the network is opposite to the direction of congestion propagation shown in the TFPG. Hence, once an alarm is observed at one of the monitored discrepancy nodes, we hypothesize that the root failure node lies within a subset of its k-hop incoming discrepancy nodes in the TFPG. Starting from the alarmed monitored discrepancy node, we traverse the TFPG backwards, checking at each hop whether the corresponding alarms were activated within the specified time ranges. The traversal continues until it reaches a hop k at which no alarm is activated while alarms up to hop (k−1) have been activated; the source of congestion then lies at the (k−1)-th hop discrepancy node. At each hop, the DNs whose alarms were not observed are eliminated from the hypothesis set, so the hypothesis set for finding the root of failure shrinks continuously and ultimately reduces to a single discrepancy node, which is the source of congestion.
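The backward traversal can be sketched as follows; the graph fragment and alarm times are illustrative values chosen to mirror the case study that follows.

```python
# Sketch of the root-cause search over the TFPG. parents[d] lists the
# incoming TFPG discrepancy nodes of d with their (tmin, tmax) propagation
# bounds (seconds); alarms maps a node to the time its alarm fired. For
# simplicity, we follow the first parent whose alarm time is consistent.

def find_source(node, parents, alarms):
    t = alarms[node]
    for parent, (tmin, tmax) in parents.get(node, []):
        tp = alarms.get(parent)
        # the parent's alarm must precede this one within the time bound
        if tp is not None and tmin <= t - tp <= tmax:
            return find_source(parent, parents, alarms)
    return node  # no active parent in range: hypothesized source

# Illustrative fragment: congestion starts at R1 and reaches R4 via R3.
parents = {
    "R4": [("R3", (20, 60)), ("R13", (20, 60))],
    "R3": [("R1", (20, 60))],
}
alarms = {"R4": 300, "R3": 240, "R1": 200}
```

Here find_source("R4", parents, alarms) walks R4 → R3 → R1 and stops at R1, whose incoming neighbors raised no alarm.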

Case Study
Here we present a case study in which we find the source of congestion for road segment R4 (see Figure 9), whose corresponding DN raised an alarm 5 minutes after the start of the simulation. For the root-cause diagnosis, we first check its first-hop incoming neighbors R3 and R13. The alarm of R3 was activated almost 60 seconds earlier, and the time bound for congestion propagation from R3 to R4 is (20-60) seconds, as shown in the TFPG model in Figure 13; by the same logic, DN R13 is found to be inactive and is eliminated. Next, we check when the alarms of the immediate incoming neighbors of R3 were triggered, and find the alarm of R1 to have been activated within the specified time bound.
We then stop checking further, as none of the alarms associated with the immediate incoming neighbors of R1 was activated, correctly returning R1 as the source of congestion.

CONCLUSIONS AND FUTURE WORK
We proposed a traffic prediction model that treats a large-scale traffic network as a connected directed graph and compared two machine learning approaches, of which LSTM performed the best, with an average loss of 6.55 × 10^−4 on Nashville traffic data. We employed CUSUM along with the trained traffic predictor models to identify malicious sensor attacks, achieving a precision-recall curve with an AUC of 0.8507 and demonstrating the effectiveness of the approach in anomaly detection. Next, we analyzed the cascading effect of traffic congestion using a traffic simulator and predicted its impact on traffic speeds in the neighborhood of the source of congestion. The most interesting contribution of this paper lies in formulating the cascading congestion propagation problem as a Timed Failure Propagation Graph. We identified the source of congestion by traversing the TFPG upon observation of congestion at any edge of the traffic network. In future work, we will analyze cascading failures in other large-scale coupled systems, such as electrical grids and water networks, and identify the sources of failures using approaches similar to the ones introduced in this paper.