A Bayesian-entropy Network for Information Fusion and Reliability Assessment of National Airspace Systems

The air traffic control (ATC) system is critical in maintaining the safety and integrity of the National Airspace System (NAS). This requires fusing information from various sources. This paper introduces a hybrid network model called the Bayesian-Entropy Network (BEN) that can handle various types of information. The BEN method combines the Bayesian method with the Maximum Entropy method. The Maximum Entropy method introduces constraints, which appear as an additional exponential term in the classical Bayes' theorem. The exponential term can encode extra information, such as human experience or historical data, in the form of constraints. Once written in mathematical form, this knowledge can be incorporated into the classical Bayesian framework. The BEN method provides an alternative way to consider common data types (e.g., point observations) and uncommon data types (e.g., linguistic descriptions of human factors) in the NAS. The method is demonstrated on two example problems. The first example involves an air traffic control network model in which the BEN uses information from various sources to update the risk event probability. The second example concerns predicting the cause of runway incursion. A network model of different sources of error is used to make the predictions. The training and validation data are extracted from existing accident reports in the Aviation Safety Reporting System (ASRS) database. The results are compared with those of the traditional Bayesian method. It is found that the BEN can make use of the available information to modify the distribution function of the parameter of concern.


INTRODUCTION
Worldwide air traffic has increased continuously over the past decades (Strohmeier, Schäfer, Lenders, & Martinovic, 2014). This puts a heavy burden on air traffic management (ATM) in maintaining the safety of the NAS. Because a large portion of air traffic accidents is due to human error, human performance has always been considered a critical influencing factor for ATM (Rodgers, 2017). The Federal Aviation Administration (FAA) and other organizations have been investing heavily in this research area. Humans are irreplaceable in the air traffic control (ATC) system because of their ability to react quickly to unusual scenarios (Rognin, Grimaud, Hoffman, & Zeghal, 2001), so NextGen is looking for a computer assistant that works alongside ATC controllers to monitor and maintain safety and to predict accidents (Martin et al., 2016). Information fusion is critical in achieving this goal.
Bayesian updating is one of the most popular methods for handling information. It has been extensively applied in engineering problems for uncertainty and reliability analysis (Peng et al., 2013) (Peng et al., 2012). It is a robust and well-developed tool for handling point data based on Bayes' theorem. Bayes' theorem describes the probability of an event based on existing knowledge of related events. The posterior probability is proportional to the product of the prior probability and the likelihood function (Bayes & Price, 1763). A Bayesian network is a probabilistic graphical model based on Bayes' theorem. It interconnects variables that potentially have dependencies through a directed acyclic graph (DAG). A DAG is a graph with variables as nodes and directed edges representing the dependencies between the variables. It is often used to model causal relationships between variables, as it can infer the probabilities in the network from existing knowledge.
Several studies have introduced extra information into the Bayesian framework. One line of research focuses on updating with fuzzy range data (Sankararaman & Mahadevan, 2011). It uses an integral over the range data as the likelihood function to represent the possibility of the range information. Graca, Ganchev, and Taskar (2008) minimize the Kullback-Leibler (KL) divergence between the prior and the posterior under constraints. The resulting problem is solved for the posterior using the expectation maximization (EM) algorithm. A regularized Bayesian method was presented in (Zhu, Chen, & Xing, 2012). It puts constraints on the model posterior given data-dependent information. The method was derived from a posterior regularization framework (Ganchev & Gillenwater, 2010), which is used in structured, weakly supervised learning to express constraints on the latent variables. In this paper, we introduce the Bayesian-Entropy Network (BEN) model to encode extra information that can be derived from various sources. The method takes the updating rule from the maximum entropy (ME) method and combines it with the classical Bayesian network.
The ME method was first introduced in (Jaynes, 1957a) (Jaynes, 1957b) as an alternative updating method for calculating the posterior given new evidence. It was later found that Bayes' rule is a special case of the ME method (Caticha & Giffin, 2006). Compared with the classical Bayes' theorem, the target posterior in the ME method has an additional exponential term, which contains the constraint information. The constraint can be statistical moment information (Giffin & Caticha, 2007) or range information. The method has been successfully applied in (Guan, Jha, & Liu, 2012) for single-parameter updating in a fatigue problem and in (Wang & Liu, 2018) for a classification task.
In this paper, we explore the application of the BEN method to modeling ATC systems and add human perception information in the form of constraints. Compared with current risk prediction approaches, which mostly rely on human judgement (Roychoudhury et al., 2016), the BEN method offers an automated, robust and easy way of handling information to predict accidents. The rest of the paper is organized as follows. In the next section, the ME method is reviewed and its derivation under moment and range constraints is introduced. In Section 3, two demonstration examples are discussed. The first presents a hypothetical air traffic control model for predicting risk; the model uses information from various sources to infer the probability of risk. The second uses features extracted from ASRS reports on runway incursion accidents to build a network model; the goal is to use this feature information to classify the cause of a runway incursion accident. Compared with the methods currently used in the NAS for predicting accidents, the BEN can achieve an automated process and incorporate human knowledge at the same time. The last section provides conclusions and future work.

THE BEN FRAMEWORK
This section discusses the formulation of the BEN framework. First, the ME method is briefly reviewed. This is followed by its derivation under moment and range constraints.

Brief review of the maximum entropy method
The ME method was originally developed in (Jaynes, 1957a) to calculate probabilities with information given as constraints. Later, it was shown in (Caticha & Giffin, 2006) that the ME method reproduces Bayes' rule. It can use both data observations and moment information to update a posterior probability (Giffin & Caticha, 2007). The entropy is defined as the negative of the KL divergence between the target probability P(θ) and the prior Q(θ):

$$S[P, Q] = -\int P(\theta) \log \frac{P(\theta)}{Q(\theta)} \, d\theta \tag{1}$$

The integral is evaluated over the entire domain of the parameter θ. The idea of the ME method is to maximize the entropy term under constraints. The form of the constraint may vary depending on the information given. For example, in a scenario where the parameter θ is updated according to an indirect observable x, the observation x' can be expressed as a constraint:

$$p(x) = \int p(x, \theta) \, d\theta = \delta(x - x') \tag{2}$$

where p(x, θ) is the joint probability distribution function of x and θ, p(x) is the marginal of x, and δ is the delta function.
The entropy, in this case, is taken between the new joint probability function p(x, θ) and the old one q(x, θ):

$$S[p, q] = -\iint p(x, \theta) \log \frac{p(x, \theta)}{q(x, \theta)} \, dx \, d\theta \tag{3}$$

In addition to the constraint in Eq. (2), the integral of the probability function over the domain is unity. This gives a normalization constraint:

$$\iint p(x, \theta) \, dx \, d\theta = 1 \tag{4}$$

To maximize the entropy under these two constraints, we can use the Lagrange method:

$$L = S[p, q] + \alpha \left[ \iint p(x, \theta) \, dx \, d\theta - 1 \right] + \int \lambda(x) \left[ \int p(x, \theta) \, d\theta - \delta(x - x') \right] dx \tag{5}$$

where α and λ(x) are Lagrangian multipliers. Setting the variation of the Lagrangian to zero ($\delta L = 0$) gives the optimal p(x, θ):

$$p(x, \theta) = q(x, \theta) \, e^{\alpha - 1 + \lambda(x)} \tag{6}$$

In Eq. (6), $e^{\alpha - 1}$ can be regarded as a normalizing constant.
By substituting Eq. (6) back into the constraints in Eq. (2) and Eq. (4), we can solve for the analytical form of the new joint probability function:

$$p(x, \theta) = \frac{q(x, \theta) \, \delta(x - x')}{q_X(x)} = \delta(x - x') \, q(\theta \mid x) \tag{7}$$

where $q_X(x)$ is the marginal distribution of x. Hence, the posterior of θ can be calculated by integrating the joint over the domain of x:

$$p(\theta) = \int p(x, \theta) \, dx = q(\theta \mid x') = \frac{q(\theta) \, q(x' \mid \theta)}{q_X(x')} \tag{8}$$

Equation (8) is exactly Bayes' rule.
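The reduction to Bayes' rule can be checked numerically on a small discrete problem. The sketch below is illustrative only: the joint prior table and the observed index are made-up values, and the delta function is replaced by its discrete analogue (an indicator on the observed value of x).

```python
import numpy as np

# Hypothetical joint prior q(x, theta): rows index x, columns index theta.
q_joint = np.array([[0.10, 0.05],
                    [0.20, 0.25],
                    [0.10, 0.30]])
x_obs = 1  # index of the observed value x' (assumed for illustration)

# ME solution, discrete analogue: p(x, theta) = indicator(x == x') * q(theta | x).
q_x = q_joint.sum(axis=1, keepdims=True)      # marginal of x under the prior
indicator = np.zeros_like(q_x)
indicator[x_obs] = 1.0
p_joint = indicator * q_joint / q_x

# Posterior of theta: marginalize the updated joint over x.
p_theta_me = p_joint.sum(axis=0)

# Classical Bayes' rule: q(theta) * q(x' | theta) / q(x').
q_theta = q_joint.sum(axis=0)
likelihood = q_joint[x_obs] / q_theta
p_theta_bayes = q_theta * likelihood / q_x[x_obs, 0]

print(p_theta_me, p_theta_bayes)
```

Both arrays agree, which is the discrete counterpart of the statement that the ME update with only a point observation returns the Bayesian posterior.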

ME method with moment and range constraint
In addition to dealing with point data, the ME method can introduce extra information on top of the classical Bayesian method. One commonly available type of information is statistical moment information, which may come from expert opinion or empirical data. The traditional Bayesian method cannot handle such information easily, but the ME method can incorporate it as a constraint. Following the above derivation, we now have:

$$\iint p(x, \theta) \, g(\theta) \, dx \, d\theta = \langle g(\theta) \rangle = G \tag{9}$$

Equation (9) states that the expected value of the function g(θ) is equal to G; $g(\theta) = \theta^n$ represents the nth-order moment constraint. With the normalization constraint in Eq. (4), we can form a Lagrange function with Lagrangian multipliers α and β:

$$L = S[p, q] + \alpha \left[ \iint p(x, \theta) \, dx \, d\theta - 1 \right] + \beta \left[ \iint p(x, \theta) \, g(\theta) \, dx \, d\theta - G \right] \tag{10}$$

where S[p, q] is the entropy between the new joint probability function p(x, θ) and the old one q(x, θ). Using a similar approach, the new joint probability function is given as:

$$p(x, \theta) = q(x, \theta) \, e^{\alpha - 1 + \beta g(\theta)} \tag{11}$$

β can be solved analytically by substituting Eq. (11) into the constraint in Eq. (9), given a specific form of the distribution function and the constraint type.
When both a moment constraint and observation data are available, the Lagrange function includes the constraints in Eq. (2), (4) and (9). The posterior for the parameter θ can be expressed as:

$$p(\theta) \propto q(\theta) \, q(x' \mid \theta) \, e^{\beta g(\theta)} \tag{12}$$

From Eq. (12) we can clearly see that the result from the ME method has an additional exponential term compared with Bayes' rule. The exponential term carries the information introduced by Eq. (9). When no such information is present, $\beta = 0$ and Eq. (12) reduces to Bayes' rule.
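When no closed form is convenient, the multiplier β can also be found numerically. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian prior, a Gaussian likelihood, a first-order moment constraint g(θ) = θ, and made-up numbers, and it solves for β by bisection on a grid. The function names `me_posterior` and `solve_beta` are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Gaussian density, used here for both the prior and the likelihood."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def me_posterior(theta, prior, likelihood, g, beta):
    """Grid version of prior * likelihood * exp(beta * g), normalized to unit area."""
    log_w = np.log(prior) + np.log(likelihood) + beta * g
    w = np.exp(log_w - log_w.max())           # subtract the max for numerical stability
    return w / (w.sum() * (theta[1] - theta[0]))

def solve_beta(theta, prior, likelihood, g, target, lo=-50.0, hi=50.0):
    """Bisection on beta: the constrained moment increases monotonically with beta."""
    dtheta = theta[1] - theta[0]
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        p = me_posterior(theta, prior, likelihood, g, mid)
        if np.sum(p * g) * dtheta < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical numbers, chosen only for illustration.
theta = np.linspace(0.0, 14.0, 2801)
prior = normal_pdf(theta, 7.0, 1.5)            # q(theta)
likelihood = normal_pdf(6.0, theta, 1.0)       # q(x' | theta) with x' = 6
g = theta                                      # first-order moment, g(theta) = theta
G = 8.0                                        # target mean

beta = solve_beta(theta, prior, likelihood, g, G)
dtheta = theta[1] - theta[0]
bayes = me_posterior(theta, prior, likelihood, g, 0.0)   # beta = 0: plain Bayes
ben = me_posterior(theta, prior, likelihood, g, beta)
bayes_mean = np.sum(bayes * theta) * dtheta
ben_mean = np.sum(ben * theta) * dtheta
print(f"beta = {beta:.3f}, Bayes mean = {bayes_mean:.3f}, constrained mean = {ben_mean:.3f}")
```

Setting `beta` to zero reproduces the ordinary Bayesian posterior on the grid, while the solved `beta` moves the posterior mean to the target value G.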
Statistical moment information is one way to represent existing knowledge about a parameter. Sometimes range information is available for the parameter of concern, for example when the definition or the design limit of a parameter requires it to fall within a certain range. Some research has dealt with this type of information by assuming bounded priors such as the Beta distribution, but this assumption loses generality in the shape of the distribution. In the ME framework, we can encode such information as a constraint. Suppose the parameter θ should fall in the range from a to b. This piece of information can be written as:

$$\int_a^b \int p(x, \theta) \, dx \, d\theta = 1 \tag{13}$$

Along with the normalizing constraint in Eq. (4), it restricts the probability function to the region between a and b for θ.
Again, a Lagrange function can be written with the constraints in Eq. (4) and (13), with Lagrangian multipliers α and γ:

$$L = S[p, q] + \alpha \left[ \iint p(x, \theta) \, dx \, d\theta - 1 \right] + \gamma \left[ \int_a^b \int p(x, \theta) \, dx \, d\theta - 1 \right] \tag{14}$$

where S[p, q] is the entropy between the new joint probability function p(x, θ) and the old one q(x, θ). Setting the variation of the Lagrange function to zero yields a piecewise solution for the new joint probability function:

$$p(x, \theta) = \begin{cases} q(x, \theta) \, e^{\alpha - 1 + \gamma}, & a \le \theta \le b \\ q(x, \theta) \, e^{\alpha - 1}, & \text{otherwise} \end{cases} \tag{15}$$

Substituting Eq. (15) back into the constraints in Eq. (4) and (13), the final solution is a truncated distribution over the domain of θ:

$$p(\theta) = \begin{cases} \dfrac{q(\theta)}{Q_\theta(b) - Q_\theta(a)}, & a \le \theta \le b \\ 0, & \text{otherwise} \end{cases} \tag{16}$$

where Qθ(·) is the cumulative distribution function (CDF) of the prior. The key point is that no assumption was made about the prior; the ME method modifies the distribution function according to the given constraint information.
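The truncated solution can be illustrated with a short sketch. It assumes, purely for illustration, a Gaussian prior with made-up parameters and a made-up range [a, b]; the function name `range_constrained_pdf` is hypothetical. The prior density is renormalized by the prior CDF mass on [a, b] and set to zero elsewhere.

```python
import math

def normal_pdf(x, mean, std):
    """Gaussian prior density q(theta)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    """Gaussian prior CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def range_constrained_pdf(theta, a, b, mean, std):
    """Prior density renormalized on [a, b] and set to zero elsewhere."""
    if theta < a or theta > b:
        return 0.0
    mass = normal_cdf(b, mean, std) - normal_cdf(a, mean, std)
    return normal_pdf(theta, mean, std) / mass

# Hypothetical prior parameters and range.
mean, std, a, b = 7.0, 1.5, 5.0, 9.0

# Midpoint-rule check that the updated density integrates to one over [a, b].
n = 4000
h = (b - a) / n
total = sum(range_constrained_pdf(a + (i + 0.5) * h, a, b, mean, std) for i in range(n)) * h
print(total)
```

Any prior with a computable CDF could be substituted for the Gaussian here, which reflects the point that the range constraint does not require a particular prior family.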

The Bayesian-Entropy network (BEN) framework
Based on the above derivation, the ME method provides an alternative way to update the posterior probability with a richer variety of information. The BEN framework uses the same topology as a Bayesian network but, instead of the classical Bayesian updating rule, uses the ME algorithm to calculate the posterior. The main idea of the BEN framework is to encode extra information on top of the Bayesian method. It can be thought of as having two parts: a Bayesian part, which handles point observation data, and an extra exponential term, which imposes constraints on the Bayesian posterior. The constraint information can come from human perception or empirical knowledge, such as the mean value of a variable (moment information), the correlation between two variables (likelihood information) or the defined range of a parameter (range information). Since the method only adds an exponential term to Bayes' theorem, it can be incorporated into existing Bayesian applications such as classification, updating and inference without adding computational cost.

APPLICATION OF BEN IN AIR TRAFFIC CONTROL
This section gives two examples of the application of the BEN method in air traffic risk assessment. The first example demonstrates the ability of BEN to infer the risk probability using various sources of information. It is only a demonstration and does not represent any research work or real-life scenario. The second example involves a Bayesian network model that examines the cause of runway incursion accidents. The network was built with feature data derived from 37 ASRS reports. Details are discussed below.

BEN in ATM risk control
In this example, we investigate the application of the proposed method to air traffic risk assessment. Figure 1 shows a network model built to evaluate the risk of an aircraft. The risk is related to two factors: the speed of the aircraft and the pilot performance. The speed of the aircraft can be affected by weather (e.g., wind speed, rain) and visibility. The weather and visibility nodes are connected, indicating the potential correlation between the two variables. The pilot performance is a measure of the pilot's status.
The experience of the pilot and the rest (sleeping hours) of the pilot prior to the flight are two influencing factors that contribute to the pilot performance. The distribution of each variable in the network model is listed in Table 1. Risk and Weather are discrete nodes and the others are modeled as continuous. Risk is a binary node, with 0 and 1 corresponding to safe and accident, respectively. Weather can take four discrete values, each representing one possible weather condition, such as sunny, cloudy, rain and snow. The continuous nodes are all modeled as Gaussian nodes. The Pilot node can be a reference value for evaluating pilot performance. Experience could be the number of years the pilot has been flying the aircraft, and Rest is the number of hours the pilot slept prior to departure.
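The topology described above can be written down as a parent map and checked for acyclicity. This is only a sketch: the edge direction between Weather and Visibility is an assumption (the text states only that the two are connected), and the variable names `parents` and `node_types` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to the set of its parents.
parents = {
    "Weather": set(),
    "Visibility": {"Weather"},            # assumed direction of the Weather-Visibility edge
    "Speed": {"Weather", "Visibility"},
    "Experience": set(),
    "Rest": set(),
    "Pilot": {"Experience", "Rest"},
    "Risk": {"Speed", "Pilot"},
}

# Node types as stated in the text: Risk and Weather are discrete, the rest Gaussian.
node_types = {name: "gaussian" for name in parents}
node_types["Risk"] = "binary"
node_types["Weather"] = "categorical (4 states)"

# static_order() raises graphlib.CycleError if the graph is not a DAG.
order = list(TopologicalSorter(parents).static_order())
print(order)
```

A topological order of this kind is the order in which an update on a node such as Rest would be propagated to its descendants Pilot and Risk.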
We assume three scenarios for updating the risk probability:
1. An observation of Rest=6 is made about the pilot. This scenario uses only Bayesian updating.
2. In addition to the observation of Rest=6, a first-order moment (mean) of Rest=8 is given. This scenario uses the BEN to incorporate the moment information.
3. This scenario includes the observation of Rest=6 and a new correlation between the Pilot and Rest nodes. BEN uses this information to change the likelihood between the two variables.
The three pieces of information are fused into the network model via the BEN method to update the risk probability (Figure 2). The updated marginal distribution of Rest can be seen in Figure 3. The solution under the likelihood constraint of the third scenario is given by Eq. (18), where q is the old likelihood function and μ and σ are its distribution parameters (mean and variance). Equation (18) is used for updating in the third scenario. The updated marginal distributions of Pilot and Risk can be seen in Figure 4. To interpret the result, we can think of the observation as coming from a recording device that tracks the pilot's sleeping time. The moment constraint can be understood as a belief that the tracking device may have malfunctioned and that the pilot followed his regular schedule of 8 hours of sleep. The information in the third scenario can be a new research finding on the correlation between pilot performance and pilot sleeping hours. From the result, we can see that in the first scenario the risk probability increases because of the observed low resting hours. In the second scenario the risk probability decreases, since we tend to believe that the pilot had enough rest. In the third scenario the risk probability increases sharply because the constraint introduces a positive correlation, which acts as a penalty for the observed low sleeping time. This example illustrates the ability of BEN to incorporate various sources of information into a network model to update the risk probability. According to the result, BEN can take advantage of the extra information to modify the probability distribution.
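As a worked illustration of how a mean constraint on a likelihood can be solved in closed form, suppose the old likelihood q(Pilot | Rest) is Gaussian with mean μ and variance s², and read the third-scenario constraint as requiring the conditional mean of Pilot to equal 10 × Rest. Both the Gaussian form and the symbol s² are assumptions introduced here for illustration; the expression below is a sketch of the standard exponential-tilting result and is not claimed to reproduce the paper's Eq. (18).

```latex
\begin{aligned}
p(\mathrm{Pilot}\mid\mathrm{Rest})
  &\propto q(\mathrm{Pilot}\mid\mathrm{Rest})\, e^{\beta\,\mathrm{Pilot}}
   \propto \exp\!\left[-\frac{(\mathrm{Pilot}-\mu)^2}{2s^2} + \beta\,\mathrm{Pilot}\right] \\
  &\propto \exp\!\left[-\frac{\left(\mathrm{Pilot}-\mu-\beta s^2\right)^2}{2s^2}\right], \\
\mathbb{E}_p[\mathrm{Pilot}\mid\mathrm{Rest}]
  &= \mu + \beta s^2 = 10\,\mathrm{Rest}
  \quad\Longrightarrow\quad
  \beta = \frac{10\,\mathrm{Rest}-\mu}{s^2}.
\end{aligned}
```

Under these assumptions the exponential term leaves the variance unchanged and moves the conditional mean to the constrained value, so a low observed Rest translates directly into a lower Pilot value.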

BEN for runway incursion
Runway incursion is defined as the incorrect presence of an aircraft in the landing and take-off area (Wilke, Majumdar, & Ochieng, 2015). It can cause critical accidents and property damage. This example explores a Bayesian network model to classify the cause of runway incursion during take-off. The network topology is built using features extracted from ASRS runway incursion accident reports. The features were extracted by manually reading the reports. A total of 331 reports involving runway incursion between 2014 and 2017 were found. Because of the heavy workload of analyzing the reports, only a small number were studied, and 37 of them are used in this example. In these 37 reports, the runway incursion is caused by communication error between the pilot and the ATC tower. Three types of runway incursion are identified:
1. Runway crossing without clearance
2. Taxi across hold line without clearance
3. Attempted take-off by ignoring Line Up and Wait (LUAW)
Four types of communication error are found in the 37 reported cases:
1. ATC operator issues ambiguous taxi clearance (taxi clearance communication error on the ATC side)
2. Pilot misses readback on taxi clearance (taxi clearance communication error on the pilot side)
3. Pilot misses readback on runway crossing clearance (runway crossing communication error)
4. Pilot misses readback on LUAW clearance (LUAW communication error)
In addition to the communication error, some attributes of these 37 reports were extracted as basic features: the number of runways at the airport, the runway layout of the airport (whether or not there is an intersection), the number of people on the same radio frequency, and the time of day of the accident. Based on these features, a network model for runway incursion classification is built, as shown in Figure 5. The network assumes that the four basic features are independent of the occurrence of runway incursions but can contribute to the communication error. The four basic features are also assumed to be independent of each other.

Figure 5. Bayesian network for runway incursion
A random train-test split is applied to the 37 data instances. The training set is used to calculate the conditional probability tables of the network, and the test set is used to validate and test the classifier. The test uses only the four basic features to infer the communication error type and the cause of runway incursion. Because of the limited data, a Bayesian network alone cannot achieve plausible accuracy.
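The split-then-count procedure can be sketched as follows. Everything in this block is illustrative: the records are randomly generated stand-ins rather than ASRS data, the 70/30 split ratio and the add-one smoothing are assumptions that the text does not specify, and `fit_cpt` is a hypothetical helper name.

```python
import random
from collections import Counter

def fit_cpt(pairs, parent_values, child_values, alpha=1.0):
    """Estimate P(child | parent) from (parent, child) pairs with add-alpha smoothing."""
    counts = Counter(pairs)
    cpt = {}
    for pv in parent_values:
        total = sum(counts[(pv, cv)] for cv in child_values) + alpha * len(child_values)
        cpt[pv] = {cv: (counts[(pv, cv)] + alpha) / total for cv in child_values}
    return cpt

# Made-up records (time of day, communication error type); NOT the ASRS data.
times = ["day", "night"]
errors = [1, 2, 3, 4]
rng = random.Random(0)
records = [(rng.choice(times), rng.choice(errors)) for _ in range(37)]

# Random train-test split; the 70/30 ratio is an assumption.
rng.shuffle(records)
n_train = int(0.7 * len(records))
train, test = records[:n_train], records[n_train:]

# Conditional probability table P(error type | time of day) from the training set.
cpt = fit_cpt(train, times, errors)
for time_of_day, row in cpt.items():
    print(time_of_day, row)
```

With so few records, many parent-child combinations are observed rarely or not at all, which is consistent with the remark that limited data restricts the accuracy of a purely data-driven table.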
When reading the accident reports, certain patterns in the correlation between variables can be found. For example, when the number of people on the radio frequency is small, a taxi clearance communication error is more likely to happen on the pilot's side. Such information may come from an experienced operator, or from a report reviewer who has read many accident reports and can distill this type of empirical knowledge. This knowledge can be encoded into the network as entropy information using the BEN method.
The entropy information included in the BEN model is:
1. At night, a runway crossing communication error is more likely to happen.
2. When the number of people on the radio frequency is small, a taxi clearance communication error is more likely to happen on the pilot's side.
3. When the taxi clearance communication error is on the ATC side, the cause of runway incursion is more likely to be crossing the runway without clearance.
4. When the taxi clearance communication error is on the pilot side, the cause of runway incursion is more likely to be taxiing across the runway hold line.
5. A LUAW communication error can only lead to, and is the only reason for, attempted take-off without clearance.
Since the communication error and runway incursion nodes are all categorical, integer values such as 1, 2, 3, 4 are assigned to their categories. Constraints 1 to 4 are treated as mean constraints and constraint 5 as a range constraint. Training was done in a similar manner to the Bayesian approach. With the encoded constraints, the testing accuracy is plotted against that of the Bayesian method in Figure 6. Although the accuracy is still not satisfactory, the encoded constraints give the BEN around a 10% improvement over the classical Bayesian method. Because of the lack of data, the result might not be representative. The author will continue analyzing reports to extract more data for sufficient training and testing sets. The state-of-the-art methods for accident prediction in the NAS, such as the Flight Risk Analysis Tool (FRAT) (FAA, 2007) and the Safety Management System (SMS) (FAA, 2015), mostly rely on human operators. Humans are subject to fatigue, and performance varies between operators, whereas the application of BEN can achieve an automated prediction scheme that can be robust and reliable.
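One way to picture how a mean constraint and a range constraint act on an integer-coded categorical node is sketched below. The probabilities, the target mean of 1.6, and the helper names `tilt_to_mean` and `restrict_to` are all hypothetical; the source does not give the numerical values used for its constraints.

```python
import numpy as np

def tilt_to_mean(prior, codes, target_mean, lo=-50.0, hi=50.0):
    """Find p_i proportional to prior_i * exp(beta * code_i) whose mean code equals target_mean."""
    centered = codes - codes.mean()           # centering only improves numerical stability
    for _ in range(100):                      # bisection: the mean increases with beta
        beta = 0.5 * (lo + hi)
        w = prior * np.exp(beta * centered)
        p = w / w.sum()
        if p @ codes < target_mean:
            lo = beta
        else:
            hi = beta
    return p, beta

def restrict_to(prior, allowed):
    """Discrete range constraint: keep only the allowed categories and renormalize."""
    w = np.where(allowed, prior, 0.0)
    return w / w.sum()

# Runway incursion cause coded 1, 2, 3; the row of probabilities is made up.
codes = np.array([1.0, 2.0, 3.0])
prior_row = np.array([0.3, 0.4, 0.3])

# Mean constraint in the spirit of item 3: pull the mean towards code 1.
p_mean, beta = tilt_to_mean(prior_row, codes, target_mean=1.6)

# Range constraint in the spirit of item 5: only code 3 remains possible.
p_range = restrict_to(prior_row, codes == 3.0)

print(p_mean, beta, p_range)
```

The mean constraint reweights every category through a single multiplier, whereas the range constraint removes categories outright, mirroring the moment and range solutions derived in the framework section.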

CONCLUSION
This paper introduced BEN, a hybrid network model for updating probabilities using data from various sources. The method has two parts: a Bayesian part that handles point data observations and an entropy part that encodes extra information using an exponential function. When there is no other information, the exponential term drops out automatically. According to the first demonstration example, the BEN can take advantage of types of information that are not easily handled with a Bayesian approach by altering the distribution function of the variable. The second example illustrated that the BEN method can be used to encode human knowledge into a network system. It can be concluded that, using the BEN method, we can build a network incorporating human knowledge (moment constraints), empirical information (range constraints) and new correlations (likelihood constraints). This can achieve information fusion for monitoring the NAS. The current approaches for identifying potential hazards in the NAS are mostly based on humans, such as pilots and ATC controllers. According to the demonstrated examples, the application of BEN in the NAS can achieve an automated process for accident prediction.
The proposed method provides an easy way of fusing information from various sources. The entropy term can be solved analytically given a specific form of the distribution function and the constraint. The BEN method does not add computational complexity compared with the Bayesian approach. Any method based on the Bayesian framework can easily be combined with the BEN method.
The BEN is shown to be useful with information other than point data and observations. How the method behaves in more complex and large-scale networks needs further research. The author will continue to work on the runway incursion problem to generate more data and find more representative features to justify the network model for classifying the cause of runway incursion.

Figure 1. The topology for the ATC risk model.

Figure 2. The topology of the ATC model with the information in three scenarios.

Figure 3. The posterior for Rest in the first two scenarios.
It can be seen that when updating with only the point observation (first scenario), the posterior distribution is shifted towards the observed value and its variance decreases. The posterior from BEN has a similar shape, but its mean is shifted to the value specified by the mean constraint. The update propagates through the network along the edges. In the third scenario, a new correlation between the Pilot node and the Rest node is introduced as $\mathrm{Pilot} = f(\mathrm{Rest}) = 10 \times \mathrm{Rest}$. Since it is a constraint imposed on the likelihood function, it is written as:

$$\int p(\mathrm{Pilot} \mid \mathrm{Rest}) \, \mathrm{Pilot} \; d\mathrm{Pilot} = 10 \times \mathrm{Rest} \tag{17}$$

Figure 4. The marginal distribution for a) Pilot and b) Risk.

Figure 6. The average accuracy of classification for a) types of communication error and b) cause of runway incursion.