Identification of Diagnostic-related Features Applicable to EEG Signal Analysis

The regulation of functions such as respiration or heart rate in the human body, as well as the control of motor movements, is performed by the nervous system. As these actions and correlated tasks are directly influenced by the brain, brain monitoring makes it possible to differentiate the tasks and, at the same time, to predict further actions. In this contribution, publicly available electroencephalography (EEG) datasets are analyzed with respect to the detection of epileptic seizure occurrence and BCI-related actions (here: cued motor imagery). For these purposes, time-frequency-based feature extraction alongside different classification methods is used. To perform the classification, Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers are utilized and compared with previously obtained results. The feasibility of particular features for the detection of epileptic seizures and BCI-related tasks is discussed. Four different feature vectors per analyzed problem are identified. Acceptable classification accuracy using ANN- and SVM-based classifiers is achieved with the identified feature vectors.


INTRODUCTION
The reaction of humans to stress, occurring under the influence of a number of external or internal stimuli, is variable. As these reactions have a direct impact on the brain and the heart, monitoring the brain and the heart makes it possible to detect particular states and changes in the human body. Accordingly, it can be stated that the signals most extensively used for an inspection of the human body are those obtained from the brain and the heart: electroencephalography (EEG), magnetoencephalography, and electrocardiography (ECG) signals. Abnormalities noticed in the EEG or ECG signal are indicators of disorders within the organism. The EEG is used in a number of cases, from the diagnosis of epilepsy, sleep disorders, coma, and brain death to the diagnosis of encephalopathies, tumors, or lesions (Diykh, Li, & Wen, 2017; Schirrmeister et al., 2017). These signals are captured using a number of electrodes on the scalp and are correlated to the electrical activity of the brain. Besides the most common and well-investigated application of EEG signals, the detection of epilepsy, the analysis of EEG signals is widely utilized in the development of Brain Computer Interface (BCI) systems. BCI systems are capable of generating a control signal, in most cases an electric signal to be transmitted to an electric device, in accordance with the current brain activity. The control signal, in the form of a predefined command, is thus generated based on the recognition of a priori known activity patterns in EEG signals.
Nejra Beganovic et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The analysis of the EEG signal is performed in two steps: i) the identification of a characteristic feature vector related to epileptic seizure occurrence or to a particular BCI-related action, and ii) the correlation of the extracted features with a particular EEG pattern. The main focus of this contribution is the comparison of feature vectors extracted from the EEG signal in terms of epileptic seizure detection and the motor imagery BCI task. Continuous monitoring of the EEG signal is mostly utilized for epileptic seizure detection, whereas the design of BCI systems includes the analysis of specific cases with respect to external or internal stimuli (Ghaemi, Rashedi, Pourrahimi, Kamandar, & Rahdari, 2017; Ramadan & Vasilakos, 2017).
The contribution is organized as follows: i) after the introductory part stating the problem of interest, the state of the art in EEG signal analysis, including commonly used feature extraction/selection and classification approaches, is discussed in the second section, ii) publicly available experimental datasets and the approaches used are introduced in the third section, and iii) the obtained results are discussed in the fourth section. Finally, the contribution closes with the conclusion and outlook.

FEATURE EXTRACTION AND CLASSIFICATION
Given the time-varying, non-stationary nature of the EEG signal, a variety of characteristic signal signatures can be calculated and used for classification and pattern recognition purposes. On the other hand, not all calculated signal signatures are equally suitable for efficient classification and pattern recognition analysis (Sharma, Dhere, Pachori, & Acharya, 2017). In practice, the utilization of feature vectors (groups of features) instead of a single feature gives more precise results. This implies that the analysis of individual features can be useful for deciding whether to include a feature in the feature vector or exclude it. Even more important is to emphasize that the EEG is recorded from more than one channel, the channels corresponding to electrodes localized in the parietal, occipital, frontal, central, and temporal regions of the brain (Ghaemi et al., 2017); consequently, a feature can be related either to an individual channel or to many channels. In such cases, a multivariate analysis has to be done to obtain useful information from a number of EEG channels.
The EEG signal is a signal of low amplitude and low Signal-to-Noise Ratio (SNR). Consequently, the amplification as well as the filtering of the data is necessary to increase the SNR regardless of the features intended to be used for classification (Ghaemi et al., 2017). In accordance with the above discussion, the EEG signal first has to be amplified and filtered, as depicted in Figure 1. Likewise, the removal of undesirable artifacts is considered within this step.
Alongside EEG amplification, filtering, and artifact removal, a completely different approach is applied in the analysis of Event-Related Potentials (ERPs) (Haider & Fazel-Rezai, 2017). The ERP and its components are potentials evoked by external or internal stimuli, where the EEG measurements correspond to a specific cognitive or motor event (movement of a hand, recognition of shapes inside a figure, and similar). The ERP-related response can hardly be distinguished from the captured EEG signal in a single measurement (trial). Many trials are thus conducted, analyzed together, and averaged to obtain the ERP-related response to a stimulus (Haider & Fazel-Rezai, 2017).
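The trial averaging described above can be illustrated with a minimal sketch on synthetic data. The sampling rate, the shape of the evoked response, and the noise level are assumptions chosen for illustration only and are not taken from any of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 250                       # assumed sampling rate in Hz
t = np.arange(fs) / fs         # 1 s epoch after stimulus onset
n_trials = 200

# Hypothetical evoked response: a small positive deflection around 300 ms.
erp_true = 2.0 * np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))

# Each trial is the same evoked response buried in much larger background activity.
trials = erp_true + 10.0 * rng.standard_normal((n_trials, t.size))

# Averaging across trials attenuates activity that is not time-locked to the
# stimulus (roughly by sqrt(n_trials) for independent noise).
erp_estimate = trials.mean(axis=0)

single_trial_error = np.sqrt(np.mean((trials[0] - erp_true) ** 2))
average_error = np.sqrt(np.mean((erp_estimate - erp_true) ** 2))
print(f"RMS error, single trial: {single_trial_error:.2f}")
print(f"RMS error, {n_trials}-trial average: {average_error:.2f}")
```

The single-trial error stays at the level of the background activity, whereas the error of the averaged response is considerably smaller.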

Identification of EEG features
Component analysis methods, namely Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), are applied to reduce the amount of data, to decompose a signal into a number of independent signals, or to extract artifacts from the EEG signal. The reduction of high-dimensional to low-dimensional data is mostly done using PCA and LDA, whereas ICA is principally utilized for artifact removal through signal decomposition into a number of statistically independent signals (Acharya, Oh, Hagiwara, Tan, & Adeli, 2017). Alongside LDA, PCA, and ICA, the Common Spatial Pattern (CSP) technique, with a number of variations, is extensively applied to the EEG signal to separate a multivariate signal into a number of linearly independent subcomponents (Park, Lee, & Lee, 2018; D. Li, Zhang, Khan, & Mi, 2018; Meisheri, Ramrao, & Mitra, 2018). This transformation reduces high-dimensional to low-dimensional data by maximizing the difference in variance between the classes, thereby enabling the selection of features which reflect the case of interest. The CSP has proven to be highly efficient for BCI-related tasks. However, the method is highly sensitive to outliers and therefore places high requirements on signal preprocessing (Meisheri et al., 2018).
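As an illustration of the CSP idea, the following is a minimal sketch for two classes, assuming zero-mean (band-pass filtered) trials. The function names `csp_filters` and `log_variance_features` as well as the synthetic three-channel data are hypothetical and are not taken from the cited works; the basic generalized-eigenvalue formulation is used, without the variations mentioned above.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=1):
    """Basic two-class CSP. trials_*: (n_trials, n_channels, n_samples).
    Returns spatial filters of shape (2 * n_pairs, n_channels)."""
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials.
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    cov_a, cov_b = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem cov_a w = lambda (cov_a + cov_b) w;
    # eigh returns the eigenvalues in ascending order.
    _, eigvecs = eigh(cov_a, cov_a + cov_b)
    picks = list(range(n_pairs)) + list(range(-n_pairs, 0))
    return eigvecs[:, picks].T

def log_variance_features(trials, filters):
    """Normalized log-variance of the spatially filtered trials."""
    projected = np.einsum("fc,ncs->nfs", filters, trials)
    var = projected.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))

# Synthetic example: class A is strong on channel 0, class B on channel 2.
rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 40, 3, 300
scale_a = np.array([3.0, 1.0, 1.0])[:, None]
scale_b = np.array([1.0, 1.0, 3.0])[:, None]
trials_a = scale_a * rng.standard_normal((n_trials, n_channels, n_samples))
trials_b = scale_b * rng.standard_normal((n_trials, n_channels, n_samples))

W = csp_filters(trials_a, trials_b)
feat_a = log_variance_features(trials_a, W)
feat_b = log_variance_features(trials_b, W)
print(feat_a.mean(axis=0), feat_b.mean(axis=0))
```

The first filter yields high variance for class B and the last one for class A, so the two log-variance features separate the classes.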
To reveal characteristic frequencies or the power spectral density, techniques such as the Fourier Transform (FT) or the Discrete/Continuous Wavelet Transform (DWT/CWT) can be applied. As the EEG signal has a highly dynamic nature, the time information is of crucial importance. The DWT and CWT are most commonly used to reveal the frequency spectrum of the EEG signal without losing the time information and to calculate its relevant statistical characteristics (Bhattacharyya, Pachori, Upadhyay, & Acharya, 2017). Additionally, high- and low-frequency components can be analyzed individually by application of Wavelet Packet Decomposition (WPD), which decomposes the obtained signal into two components (Alickovic, Kevric, & Subasi, 2018). Alongside CWT and DWT, Empirical Mode Decomposition (EMD) as well as the Hilbert-Huang Transform (HHT) are applied to the non-stationary, non-linear EEG signal (Mutlu, 2018; Krishnan & Samiappan, 2018; Ramakrishnan & Kanagaraj, 2018; Das & Bhuiyan, 2016). Among the numerous practical applications of EMD to the EEG signal, an early application is the removal of noise and of some artifacts from the EEG signal (Das & Bhuiyan, 2016). Using EMD, the signal is decomposed into a number of Intrinsic Mode Functions (IMFs), where each IMF has numbers of extrema and zero crossings that differ by at most one and envelopes that are symmetric with respect to zero. The revealed intrinsic modes of oscillation are closely related to the instantaneous frequency; more precisely, to a localized frequency within a narrow frequency band (Krishnan & Samiappan, 2018). Statistical analysis of EMD-related signatures is often applied for the diagnosis of particular dysfunctions (for instance, epileptic seizures) (Ramakrishnan & Kanagaraj, 2018; Das & Bhuiyan, 2016). The Hilbert-Huang Transform is commonly discussed as an extension of EMD, as it uses IMFs to obtain the Hilbert spectrum. As such, revealing the IMFs is the first step in the HHT calculation. The Hilbert spectrum is a three-dimensional representation of the amplitude, the time, and the instantaneous frequency of the signal. According to (Mutlu, 2018; Krishnan & Samiappan, 2018; Ramakrishnan & Kanagaraj, 2018), the Hilbert-Huang spectral analysis and its components, such as the marginal spectrum, the mean marginal spectrum, the degree of statistical stationarity, and similar, are utilized for the detection of particular dysfunctions and abnormal EEG signatures.
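The Hilbert spectral step can be illustrated with a minimal sketch in which a single mono-component signal stands in for one IMF; the EMD sifting itself is not implemented here. The sampling rate and the chirp parameters are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import hilbert

fs = 250.0                              # assumed sampling rate in Hz
t = np.arange(int(3 * fs)) / fs         # 3 s of data

# Stand-in for one IMF: a chirp whose frequency rises linearly from 5 Hz to 15 Hz.
f0, f1 = 5.0, 15.0
phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / t[-1] * t ** 2)
imf = np.cos(phase)

# Analytic signal: real part is the IMF, imaginary part its Hilbert transform.
analytic = hilbert(imf)
amplitude = np.abs(analytic)                    # instantaneous amplitude
inst_phase = np.unwrap(np.angle(analytic))      # instantaneous phase in rad
inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)  # instantaneous frequency in Hz

print(f"median instantaneous frequency: {np.median(inst_freq):.2f} Hz")
```

Plotting `amplitude` over time and `inst_freq` for every IMF would give the three-dimensional amplitude-time-frequency representation referred to as the Hilbert spectrum.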
To provide a more detailed representation of the information conveyed by the EEG signal, a number of nonlinear techniques are utilized, among them Higher Order Spectra as well as Correlation and Fractal Dimension.
The Auto Regressive, Auto Regressive Moving Average, as well as Moving Average models are commonly used for the probabilistic analysis of time series data. Using these parametric models, future values of the time series are predicted as a weighted average of previous/current values. Accordingly, they can be discussed in terms of infinite impulse response filters. As the models consist of a number of parameters which have to be estimated, corresponding optimization algorithms/approaches are utilized to minimize the output error of the model (Ramakrishnan & Kanagaraj, 2018; Alickovic et al., 2018).
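As a minimal sketch of such parameter estimation, the coefficients of an autoregressive model can be obtained by ordinary least squares. The function name `fit_ar`, the model order, and the simulated series are assumptions for illustration; the cited works may use other estimators.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares estimate of a_1..a_p in
    x[n] = a_1 x[n-1] + ... + a_p x[n-p] + e[n]."""
    x = np.asarray(x, dtype=float)
    # Column k-1 holds x[n-k] for every predicted sample x[n], n >= order.
    lagged = np.column_stack(
        [x[order - k: len(x) - k] for k in range(1, order + 1)]
    )
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coeffs

# Simulate a stable AR(2) process with known coefficients.
rng = np.random.default_rng(0)
true_coeffs = np.array([1.5, -0.75])
x = np.zeros(5000)
noise = rng.standard_normal(x.size)
for n in range(2, x.size):
    x[n] = true_coeffs[0] * x[n - 1] + true_coeffs[1] * x[n - 2] + noise[n]

estimated = fit_ar(x, order=2)
print("estimated AR coefficients:", np.round(estimated, 3))
```

The estimated coefficients (or statistics derived from them) can then serve as features of the analyzed signal segment.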

Classification methods and approaches
A brief rather than comprehensive review of classification approaches is given in Table 1. The k-Nearest Neighbor method, as a commonly used classification method, is detailed in (Alickovic et al., 2018). Although no great significance is given to the selection of the classification method in (Alickovic et al., 2018), a comprehensive analysis and comparison of EEG signal decomposition methods and efficient feature extraction for the purpose of BCI system development is conducted. Here, EMD, DWT, and WPD alongside PCA are compared to find the most reliable set of features to be fed to the classifier, with emphasis on the importance of higher frequency ranges for classification purposes.
Recent progress in the field of EEG signal analysis led to the introduction of fuzzy classifiers (Santos et al., 2017) for quadcopter control using an electroencephalogram headset.
In (Santos et al., 2017), a hybrid Takagi-Sugeno fuzzy model combined with a Bayesian Gaussian model and discriminant analysis is proposed to classify the EEG signal. Forecasting of epileptic seizure occurrence using Bayesian Linear Discriminant Analysis (BLDA) and diffusion distance on intracranial EEG is proposed by Yuan et al. (Yuan, Zhou, & Chen, 2018). The features are calculated using wavelet decomposition on segmented EEG epochs and fed to the BLDA classifier. The obtained sensitivity is 85.11% for a seizure occurrence period of 30 min and 93.62% for a seizure occurrence period of 50 min (Yuan et al., 2018). Multiscale Radial Basis Functions (RBF) and a Modified Particle Swarm Optimization are discussed in (Y. Li et al., 2018) in terms of feature extraction considering an adaptive and localized time-frequency analysis of the EEG signal. The obtained features are used as input to a Support Vector Machine (SVM) in order to distinguish epileptic seizures from the normal EEG signal (Y. Li et al., 2018), which at the same time demonstrates the high efficiency of the aforementioned approach.
According to the results, the proposed models give slightly better performance than conventional CNNs while a smaller number of parameters is considered. The approach proposed in (Acharya et al., 2017) utilizes a 13-layer DCNN to distinguish normal, preictal, and seizure classes. Using this approach, an accuracy of 88.67% is achieved, which is slightly lower than that of some other commonly used approaches; at the same time, separate steps for feature extraction and feature selection are avoided. In (Tang et al., 2017), the authors conducted experiments in which the imagination of hand movement is inspected. The extraction of features as well as the single-trial classification is here obtained using DCNN-based models and compared with the output of CSP/SVM and AR/SVM classifiers. According to the results, a further improvement in classification performance using DCNN is demonstrated. The application of an adapted DCNN to detect robot errors from the EEG signal of a human operating in a robot-human environment is proposed in (Behncke, Schirrmeister, Burgard, & Ball, 2018). The improved accuracy of decoding robot errors from the EEG of a human observer is shown based on a comparison of three different decoding algorithms (ConvNets, rLDA, and FB-CSP combined with rLDA). Moreover, in (Schirrmeister et al., 2017) deep learning with CNNs is reported as a promising tool in EEG-based decoding and EEG-based brain mapping. Liu et al. (Liu, Zhao, Hou, & Liu, 2017) in a similar manner introduce a Deep Belief Network for feature extraction of the EEG P300 component in "an autobiographical paradigm test", thus obtaining a so-called deep characteristic vector from the raw feature vector. Afterwards, the SVM-based classification is performed using the deep characteristic feature vector with high efficiency (Liu et al., 2017).
Inspired by the success of DCNN in EEG signal analysis, motor imagery movement classification using Spiking Neural Models (SNM) is introduced in (Salazar-Varas & Vazquez, 2018). The key idea in (Salazar-Varas & Vazquez, 2018) is to avoid long recording sessions for users. As a consequence, higher demands are placed on the classifiers/classification methods, which have to be trained using a reduced number of user recordings. The coherence from a subset of three electrodes is calculated and used to show the efficiency of SNM for classification purposes in the case of reduced datasets. In comparison with LD, FNN, and RBF, the SNM showed the best performance (Salazar-Varas & Vazquez, 2018). In addition, a multilayer feed-forward neural network is applied in (Yang, Lin, & Lu, 2017) to detect movement intention using movement-related cortical potentials. The features are extracted using "the dictionary learning algorithm" and the performance is compared with that of Random Forest (RF) and Support Vector Machine (Yang et al., 2017). It is shown that the efficiency of the multilayer feed-forward neural network is higher than that of RF and SVM.

DATASETS AND SELECTED APPROACH
From the brief review of feature extraction and classification approaches given in the previous section, it is noticeable that time-frequency-based features alongside NNs and a variety of their modified implementations are most often utilized in classification/pattern recognition problems of EEG signals.
Here, the detection of epileptic seizures and the recognition of BCI-related tasks are analyzed in terms of the identification of common feature vectors.
The datasets used for the analysis of epileptic seizure detection are the datasets provided by Temple University (TUH EEG Seizure Corpus), captured using the standardized 10-20 electrode configuration in Average Reference referential montage (Golmohammadi et al., 2017). The original data files are split into multiple files, each corresponding to 3 s of captured data. Further analysis is performed on windowed data originating from a number of sessions related to 20 patients.
The dataset used for the recognition of BCI-related tasks is Dataset IVa from BCI Competition III, provided by Fraunhofer FIRST, Intelligent Data Analysis Group. "The data set was recorded from five healthy subjects. Subjects sat in a comfortable chair with arms resting on armrests. Visual cues indicated for 3.5 s which of the following 3 motor imageries the subject should perform: (L) left hand, (R) right hand, (F) right foot." (Dornhege, Blankertz, Curio, & Muller, 2004). The EEG signal is captured from 118 channels of the extended 10-20 configuration. In this contribution, only a subset of 170 labeled samples from 3 channels originating from one subject (al) is used.
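The segmentation of a recording into 3 s windows can be sketched as follows. The function name `split_into_windows`, the sampling rate, and the handling of the incomplete trailing segment (it is dropped) are assumptions for illustration and are not specified in the text.

```python
import numpy as np

def split_into_windows(recording, fs, window_s=3.0):
    """Split a (n_channels, n_samples) recording into non-overlapping windows.
    Returns (n_windows, n_channels, samples_per_window); an incomplete
    trailing window is discarded."""
    samples_per_window = int(round(window_s * fs))
    n_windows = recording.shape[1] // samples_per_window
    trimmed = recording[:, : n_windows * samples_per_window]
    return trimmed.reshape(
        recording.shape[0], n_windows, samples_per_window
    ).swapaxes(0, 1)

# Example: 16 channels, 10 s of synthetic data at an assumed 250 Hz.
rng = np.random.default_rng(0)
fs = 250
recording = rng.standard_normal((16, 10 * fs))
windows = split_into_windows(recording, fs)
print(windows.shape)
```

Each resulting window keeps the label (seizure/without seizure) of the interval it was cut from.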
The data processing steps, depicted in Figure 2, are applied to both aforementioned datasets. For the purpose of feature extraction, 16 channels for epileptic seizure detection and 3 channels for BCI-related task recognition are selected. As the events of interest for epileptic seizure detection are in the frequency range <40 Hz (Acharya et al., 2017; Jiao et al., 2018), low-pass filtering is first applied to the TUH datasets. The events of interest for the imagery R/F task are in the frequency range <400 Hz, indicating the need for low-pass filtering that removes frequencies above 400 Hz (Tang et al., 2017). As depicted in Figures 2 and 3, the next step in the processing of the data consists of obtaining the CWT coefficients by applying the CWT to the filtered data. Further, the averaged values of signal energy per channel, the overall signal energy, the standard deviation between channels, and the maximum and minimum values of the CWT coefficients can be calculated in order to form the feature vector (Tables 2 and 3).
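Restricting a signal to the band below a cutoff and then computing CWT coefficients can be sketched as follows. The Butterworth filter, its order, the complex Morlet wavelet, the amplitude normalization, and the sampling rate are all assumptions made for illustration; the text does not state which filter or mother wavelet is used.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass(x, fs, cutoff_hz, order=4):
    """Zero-phase Butterworth low-pass filter along the last axis."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)

def morlet_cwt(x, fs, freqs, w0=6.0):
    """CWT of a 1-D signal with an analytic Morlet wavelet, evaluated in the
    frequency domain. Normalized so that a unit-amplitude sinusoid gives
    |coefficient| close to 1 at its own frequency."""
    n = len(x)
    x_hat = np.fft.fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=1 / fs)   # rad/s
    coeffs = np.empty((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        scale = w0 / (2 * np.pi * f)   # scale whose centre frequency is f
        psi_hat = 2.0 * np.exp(-0.5 * (scale * omega - w0) ** 2) * (omega > 0)
        coeffs[i] = np.fft.ifft(x_hat * psi_hat)
    return coeffs

fs = 250.0                                   # assumed sampling rate in Hz
t = np.arange(int(3 * fs)) / fs              # one 3 s window
raw = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t)

filtered = lowpass(raw, fs, cutoff_hz=40.0)  # keep the band below 40 Hz
freqs = np.arange(2.0, 41.0, 2.0)            # analysed frequencies in Hz
coeffs = morlet_cwt(filtered, fs, freqs)

peak_freq = freqs[np.argmax(np.abs(coeffs).mean(axis=1))]
print(f"dominant frequency after filtering: {peak_freq:.0f} Hz")
```

For multichannel data, the same two functions would be applied to every selected channel, giving one coefficient matrix per channel.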
As can be concluded from Table 2, four feature vectors are generated per considered case. The detailed structure with the mathematical representation of feature vectors F11, F12, F13, and F14 is given in Table 3. Here, an element $X_{i-n,k-m}$ represents the CWT coefficients between the CWT scales k and m, mapped to the corresponding frequency range and obtained from the channels between i and n. According to Table 3, feature vector F11 includes six individual feature vectors ($K_1$ to $K_6$) of length 16 (in total 16 × 6 features). The individual feature $K_1$ from this feature set is a vector which represents the averaged absolute values of the CWT coefficients per channel. Likewise, the individual feature $K_9$ from feature vector F14 is a vector containing the percentage of the overall signal energy contained in the last four scales of the CWT per channel. Concerning the polynomial relationship between CWT scales and frequency bandwidths, the low-frequency bandwidth is contained in the last four scales. In contrast, the individual feature $K_4$ of feature vector F12 is a scalar value which corresponds to the minimum of the maximum CWT values obtained per channel. By analyzing the extracted features, it becomes noticeable that the feature vectors contain wavelet- and statistics-based quantities. Furthermore, feature vectors F21, F22, F23, and F24 have a structure identical to that of feature vectors F11, F12, F13, and F14, but with a slightly changed scaling of the frequency range due to the different frequency bands of interest. It is important to emphasize that only three channels are used from Dataset IVa (BCI Competition III database), whilst sixteen channels are used from the datasets originating from TUH.
The CWT coefficients for two randomly selected 3 s intervals from the TUH datasets after signal filtering are depicted in Figure 3. The upper figure shows the CWT coefficients of a 3 s window containing recordings during seizure occurrence. The lower figure depicts the CWT coefficients obtained from recordings in which no seizure occurred. The individual features of feature set F11 are depicted in Figure 4. The z axis of the plot corresponds to the label (seizure/without seizure), whilst the x and y axes correspond to particular features. As such, the relationship between two features out of the total of 16 × 6 features in F11 is shown.
The extracted features are fed to both SVM- and ANN-based classifiers. The implementation of SVM proposed by Chang et al. (Chang & Lin, 2011) is used. It has to be emphasized that the designed SVM classifier utilizes either the Gaussian Radial Basis Function or the linear function as the kernel function. A more detailed explanation of SVM-based classifiers can be found in [...]. The results of classification for the different feature datasets and classifiers are compared, aiming to find the optimal selection of features and classification approach.
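The three individual features described above can be sketched as follows. Since Table 3 is not reproduced here, the exact formulas are assumptions derived from the verbal descriptions, the energy is taken as the squared coefficient magnitude, and random numbers stand in for the CWT coefficient magnitudes of one 3 s window.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_scales, n_samples = 16, 32, 750   # assumed dimensions

# Stand-in for |CWT coefficients| of one window: (channels, scales, samples).
coeffs = np.abs(rng.standard_normal((n_channels, n_scales, n_samples)))

# K1 (as described): averaged absolute CWT coefficients per channel.
k1 = coeffs.mean(axis=(1, 2))                       # shape (16,)

# K9 (as described): percentage of each channel's overall energy that is
# contained in the last four scales.
energy_per_scale = (coeffs ** 2).sum(axis=2)        # shape (16, 32)
k9 = 100.0 * energy_per_scale[:, -4:].sum(axis=1) / energy_per_scale.sum(axis=1)

# K4 (as described): minimum over channels of the per-channel maximum.
k4 = coeffs.max(axis=(1, 2)).min()                  # scalar

print(k1.shape, k9.shape, float(k4))
```

Vector-valued features such as `k1` and `k9` contribute one element per channel to a feature vector, whereas `k4` contributes a single element.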
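The two kernel functions mentioned above can be written out as a minimal sketch. Only the kernel formulas are shown; this is not the interface of the SVM implementation used, and the value of `gamma` is an assumed free parameter.

```python
import numpy as np

def linear_kernel(a, b):
    """K(a, b) = <a, b> for all pairs of rows of a and b."""
    return a @ b.T

def rbf_kernel(a, b, gamma):
    """Gaussian RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = (
        (a ** 2).sum(axis=1)[:, None]
        + (b ** 2).sum(axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Five hypothetical feature vectors with four elements each.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))

k_lin = linear_kernel(x, x)
k_rbf = rbf_kernel(x, x, gamma=1.0 / x.shape[1])
print(np.round(k_rbf, 3))
```

The linear kernel compares feature vectors by their inner product, while the RBF kernel depends only on their distance, which is why the feature scaling has a direct impact on the RBF-based classifier.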

DISCUSSION OF RESULTS
Considered combinations of analyzed feature vectors and classification approaches are given in Table 4.
As can be seen from Table 2, the most commonly utilized distribution of the data is: i) 80% of the data used for training purposes and ii) 20% of the data used for validation/test purposes. An aggravating circumstance in this case is the fact that the amount of data contained within the datasets is not identical, as different databases are used. However, the unification of the data from both databases is partially achieved by using 3 s frame windowing and by a proper selection of the number of captured recordings. As such, the amount of data finally used for classification purposes becomes equal.
The results of classification are given in the last column of Table 4. At first glance, better performance with respect to classification accuracy, regardless of the classification approach used (ANN/SVM), is achieved using feature vectors F11 and F14 or, equivalently for the classification of the imagined R/F movement action, F21 and F24. Such results are reasonable and expected, as i) feature vectors F12/F22 contain a high number of averaged quantities and thereby lose identifiers able to differentiate particular labels, ii) the number of feature elements within feature vectors F11/F21 and F14/F24 is higher than that of F12/F22 and F13/F23, and iii) the inclusion of the standard deviation in the F13 feature set contributes to the problem of data which are not normalized. The conclusions drawn from the previous discussion are: i) the values of the feature vector elements have to be normalized in a way that has no high impact on the distinguishability and homogeneity of the elements, ii) the decision about either inclusion or exclusion of a particular element from the feature vector has to be made under consideration of the nature of the signal, and iii) the averaging of values which vary in a broad range (have a large standard deviation) is not recommended, as it leads to the loss of values that are small in relation to the maximum or minimum values.
Table 3. Detailed structure of feature vectors.
As the number of features included within F11/F21 and F14/F24 is high, which can be considered a disadvantage despite the acceptable classification results, the application of a data reduction approach (PCA or similar) is proposed in order to obtain the minimum number of features with slightly worse or even the same classification results.
Moreover, the results obtained using different classifiers (here: ANN- and SVM-based classifiers) are comparable and highly affected by the design of the classifiers. For instance, a different kernel function used within the SVM or a different number of neurons in the hidden layer of the ANN has an impact on the classification results, as pointed out in Table 4. In addition, the performance of the classifiers in terms of classification accuracy, obtained under consideration of the aforementioned feature vectors, seems to be comparable with the performance of
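An 80%/20% division of the labeled feature vectors can be sketched as follows. Whether the data are shuffled before splitting is not stated in the text; the random permutation, the function name `split_80_20`, and the synthetic features are assumptions for illustration.

```python
import numpy as np

def split_80_20(features, labels, train_fraction=0.8, seed=0):
    """Randomly split samples into a training and a validation/test part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    n_train = int(round(train_fraction * len(labels)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

# Example: 170 labeled samples with 96 hypothetical features each.
rng = np.random.default_rng(1)
features = rng.standard_normal((170, 96))
labels = rng.integers(0, 2, size=170)

x_train, y_train, x_test, y_test = split_80_20(features, labels)
print(x_train.shape, x_test.shape)
```

With 170 samples, this gives 136 samples for training and 34 for validation/test.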
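The proposed data reduction can be illustrated with a minimal PCA sketch based on the singular value decomposition. The function name `pca_reduce`, the 95% variance threshold, and the synthetic 96-element feature vectors are assumptions; no such reduction is evaluated in this contribution.

```python
import numpy as np

def pca_reduce(features, variance_to_keep=0.95):
    """Project (n_samples, n_features) data onto the fewest principal
    components that explain at least `variance_to_keep` of the variance."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    n_components = min(n_components, len(s))
    return centered @ vt[:n_components].T, explained[:n_components]

# Synthetic example: 96 correlated features driven by 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 5))
mixing = rng.standard_normal((5, 96))
features = latent @ mixing + 0.05 * rng.standard_normal((200, 96))

reduced, explained = pca_reduce(features)
print(reduced.shape, round(float(explained.sum()), 3))
```

In this synthetic case a handful of components is sufficient, because the 96 features are strongly correlated; for real feature vectors the number of retained components would have to be determined from the data.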

CONCLUSION AND OUTLOOK
The identification of features/feature vectors applicable to EEG signal analysis with respect to epileptic seizure detection and BCI-related tasks is performed on publicly available datasets. The CWT is applied to the aforementioned datasets to extract a number of time-frequency-based features, and the classification results obtained with NN- and SVM-based classifiers are compared in terms of classification accuracy and the capability of particular features for classification purposes.
According to the obtained results, the most accurate classification (ca. 97%) is achieved by using feature sets F11 and F14 (or F21 and F24) and ANN classifiers with a high number of neurons in the hidden layer. Furthermore, the application of the proposed approach to both epileptic seizure detection and BCI-related tasks shows that the features are more discriminative for epilepsy diagnosis than for BCI-related tasks.
As is evident from the given analysis, a trade-off between feature extraction and classifier selection, with respect to time consumption, complexity of the algorithms, and computational power, has to be found. Furthermore, the latest achievements in EEG signal analysis indisputably include artificial intelligence methods, such as NNs and their variations. Indeed, the implementation, the requirements on computational power, and the complexity of the applied algorithms/methods become more challenging with an increasing number of features. In order to cope with all these challenges, an enhancement in the time efficiency of algorithm execution is required. The latest endeavors are thus related to the real-time execution of algorithms using Field Programmable Gate Array-based hardware platforms.

Figure 3. CWT image plot of TUH exemplary data: With and without epileptic seizure occurrence.

Table 1. Brief overview of EEG signal feature selection and classification.

Table 4. Proposed selection of feature sets and classifiers.