Feature Mapping Techniques for Improving the Performance of Fault Diagnosis of Synchronous Generator

Support vector machine (SVM) is a popular machine learning algorithm used extensively in machine fault diagnosis. In this paper, linear, radial basis function (RBF), polynomial, and sigmoid kernels are experimented with to diagnose inter-turn faults in a 3 kVA synchronous generator. From the preliminary results, it is observed that the performance of the baseline system is not satisfactory, since the statistical features are non-linear and do not match the kernels used. In this work, the features are linearized in a higher dimensional space to improve the performance of the fault diagnosis system for a synchronous generator, using two feature mapping techniques: sparse coding and locality constrained linear coding (LLC). Experiments and results show that LLC is superior to sparse coding for improving the performance of fault diagnosis of a synchronous generator. For the balanced data set, LLC improves the overall fault identification accuracy of the baseline RBF system by 22.56%, 18.43% and 17.05% for the R, Y and B phase faults respectively.


INTRODUCTION
Condition based maintenance (CBM) is the preferred technique in many industrial applications for its reduced maintenance costs and improved safety of operations. CBM reduces downtime and increases productivity (Jardine, Lin, & Banjevic, 2006). Data acquisition is the primary step in CBM, wherein mechanical and electrical signals are collected from the machines to monitor their health. Feature extraction is an important process in CBM which maps the measured signal into the feature space. The performance of the fault diagnosis algorithm is also dependent on the features (Saxena, Wu, & Vachtsevanos, 2005; Wu et al., 2004; Wu, Saxena, Patrick, & Vachtsevanos, 2005). Signal processing based feature extraction methods such as time-domain (Samanta & Al-Balushi, 2003), frequency-domain (Chen, Du, & Qu, 1995), wavelet (Peter, Peng, & Yam, 2001; Lin & Zuo, 2004; Yan, Gao, & Wang, 2009), and empirical mode decomposition (Yan & Gao, 2008; He, Liu, & Kong, 2011) methods have been widely used in machine condition monitoring applications. Many feature selection algorithms have been developed for effective fault diagnosis (Chiang, Kotanchek, & Kordon, 2004; Casimir, Boutleux, Clerc, & Yahoui, 2006; Verron, Tiplica, & Kobi, 2008; Y. Yang, Liao, Meng, & Lee, 2011; K. Zhang, Li, Scarf, & Ball, 2011); these select the fault discriminative features from the feature space for better classification. Feature transformation approaches are also used to improve fault identification performance (Widodo, Yang, & Han, 2007; Widodo & Yang, 2007; Y. Zhang, 2009).

(Gopinath R., et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Choosing an appropriate classification algorithm for a particular application is a difficult task; it depends on the characteristics of the features extracted from the raw data. SVM is an important supervised machine learning algorithm widely used in various applications, including machine fault diagnosis (Nayak, Naik, & Behera, 2015). The performance of the SVM classifier can be affected by the kernel function, the training sample size, and the kernel parameters. Zhou et al. investigated the effects of the training sample size, SVM order, and kernel parameters using least squares SVM for the linear, polynomial and Gaussian kernels (J. Zhou, Shi, & Li, 2011). Wang et al. reviewed SVM for uncertain data; robust optimization is used when the direct model cannot guarantee good performance on uncertain data sets (X. Wang & Pardalos, 2015). Kang et al. proposed genetic algorithm based kernel discriminative features to improve the performance of multi-class SVM for low speed bearing fault diagnosis (Kang et al., 2015). Fu et al. made a comparative study of grid search (GS), genetic algorithm (GA), and particle swarm optimization (PSO) for optimizing the penalty factor λ and the kernel parameter γ during training (Fu, Tian, & Wu, 2015).
In this work, a baseline system is first developed using SVM kernels (linear, radial basis function (RBF), polynomial, and sigmoid) to identify inter-turn faults in a 3 kVA synchronous generator. It is observed that the performance of the classifier is not satisfactory, as the statistical features are non-linear and do not match the kernels used. The performance of the SVM classifier can be improved by the following approaches: (a) select features matching a particular kernel; (b) choose an appropriate kernel that fits the features; or (c) linearize the features in a higher dimensional space and match them with a linear kernel. In this paper, the third approach is experimented with to improve the performance of fault diagnosis of a synchronous generator using feature mapping techniques.
Sparse coding is an unsupervised machine learning algorithm used to represent feature vectors as a linear combination of basis vectors. Liu et al. used adaptive sparse features and classified the faults using multi-class linear discriminant analysis for machine fault diagnosis (Liu, Liu, & Huang, 2011). Liu et al. also pointed out that sparse coding incurs a high computational cost for dictionary learning, for which faster algorithms need to be developed (Liu et al., 2011). Zhu et al. proposed an automatic and adaptive feature extraction technique via K-SVD, in which faults are diagnosed using the reconstruction error of the sparse representation (Zhu et al., 2014). Further, a fusion sparse coding technique was proposed to effectively extract impulse components from vibration signals (Deng, Jing, & Zhou, 2014). For good classification performance, a coding algorithm should generate similar codes for similar feature vectors. However, sparse coding might select different bases for similar feature vectors to enforce sparsity, and thus fails to capture the correlations between codes (J. Wang et al., 2010).
Local coordinate coding (LCC) overcomes the drawbacks of sparse coding by explicitly encouraging the bases to be local, which in turn makes the codes sparse (Yu, Zhang, & Gong, 2009). However, the sparse coding and LCC algorithms have a computational complexity of O(M^2), where M is the total number of vectors in the basis set. Wang et al. proposed the locality constrained linear coding (LLC) method for image classification applications (J. Wang et al., 2010) to represent non-linear features in a way that improves performance with linear classifiers (J. Yang, Yu, Gong, & Huang, 2009; X. Zhou, Cui, Li, Liang, & Huang, 2009). LLC is a fast implementation of LCC which uses a locality constraint to select the k nearest bases for each feature vector, thereby reducing the computational complexity from O(M^2) to O(M + k^2). Recent studies show that LLC has been applied to many image processing applications, such as video summarization (Lu et al., 2014), human action recognition (B. Wang et al., 2014; Rahmani, Mahmood, Huynh, & Mian, 2014), magnetic resonance (MR) imaging (P. Zhang et al., 2013), and colorization of gray scale facial images (Liang et al., 2014). In this paper, sparse coding and locality constrained linear coding (LLC) are compared for improving the performance of fault diagnosis of a synchronous generator. The details of the experimental setup, data collection, feature extraction, sparse coding, locality constrained linear coding (LLC) and support vector machine (SVM) are discussed in section 2. Experiments and results are discussed in section 3 and, finally, section 4 concludes this paper.

Experimental Setup and Data Collection
In synchronous generators, short circuit faults may occur in the stator winding and field winding coils. Generally, the stator and field winding terminals of a synchronous generator have taps at 0% and 100% of the coil windings. The three phase 3 kVA synchronous generator used in this work is customized to inject faults of different magnitudes: taps at 30%, 60%, and 82% of the total number of turns in the stator winding are brought out to the front panel, so that short circuit faults can be injected between any of these points, as shown in Figs. 2 and 3. Design details of the customized synchronous generator can be found in (Gopinath et al., 2013).
In this work, inter-turn short circuit faults are injected in a controlled manner. The generator is connected to a three phase resistive load. A data acquisition system (NI PXI-6221) is used to interface the current sensors. Each experiment is conducted for 10 seconds and the current signals are sampled at 1 kHz, giving 10,000 samples per trial. The process is then repeated for different fault conditions to complete the data collection. Fig. 4 illustrates the current signatures acquired from the 3 kVA generator. Specifications of the synchronous generator are listed in Appendix A. Short circuit faults are injected at 30% (8 turns), 60% (16 turns), and 82% (22 turns) of the total number of turns (27 turns). The data from each trial is divided into multiple frames with a window size of 512 samples. The time domain signal is converted into the frequency domain using the Fast Fourier Transform (FFT). Statistical frequency domain features are used to extract the fault information from the raw data. The details of the frequency domain features (Lei, He, & Zi, 2008) are listed in Table 1.
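As an illustration of this framing and feature extraction pipeline, the following Python sketch (not the authors' MATLAB implementation; the 50 Hz test signal, the Hann window, and the three-feature subset of Table 1 are assumptions for illustration) splits a 10 s, 1 kHz signal into 512-sample frames and computes the mean spectrum, frequency centre, and RMS frequency per frame:

```python
import numpy as np

def frame_signal(signal, frame_len=512):
    """Split a 1-D current signal into non-overlapping 512-sample frames."""
    n_frames = len(signal) // frame_len
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

def frequency_domain_features(frame, fs=1000.0):
    """A few statistical frequency-domain features (subset of Table 1):
    mean spectrum, frequency centre, and RMS frequency."""
    windowed = frame * np.hanning(len(frame))        # window to limit leakage
    spectrum = np.abs(np.fft.rfft(windowed))         # s(k), k = 1..K
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)  # f_k
    mean_spec = spectrum.mean()
    freq_centre = (freqs * spectrum).sum() / spectrum.sum()
    rms_freq = np.sqrt((freqs ** 2 * spectrum).sum() / spectrum.sum())
    return np.array([mean_spec, freq_centre, rms_freq])

# 10 s at 1 kHz -> 10,000 samples -> 19 full frames of 512 samples;
# a 50 Hz sine stands in for one phase current.
signal = np.sin(2 * np.pi * 50 * np.arange(10_000) / 1000.0)
frames = frame_signal(signal)
features = np.vstack([frequency_domain_features(f) for f in frames])
print(frames.shape, features.shape)
```

With 10,000 samples and a 512-sample window, 19 full frames are obtained per trial; the frequency centre of the 50 Hz test tone lands near 50 Hz, as expected.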

Sparse Coding
Sparse coding is an unsupervised learning method that learns a set of over-complete bases to represent the data efficiently (Olshausen & Field, 1997). Sparse coding finds a basis set B = [b_1, b_2, ..., b_M] such that the input vectors X = [x_1, x_2, ..., x_N] ∈ R^(D×N), of dimension D, can be expressed as a linear combination of these bases (Olshausen & Field, 1997):

x_i ≈ B c_i = Σ_{j=1}^{M} c_{ij} b_j,   (1)

where c_i ∈ R^M is the vector of coefficients (codes) of x_i. The over-complete basis identifies the patterns in the input data. However, with an over-complete basis, the codes c_i are not uniquely determined by the input vectors. This necessitates adding a sparsity criterion for a better representation (J. Yang et al., 2009), i.e., most of the coefficients of c_i are zero or nearly zero and only a few are nonzero, so that the input data is represented efficiently. Sparse coding can then be expressed as (J. Yang et al., 2009):

min_{B,C} Σ_{i=1}^{N} ||x_i − B c_i||^2 + λ ||c_i||_{l1},   (2)

where λ ||c_i||_{l1} is the sparse regularization term, computed as the l1 norm of c_i. Sparse regularization ensures that the codebook is over-complete and that the underdetermined system has a unique solution; hence it captures the patterns in the input data.
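A minimal sketch of this dictionary-learning-plus-sparse-encoding step, using scikit-learn rather than the MATLAB toolbox used in the paper (the data, dictionary size of 32, and sparsity weight alpha = 1.0 are assumed values for illustration):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(200, 13)   # 200 illustrative feature vectors of dimension D = 13

# Learn an over-complete dictionary (n_components > D) with an l1 penalty,
# then encode each x_i as a sparse code c_i such that x_i ~ dictionary @ c_i.
dico = DictionaryLearning(n_components=32,
                          alpha=1.0,                       # lambda in Eq. (2)
                          transform_algorithm='lasso_lars',
                          transform_alpha=1.0,
                          max_iter=20, random_state=0)
codes = dico.fit_transform(X)          # sparse codes, shape (200, 32)

print(codes.shape)
print(np.mean(codes != 0))   # fraction of nonzero coefficients -- small
```

Most entries of each code vector come out zero, reflecting the l1 regularization in Eq. (2); the dimensionality of the representation grows from D to the codebook size.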

Locality Constrained Linear Coding (LLC)
Locality constrained linear coding (LLC) is a feature mapping technique used to represent non-linear features as linear features (J. Wang et al., 2010). LLC codes are sparse and high dimensional. In LLC, each input vector is represented as a linear combination of its k nearest basis vectors. The basis vectors are computed from the data set using the k-means clustering algorithm, and are called codebooks in the context of coding schemes (J. Wang et al., 2010).
The LLC coding process can be described as (J. Wang et al., 2010):

min_C Σ_{i=1}^{N} ||x_i − B c_i||^2 + λ ||d_i ⊙ c_i||^2,  s.t. 1^T c_i = 1,   (3)

where ⊙ represents element-wise multiplication. The locality adaptor d_i ∈ R^M gives each basis vector freedom proportional to its similarity to the input feature x_i, and can be expressed as (J. Wang et al., 2010):

d_i = exp( dist(x_i, B) / σ ),   (4)

where dist(x_i, B) = [dist(x_i, b_1), ..., dist(x_i, b_M)]^T and dist(x_i, b_j) is the Euclidean distance between x_i and b_j. σ adjusts the weight decay speed of the locality adaptor. The constraint 1^T c_i = 1 reflects the shift invariance requirement of the LLC code. LLC selects the local bases from the basis set for each feature vector to form a local coordinate system using Eq. (3). The LLC encoding process can be sped up by using the k nearest neighbors of x_i as the local bases B_i instead of all the bases in Eq. (3). This approach, called the fast approximation LLC method, uses the following criterion (J. Wang et al., 2010):

min_C Σ_{i=1}^{N} ||x_i − B_i c_i||^2,  s.t. 1^T c_i = 1,   (5)

where B_i contains the k nearest bases of x_i. The fast approximation LLC method reduces the computational complexity from O(M^2) to O(M + k^2). In this paper, fast approximation LLC is used for its reduced computational complexity and fast encoding process (J. Wang et al., 2010).
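The fast approximation LLC encoder can be sketched as follows (a Python/NumPy illustration, not the authors' MATLAB implementation; the codebook size of 64, k = 5, and the regularization constant beta are assumed values). For each feature vector, the k nearest codewords are found, a small k × k least-squares system is solved, and the resulting weights (summing to 1) are placed into a sparse, high-dimensional code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def llc_encode(X, codebook, k=5, beta=1e-4):
    """Fast approximation LLC (after Wang et al., 2010): solve a small
    constrained least-squares problem over the k nearest codewords of
    each feature vector."""
    n, M = X.shape[0], codebook.shape[0]
    nn = NearestNeighbors(n_neighbors=k).fit(codebook)
    idx = nn.kneighbors(X, return_distance=False)
    codes = np.zeros((n, M))
    for i in range(n):
        z = codebook[idx[i]] - X[i]            # shift the k bases to the origin
        C = z @ z.T                            # local covariance (k x k)
        C += np.eye(k) * beta * np.trace(C)    # regularization for stability
        w = np.linalg.solve(C, np.ones(k))
        codes[i, idx[i]] = w / w.sum()         # enforce 1^T c_i = 1
    return codes

rng = np.random.RandomState(0)
X = rng.randn(100, 13)                         # illustrative feature vectors
codebook = KMeans(n_clusters=64, n_init=5,     # codebook via k-means clustering
                  random_state=0).fit(X).cluster_centers_
codes = llc_encode(X, codebook, k=5)
print(codes.shape)                        # (100, 64): high-dimensional, sparse
print(np.allclose(codes.sum(axis=1), 1))  # shift-invariance constraint holds
```

Only a k × k system is solved per feature vector, which is what brings the cost down from O(M^2) to O(M + k^2); the resulting codes can then be fed to a linear SVM.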

Support Vector Machine (SVM)
SVM (Vapnik & Vapnik, 1998) is a supervised learning algorithm which is widely used for classification problems. Let {(x_i, y_i), i = 1, ..., N} be the data set, where y_i ∈ {1, −1} is the class label of x_i and b is the bias. SVM finds the separating hyperplane w · x + b = 0 that maximizes the margin between the two classes (Figure 6 shows the linear separating hyperplanes for a separable case). The margin of the classifier can be expressed as (Vapnik & Vapnik, 1998):

margin = 2 / ||w||.   (6)

In order to maximize the margin, SVM learning is formulated by rewriting Eq. (6) as a minimization problem (Vapnik & Vapnik, 1998):

min_{w,b} (1/2) ||w||^2,   (7)

subject to the constraint:

y_i (w · x_i + b) ≥ 1, i = 1, ..., N.   (8)

Using the Lagrange multipliers α_i ≥ 0 and the Karush-Kuhn-Tucker (KKT) conditions (Kuhn & Tucker, 1951), the optimal solution can be expressed as:

w = Σ_{i=1}^{N} α_i y_i x_i,   (9)

with Σ_{i=1}^{N} α_i y_i = 0.   (10)

Using the definition of w in Eq. (9), the problem can be written in the dual form as:

max_α Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j k(x_i, x_j),   (11)

where k(x_i, x_j) is a kernel function. The α_i can be obtained by solving Eq. (11), and the decision function can be expressed as (Vapnik & Vapnik, 1998):

f(x) = sign( Σ_{i=1}^{N} α_i y_i k(x_i, x) + b ).   (12)

When the data is not linearly separable, a non-linear kernel is applied for the classification; it transforms the features into a higher dimensional space where they become linearly separable. In this paper, linear and non-linear SVMs are experimented with for the fault diagnosis. The complete process of the proposed approach is shown in Fig. 7.
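The importance of matching the kernel to the features can be illustrated with a small sketch (synthetic two-class data, not the paper's features; the concentric-ring geometry is an assumption chosen to be non-linearly separable):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Two-class problem (no-fault vs fault) on features that are not linearly
# separable: concentric rings stand in for non-linear statistical features.
rng = np.random.RandomState(0)
r = np.r_[rng.uniform(0, 1, 300), rng.uniform(2, 3, 300)]
theta = rng.uniform(0, 2 * np.pi, 600)
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.r_[np.zeros(300), np.ones(300)]
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ('linear', 'rbf', 'poly', 'sigmoid'):
    clf = SVC(kernel=kernel, C=1.0, gamma='scale').fit(Xtr, ytr)
    scores[kernel] = clf.score(Xte, yte)
print(scores)
```

The RBF kernel separates the rings almost perfectly while the linear kernel cannot, which mirrors the paper's observation that kernel choice must suit the feature geometry.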

EXPERIMENTS AND RESULTS
In this work, the following experiments have been carried out for diagnosing the inter-turn faults in the 3 kVA synchronous generator.
1. Develop a baseline system using linear, polynomial, radial basis function (RBF), and sigmoid SVM kernels for fault classification.
2. Improve the performance of the baseline system using the feature mapping techniques, sparse coding and LLC.
3. Compare the performance of the feature mapping techniques using overall classification accuracy and receiver operating characteristic (ROC) curves.
Inter-turn faults are injected in the R, Y, and B phases of the 3 kVA generator stator winding. For every trial, current signals are acquired from all three phases. Frequency domain statistical features are extracted from the raw data. The experiments with the R, Y, and B phase inter-turn faults are then treated as independent two-class classification problems, i.e., no-fault or fault in the R phase, no-fault or fault in the Y phase, and no-fault or fault in the B phase. It may be noted that an N-class problem may be realized as N two-class problems.
Experiments are conducted at different load conditions: 0.5 A, 1 A, 1.5 A, 2 A, 2.5 A, 3 A, and 3.5 A. Data from these loads are combined together for fault classification. However, training and test data are collected separately, and no data is shared between the training and test sets. In this work, the experiments have been performed using balanced and unbalanced data sets to check the effectiveness of the proposed approach. Further, the k-fold cross validation technique is also experimented with separately, using the balanced data set.
In addition, experiments using an unseen load condition are performed to check the effectiveness of the proposed approach in removing the load dependencies of the features. Experiments are carried out on an IBM x3100 M4 server (Intel Xeon E3-1220 v2 series), with 8 GB memory and a 3.1 GHz quad-core processor. MATLAB toolboxes are used for computing the sparse and LLC codes.

Experiments using balanced data set
In this experiment, the data sets used for training and testing are nearly balanced (no-fault data: 55%; fault data: 45%). Details of the data sets used in our experiments are presented in Table 2.

Baseline system: The baseline system is developed using the linear, polynomial, RBF, and sigmoid SVM kernels. Table 3 lists the baseline system performance of the SVM kernels for the R, Y, and B phase faults. From the experiments, it is noted that the RBF kernel performs better than the other kernels in the baseline system. The performance of the classifier is not encouraging, since the features do not match the kernels and exhibit non-linear characteristics. The classification accuracies are generally low in the baseline system because the fault characteristics vary largely across load conditions; the accuracy can be improved by removing this load dependency of the features. In this paper, the objective is to improve the performance of fault diagnosis of the generator by linearizing the features in a higher dimensional space using feature mapping techniques. Table 7 lists the improved classification performance and its computation time for LLC. From our experiments, it is noted that LLC takes less computation time than sparse coding; therefore LLC reduces the computational complexity and achieves improved performance for the classifier.

Experiments using unbalanced data set
In practical applications, injecting faults and collecting a large amount of fault data to capture the intelligence about the system is not possible. This necessitates analyzing the fault identification system using a smaller proportion of fault data, to check the effectiveness of the feature mapping techniques. In this work, the experiments were carried out using unbalanced data (no-fault data: 80%; fault data: 20%) by taking a smaller proportion of fault data for training. However, equal proportions of fault and no-fault data are taken for testing, for a fair comparison of the results with the balanced data set experiments. Details of the data set used in our experiments are listed in Table 8. The performances of the baseline, sparse coding and LLC systems for the unbalanced data set are listed in Tables 9-11. From the results it is observed that the performance of the baseline system, sparse coding and LLC is reduced for the R, Y and B phases with the use of the unbalanced data set. For sparse coding, 256 codebooks perform better than the other codebook sizes. However, comparing the performance of sparse coding for the balanced and unbalanced data sets with 256 codebooks, the performance for the unbalanced data set decreased by 4.27%, 2.10%, and 2.25% absolute for the R, Y, and B phases respectively. Similarly, comparing the performance of LLC with 1024 codebooks for the balanced and unbalanced data sets, the performance for the unbalanced data set decreased by 3.84%, 1.77%, and 0.76% absolute for the R, Y, and B phases respectively. Though the unbalanced data set affects the performance of the classifier, the overall classification accuracy does not reduce significantly for sparse coding and LLC, emphasizing that these algorithms are suitable when data under fault conditions is scarce.

ROC curve analysis
The receiver operating characteristic (ROC) curve is used to visualize and evaluate classifier performance (Japkowicz & Shah, 2011). It shows the trade-off between the probability of detection, or true positive rate (TPR), and the probability of false alarm, or false positive rate (FPR). In this work, the performance of the baseline RBF system, sparse coding and LLC are compared using ROC curves for the balanced and unbalanced data sets (a smaller proportion of fault data is considered for training in the unbalanced case). Figures 8-10 show the performance comparison of the feature mapping techniques for the R, Y, and B phase faults. It is observed that the area under the curve (AUC) value is closer to 1 for LLC than for the baseline RBF and sparse coding techniques. However, for the experiments using the unbalanced data set, the AUC value is reduced for the baseline RBF, sparse coding and LLC.
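A minimal sketch of how an ROC curve and AUC are computed from SVM decision scores (synthetic two-class data assumed for illustration, not the paper's features):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for no-fault / fault feature vectors.
rng = np.random.RandomState(0)
X = np.r_[rng.randn(300, 5), rng.randn(300, 5) + 1.0]
y = np.r_[np.zeros(300), np.ones(300)]
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

clf = SVC(kernel='linear').fit(Xtr, ytr)
scores = clf.decision_function(Xte)   # signed distance to the hyperplane
fpr, tpr, _ = roc_curve(yte, scores)  # FPR vs TPR trade-off at all thresholds
auc = roc_auc_score(yte, scores)      # area under the ROC curve
print(round(auc, 3))                  # the closer to 1, the better
```

Sweeping the decision threshold over the scores traces the (FPR, TPR) pairs of the curve; the AUC summarizes the whole trade-off in a single number.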

k-fold cross validation using balanced data set
The experiments discussed in subsections 3.1 and 3.2 were performed using only one partition of the data. However, the use of a fixed data set may over-fit the model. To overcome this problem, the k-fold cross validation technique is used to assess the performance of the classifier, using the balanced data set. In this experiment, 19,152 no-fault samples and 15,323 fault samples are used for 10-fold cross validation. The performance of the baseline system, sparse coding and LLC under 10-fold cross validation is listed in Tables 12-14.
From the experiments, it is noted that the RBF kernel performs better than the other kernels. Sparse coding improves the performance by 7.61%, 1.07% and 5.37% for the R, Y, and B phases respectively for 256 codebooks. Similarly, LLC improves the performance over the baseline system. Overall, 10-fold cross validation on the balanced data set does not significantly affect the performance of the feature mapping techniques compared to the experiments using a single partition of the data.
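The 10-fold procedure can be sketched as follows (synthetic data assumed for illustration; each fold is held out once for testing while the classifier trains on the remaining nine):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic two-class data standing in for the balanced no-fault / fault set.
rng = np.random.RandomState(0)
X = np.r_[rng.randn(500, 5), rng.randn(500, 5) + 1.5]
y = np.r_[np.zeros(500), np.ones(500)]

# cv=10: ten stratified folds, one held-out fold per round.
accs = cross_val_score(SVC(kernel='rbf', gamma='scale'), X, y, cv=10)
print(accs.mean(), accs.std())   # mean accuracy and fold-to-fold spread
```

Reporting the mean and spread across the ten folds reduces the dependence of the result on any single train/test partition, which is the motivation given above.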

Experiments on unseen load condition
Experiments are also carried out for an unseen load condition of the 3 kVA synchronous generator to check the effectiveness of LLC in removing the load dependencies of the features. The model is trained using the 0.5 A, 1 A, 2 A, 2.5 A, 3 A, and 3.5 A load conditions and tested on the 1.5 A load. In total, 18,240 and 1,338 samples are used for training and testing, respectively, with an equal proportion of no-fault and fault data. From the experiments, it is observed that the best baseline performance is obtained using the linear kernel, with overall classification accuracies of 59.04%, 72.79% and 80.94% for the R, Y, and B phases respectively. Improved performance is obtained using LLC, with overall accuracies of 73.46%, 78.70%, and 87.66% by selecting 8, 23, and 36 nearest neighbors for the R, Y, and B phases respectively, with a codebook size of 64. These results show that LLC performs better even when an unseen load condition is used for fault diagnosis.

CONCLUSIONS
In this paper, the feature mapping algorithms sparse coding and LLC are used to improve the performance of SVM for inter-turn fault identification of a 3 kVA synchronous generator. As the features are non-linear, feature mapping techniques are used to linearize the features in a higher dimensional space to improve the performance of the fault diagnosis system. Experiments are performed for the balanced and unbalanced data sets using a single partition of the data. Sparse coding improves the performance significantly, but at a high computational cost. Therefore, LLC is used to reduce the computational complexity and enhance the performance of the system. For the balanced data set, 1024 codebooks with 40 nearest neighbors are selected empirically for the best performance. LLC improves the overall fault identification accuracy of the baseline RBF system by 22.56%, 18.43% and 17.05% absolute for the R, Y and B phase faults respectively. The performance of the feature mapping techniques is also illustrated through ROC curves: the AUC value is closer to one for LLC than for sparse coding and the baseline RBF system. The performance of the classifier is also assessed using the 10-fold cross validation technique.
From the experiments, it is observed that LLC outperforms sparse coding in terms of classification performance and computational cost for the inter-turn fault diagnosis of the synchronous generator. Though LLC has been used widely in image classification problems, the reported experimental results show that LLC can also be used in other applications where the features do not match the SVM kernels and exhibit non-linear characteristics.

Figure 1. Block diagram of the experimental setup

Figure 4. Current signatures captured during no-fault and inter-turn fault conditions for the 3 kVA generator

Figure 7. Machine fault diagnosis using locality constrained linear coding (LLC)

Figure 10. ROC plot for the performance comparison of feature mapping techniques for the B phase fault

Table 1. Frequency domain features (s(k) is the spectrum for k = 1, 2, ..., K, where K is the number of spectrum lines and f_k is the frequency value of the k-th spectrum line)

Table 3. Baseline system performance of the SVM kernels for the R, Y, and B phase faults
Table 4 lists the classification performance of sparse coding for the inter-turn fault diagnosis of the generator. Experiments are performed for codebook sizes of 256, 512, and 1024; the best performance is obtained with 256 codebooks. Sparse coding improves the performance of the baseline RBF kernel from 76.66% to 87.35%, 81.35% to 87.37% and 82.31% to 91.14% for the R, Y and B phase faults respectively. Table 5 lists the improved classification performance and its computation time for sparse coding. Since the computation time is very large, it is not suitable for practical applications.
c = 1, γ = 0.0256, where c is the cost and γ is the kernel parameter. In this process, the codebook is common to the training and test data sets. The feature dimension is then expanded from 39 (13 features per phase) to 256, 512, or 1024. Linear SVM is then used to classify the sparse-represented features.

Table 4. Classification performance of sparse coding using linear SVM for the balanced data set

Table 5. Improved classification performance and its computation time for sparse coding
Table 6 lists the performance of the LLC based linear SVM for different codebook sizes and numbers of nearest neighbors. Codebook sizes of 256, 512, and 1024 are used, and each codebook is experimented with 10, 20, 30, and 40 nearest neighbors for the R, Y, and B phase inter-turn faults. Though codebook sizes 256 and 512 improve the performance, 1024 codebooks achieve the best classification performance. For 1024 codebooks with 40 nearest neighbors, LLC improves on the baseline system (RBF kernel) by 22.56%, 18.43% and 17.05% absolute for the R, Y and B phase faults respectively.

Table 6. Classification performance of LLC using linear SVM for the balanced data set

Table 7. Improved classification performance and its computation time for LLC (1024 codebooks). Columns: inter-turn fault, kNN, accuracy (%), CPU time (sec)

Table 10. Classification performance of sparse coding using linear SVM for the unbalanced data set

Table 13. Classification performance of sparse coding using linear SVM for the 10-fold cross validation