A Novel Ensemble Clustering for Operational Transients Classification with Application to a Nuclear Power Plant Turbine

The objective of the present work is to develop a novel approach for combining in an ensemble multiple base clusterings of operational transients of industrial equipment, when the number of clusters in the final consensus clustering is unknown. A measure of pairwise similarity is used to quantify the co-association matrix that describes the similarity among the different base clusterings. Then, a Spectral Clustering technique of literature, embedding the unsupervised K-Means algorithm, is applied to the coassociation matrix for finding the optimum number of clusters of the final consensus clustering, based on Silhouette validity index calculation. The proposed approach is developed with reference to an artificial case study, properly designed to mimic the signal trend behavior of a Nuclear Power Plant (NPP) turbine during shut-down. The results of the artificial case have been compared with those achieved by a state-of-art approach, known as Clusterbased Similarity Partitioning and Serial Graph Partitioning and Fill-reducing Matrix Ordering Algorithms (CSPAMETIS). The comparison shows that the proposed approach is able to identify a final consensus clustering that classifies the transients with better accuracy and robustness compared to the CSPA-METIS approach. The approach is, then, validated on an industrial case concerning 149 shut-down transients of a NPP turbine.


Introduction
In industries such as nuclear, oil and gas, automotive and chemical, equipments are subjected to several causes of performance degradation and exposed to faulty conditions, e.g., presence of manufacturing defects, unexpected interactions with the environment, wear and tear (Bolotin & Shipkov, 1998;Muller, Suhner, & Iung, 2008;Baraldi, Di Maio, & Zio, 2012;Baraldi, Di Maio, & Zio, 2013c). Capturing the different operational conditions of these equipments, detecting the onset of abnormal conditions and classifying them in different types can aid the decision maker to decide a proper maintenance intervention policy and, hence, increase equipment reliability and system safety while reducing overall corrective maintenance costs (Jardine, Lin, & Banjevic, 2006; Al-Dahidi, Baraldi, Di Maio, & Zio, 2014).
Measurements of relevant signals are collected during operation. These transient data are representative of different operational conditions of the equipment. For fault diagnosis, these data are manipulated with the objective of partitioning them into dissimilar groups, whose number is "a priori" unknown, such that data belonging to the same group are more similar than those belonging to the other groups, and corresponding to different equipment conditions. In particular, one can distinguish, among the groups, anomalous behaviors of the equipment and relate them to specific root causes (Fred & Jain, 2005;Xiufeng & Changzheng, 2010;Wu & Lee, 2011;Serir, Ramasso, & Zerhouni, 2012;Baraldi, Di Maio, Zio, Rigamonti, & Seraoui, 2013a;Serir, Ramasso, Nectoux, & Zerhouni, 2013).
The problem of grouping the operational transients of an industrial equipment can be formulated as an unsupervised clustering problem aimed at partitioning the transient data into homogeneous clusters so that those data belonging to the same cluster are very similar to each other and dissimilar to those of the other clusters (Salvador, 2002;Bocaniala, Sa Da Costa, & Palade, 2004;Zhou, Zhang, & Wang, 2004;Chaovalit & Zhou, 2005;Wang, Yu, Siegel, & Lee, 2008;Wang, 2010;Baraldi et al. 2013a;Lin, Chen, & Zhou, 2013).
A typical ensemble clustering scheme is shown in Figure 1. For a given dataset , X the construction of the ensemble amounts to the aggregation of the results of multiple base clusterings. The base clusterings composing the ensemble can be different because of the different algorithms used and/or because of the different data and features upon which clustering is performed. The outcome of the multiple base clusterings are aggregated into a final consensus clustering P*, by a given method of aggregation (Strehl & Ghosh, 2002;Topchy et al. 2004;Chen, 2007;Greene & Cunningham, 2007;Vega-Pons & Ruiz-Shulcloper, 2011;Ahuja & Dhanya, 2012). The main challenges for an effective consensus strategy of aggregation are (Topchy et al. 2004): 1) different base clusterings group data differently and, maybe, in different numbers of clusters, 2) the correspondence between the clusters labels of different base clusterings is unknown, 3) the number of clusters M in the final consensus clustering is "a priori" unknown, 4) some base clusterings might not label some data (missing labels), and 5) for large datasets, large computational times might be needed.
Co-association based methods summarize similarities among base clusterings into a co-association matrix (Strehl & Ghosh, 2002), even for different numbers of clusters for the base clusterings, without any previous knowledge on M, but with high computational demands (Fred & Jain, 2005;Vega-Pons & Ruiz-Shulcloper, 2011).
In genetic algorithm-based methods, the search capability of genetic algorithms is used to identify the most stable clusters once the label correspondence problem is solved (Ghaemi et al. 2009). The plus of the method is its ability to identify clusters that are not easily found by other methods, even for different numbers of clusters for each base clustering; on the other hand, its computational burden, and its inability to deal with the missing labels constitute practical limitations (Topchy et al. 2004;Vega-Pons & Ruiz-Shulcloper, 2011).
In Finite Mixture Models, the final consensus clustering is seen as a probability model in the space of the base clusters and is found as a solution to the maximum likelihood problem for a given ensemble clustering (Topchy et al. 2004;Di Maio, Nicola, Zio, & Yu, 2014). The method does not solve the label correspondence problem, it is able to handle missing labels, it deals with different numbers of clusters for each base clustering and does not need any previous knowledge on M (Figueiredo & Jain, 2002), but its computational burden due to the estimation of the covariance matrices, makes the method difficult to apply in practice.
Graph and Hypergraph partitioning algorithms, such as the Cluster-based Similarity Partitioning (CSPA), construct a graph from the similarities among the base clusterings, and cluster it using a graphic-based clustering algorithm, such as Serial Graph Partitioning and Fill-reducing Matrix Ordering Algorithm (METIS) (Karypis & Kumar, 1995;Karypis & Kumar, 1998;Strehl & Ghosh, 2002;Topchy et al. 2004), for a predetermined value of M (Topchy et al. 2004;Ghaemi et al. 2009).
The method does not solve the correspondence between the base clusterings labels, can handle the missing labels and different numbers of clusters for each base clustering, but it suffers computation limitations for large datasets. Despite this, CSPA and METIS algorithms have been taken as reference for comparison in this paper because CSPA-METIS is the simplest and "often" best performing method for consensus aggregation among other Graph and Hypergraph partitioning algorithms, e.g., Meta-CLustering Algorithm (MCLA) and HyperGraph-Partitioning Algorithm (HGPA) (Strehl & Ghosh, 2002;Chen, 2007), whose pitfall is that the number of final consensus clusters cannot exceed the maximum number of the individual base clusters.
The novelty of the proposed approach is to replace METIS algorithm with Spectral Clustering (Von Luxburg, 2007;Baraldi et al. 2013b) and Silhouette validity index (Rousseeuw, 1987), to automatically determine M which by most industrial applications, is not known "a priori" (Chakaravathy & Ghosh, 1996;Strehl & Ghosh, 2002;Li & Chen, 2011). More specifically, the Spectral Clustering technique, embedding the unsupervised K-Means algorithm, is applied to the co-association matrix that describes the similarity among the different base clusterings obtained on a set of diverse sources of data (features) (e.g., vibration, temperature signals), rather than to the similarity values among the data themselves, for mining the clusters that are formed by the most similar data. Then, the optimum number of clusters C* is selected among several candidates C candidate , based on the morphology of the obtained final consensus clusters evaluated by the Silhouette validity index that measures the similarity of the data belonging to the same cluster and the dissimilarity of these in the other clusters (a large Silhouette value indicates that the obtained clusters of the final consensus clustering are well separated and compacted (Rousseeuw, 1987;Charrad, Lechevallier, Ahmed, & Saporta, 2010).
The proposed approach is developed on an artificial case study properly designed to mimic the signal trend behavior of Nuclear Power Plants (NPPs) turbines during shut-down transients.
Different sets of features have been simulated and used to obtain different base clusterings, representative of different groupings of the shut-down transients of the turbine. The correct number of clusters, for each base clustering, has been identified by the Davies-Bouldin (DB) criterion: the minimum DB value is reached for the number of clusters which gives optimal separation and compactness (Davies & Bouldin, 1979). Three controlled datasets containing M sparse or overlapping clusters of their base clusterings results have been considered. The results obtained have been compared with those achieved by CSPA-METIS. It has been found that the proposed approach is able to identify the final consensus clustering with better accuracy and robustness compared to the CSPA-METIS approach.
The approach is, then, applied to a real industrial case concerning 149 shut-down transients of a NPP turbine: different base clusterings representative of different groupings of the shut-down transients of the turbine are obtained by using multiple different sources of data (features), i.e., vibration, turbine shaft speed, vacuum, and temperature signals, and a final consensus clustering is obtained that gives the optimal grouping of the shut-down transients of the NPP turbine, in terms of groups separation and compactness.
The remainder of the paper is organized as follows. In Section 2, the basics of CSPA-METIS ensemble approach are recalled. In Section 3, the proposed ensemble clustering approach is presented. The artificial case study representative of the signal trend behavior of a Nuclear Power Plant (NPP) turbine during shut-down transients is introduced in Section 4. Furthermore, the results obtained with the application of the proposed approach to the artificial case and the comparison with CSPA-METIS, are discussed. Section 5 verifies the robustness of the proposed approach to clustering overlapping, in identifying the number M for three controlled datasets containing sparse or overlapping clusters of their base clusterings results. The real industrial case concerning 149 shut-down transients of a NPP turbine is introduced in Section 6 and the results of the application of the proposed approach to the case study are discussed. Finally, Section 7 concludes the paper with some considerations.

The CSPA-METIS ensemble clustering approach
In this Section, the combination of CSPA and METIS is described and considered as reference ensemble clustering approach, for the case when the number M of clusters in the final consensus clustering is known.
The flowchart for the method is sketched in Figure 2. The algorithm goes along the following two phases: a procedure (i.e., CSPA) for establishing a co-association matrix and a procedure (i.e.,

METIS)
for partitioning the graph obtained from the co-association matrix to obtain the final consensus clustering P* (Strehl & Ghosh, 2002;Topchy et al. 2004).
We consider N data belonging to the dataset X that are clustered into H base clusterings. For each jth base clustering, j=1,…,H, each datum is labeled by an integer number ranging in [1, ] j opt C , where j opt C is the number of clusters for each j-th base clustering. The problem of clustering the N data is, thus, transformed into an aggregation problem of the base clusterings outcomes Y of size NxH.
The algorithm entails three main steps; without loss of generality, these are hereafter described on a simple numerical example where X contains N=5 data, clustered into H=3 base clusterings: Step 1: Adjacency matrix computation. In practice, for each j-th base clustering (reported in Table 2 for the simple explanatory example), if two data belong to the same cluster they are considered similar, i.e., similarity μ=1, and if not they are dissimilar, i.e., similarity μ=0. Thus, for each j-th base clustering, an adjacency binary similarity matrix,

Adjacency matrix computation
Similarity matrix computation,  Step 2: Similarity matrix computation. The entry-wise average of the obtained H binary similarity matrices leads to obtaining the overall similarity matrix Figure 3, right), of size NxN (Strehl & Ghosh, 2002). In this way, each entry of the similarity matrix has a value in [0,1], which is proportional to how likely a pair of data is, when grouped together. Step 3: Final consensus clustering computation. To produce a final consensus clustering P*, the graphic-based clustering algorithm METIS is adopted to partition the obtained similarity graph (shown in Figure 3, right) (Strehl & Ghosh, 2002). METIS is a multilevel graph partitioning algorithm that entails three main steps (refer to Karypis & Kumar, 1998, for more details):

METIS
1. the original graph is collapsed (coarsed) in smaller graphs (where the vertices are the data and the edges are the similarities), by resorting to Random Matching (RM) (Bui & Jones, 1993), 2. Spectral Bisection is used for partitioning the coarsened graphs (Barnard & Simon, 1994), 3. The partitions effectiveness is quantified by successively projecting the partitions into the original graph. It has been shown that METIS produces a high quality partitioning in a relatively small amount of time. However, the number of partitions to be found and, hence, the number of clusters in the final consensus clustering, has to be known "a priori". One option can be to assign the number of clusters in the final consensus clustering to be equal to the maximum number of clusters in the H base clusterings, M=max ( j opt C ), j=1,…,H.
In the following Section, an ensemble approach is proposed to overcome the requirement of an "a priori" knowledge of the number of clusters M in the final consensus clustering.

The proposed ensemble clustering approach
In this Section, an ensemble approach is proposed, that evolves from that of Section 2 to avoid the hypothesis on the number of clusters M in the final consensus clustering. The proposed approach is based on the combination of: 1) CSPA method to compute the similarity matrix , S 2) Spectral Clustering to transform S into a normalized laplacian matrix , rs L and then, compute its spectrum information (eigenvectors) (see Appendix A.1), 3) a clustering algorithm, e.g., the K-means algorithm, that is fed with the eigenvectors calculated in the previous step 2), to find the final consensus clustering, and 4) the Silhouette index to quantify the goodness of the obtained clusters (see Appendix A.2).
The flowchart for the method is sketched in Figure 4. The method goes along the following steps: Step 1: Adjacency matrix computation. This Step corresponds to Step 1 of Section 2.
Step 2: Similarity matrix computation. This Step corresponds to Step 2 of Section 2.
Step 3: Spectral Clustering. Once the overall similarity matrix S is computed, Spectral Clustering Step 4: Clustering algorithm. For each candidate number of clusters C candidate , the reduced matrix of U with a size NxC candidate is partitioned into C candidate clusters by using a single clustering algorithm and the final consensus clustering * candidate C P is obtained. In this work, we resort to the K-means algorithm, one of the most used clustering methods, to partition U into K=C candidate clusters (Su & Chou, 2001;Fern & Lin, 2008). ..

Original dataset matrix (N data)
Labels matrix Step 5: Final consensus clustering selection. For each C candidate , the obtained consensus clustering * candidate C P is evaluated by computing its Silhouette validity index candidate C SV (Rousseeuw, 1987). The most appropriate consensus clustering * * C P is the one for which the Silhouette reaches a maximum, for which clusters are well separated and compacted (see also Appendix A.2).

Artificial case study
An artificial case study has been designed to generate N=149 data representative of the signals trends behaviors of M=7 different settings of shut-down operations. This is done to mimic the real industrial case of Section 6, concerning N=149 real shut-down transients of a NPP turbine. Each datum is described by F=7 features (as for the real case study of Section 6), representative of the turbine condition, e.g., mean value of the vibration signals, and of the environmental and operational conditions that can influence the turbine behavior, e.g., mean values of the vacuum and temperature signals. These data are stored in a matrix X of a size 149x7.
The objective is to reveal the "hidden" (but simulated and, thus, known) structure P* of the dataset X by identifying groups of data with similar functional behaviors, representative of different operational conditions of the turbine. Without loss of generality, it is assumed that the operational conditions of the NPP turbine are M=7: 1) three classes of normal condition (NC1, NC2, NC3), 2) three classes of abnormal condition (AC1, AC2, AC3), and 3) one class of outliers (i.e., unknown behaviours). The dataset X is pictorially shown in Figure 5: data with similar characteristics, e.g., vibration signals, and environmental and operational conditions which can influence the turbine behavior, e.g., vacuum and temperature signals, have been grouped together and will be treated within the same base clustering.
As shown in Figure 5, H=3 sets of features j X have been simulated and considered: the set of features 1, 2, and 3 (j=1), that of features 4 and 5 (j=2), and that of features 6 and 7 (j=3). This is found by a filter approach for which the optimal subsets of features are selected on the basis of statistical properties.  The values of the features for different classes of data have been created by randomly sampling their realization from different multivariate normal and log-normal distribution functions (1 to 10 in Figure 5), whose combination characterizes the class. The objective is to aggregate these base clusterings into a final consensus clustering P*, capable of identifying the "true" grouping of the shut-down transients of the NPP turbine.
To mine the clusters shown in Figure 6, the j-th base clustering outcomes are obtained by the unsupervised Fuzzy C-Means (FCM) algorithm (Baraldi et al. 2013c). indices can be used (Onanena, Oukhellou, come, Jemei, Candusso, Hissel, & Aknin, 2013). In this work, Davies-Bouldin (DB) validity criterion has been considered for mining the clusters of the base clusterings (Davies & Bouldin, 1979) (whereas, the Silhouette validity index is used for identifying the optimum number of clusters in the final consensus clustering). The Davies-Bouldin (DB) criterion is based on the ratio of within-cluster and between-cluster distances: the optimal clustering, which gives optimal separation and compactness of the obtained clusters, has the smallest DB index value (Davies & Bouldin, 1979;Legány, Juhász, & Babos, 2006;Onanena et al. 2013). , we use the information on the "simulated" classes to which the data belong, to calculate the misclassification rate (Table 3) (it is worth noticing that in real industrial applications the real class is unknown).  The obtained base clustering labels for each set of features have been, then, stored in a matrix Y of size 149x3. The application of the clustering ensemble approach aims at finding the final consensus clustering of the data. In Section 4.1 and Section 4.2 the CSPA-METIS approach and the proposed approach are applied, respectively.

Application of CSPA-METIS approach
The application of the CSPA-METIS approach is here described according to the steps illustrated in Section 2: the overall adjacency matrix A and the overall similarity matrix S have been computed (Steps 1 and 2), respectively. A graph is obtained from S and METIS is used to produce a final consensus clustering (Strehl & Ghosh, 2002;Topchy et al. 2004).
To this aim, the number of clusters M=7 in the final consensus clustering is assumed to be known "a priori". Figure 8 shows the obtained results of the aggregation P* (left) compared to the true clustering (right). the obtained clustering results with the true "simulated" clustering, one can calculate the misclassification rate to be equal to 41.6% (62 out of 149 data incorrectly classified), which is not a satisfactory result.  Table 3), whereas the upper bound (16) is the number of the largest combination of the three base clusters (i.e., 2x2x4). The optimum number of clusters C* in the final consensus clustering is found for the value at which the Silhouette measure is maximized, i.e., C* =3 (star in Figure 9) (for which the obtained clusters are well separated and compacted). Despite that, again the clusters are not representative of the true "simulated" clustering, i.e., M=7.
The obtained results of the aggregation P*, compared with the true clustering are shown in Figure   10 (left and right, respectively). Comparing the obtained clustering results with the true "simulated" clustering, one can calculate the misclassification rate to be equal to 36.9% (55 out of 149 data incorrectly classified). In the following Section, the application of the developed approach is shown to improve the final consensus clustering.

Application of the proposed ensemble clustering approach
The application of the proposed ensemble clustering is here described according to the steps presented in Section 3: the method entails a similar procedure of CSPA-METIS for calculating S and a procedure to identify the final consensus clustering P*.  (2) is the minimum number of base clusters (see Table 3), whereas the upper bound (16) is the number of the largest combination of the three base clusters (i.e., 2x2x4): the optimum number of clusters C* in the final consensus clustering is the value at which the Silhouette is maximized, i.e., C* =6 (star in Figure 11). The results of the application of the proposed method to the artificial case study are represented in Figure 12. Comparing the obtained clustering results (left) with the true "simulated" clustering (right), one can recognize that the misclassification rate has been reduced to 4.03% (6 out of 149 data incorrectly classified). It is worth noticing that only six out of seven operational conditions have been recognized (C* =6, while M=7). The outliers (three transients -class 7) have not been grouped together: this depends on the capability of the base clustering algorithm in recognizing the outliers (Topchy et al. 2004;Topchy et al. 2005;Serir et al. 2012).
For example, the optimum number of clusters for the j=1 set of features is 1 2 opt C  (see Figure 7), whereas it should be equal to 3 (see Figure 5). This sensitivity to the quality of the data at hand calls for an investigation on the robustness of the proposed method to different dataset characteristics, as it will be discussed in the following Section.

Robustness of the ensemble clustering approach to clustering overlapping
To verify the robustness of the proposed approach, a controlled sensitivity test has been designed.
By robustness, here we intend the property of the approach to provide final consensus clustering with low misclassification rate even in case of a large overlap or separation of the real clusters.
With this aim, the clusters of Figure 6 have been modified by changing the parameters of the multivariate distributions from which the data are sampled, as follows: 1. Case I (Large separation): in this case, the clusters of the j-th set of features, j=1,…,3 are designed to be well separated and compacted.
2. Case II: this is typically the case of Section 4. In this case, the clusters of the j-th set of features, j=1,…,3, are slightly overlapped compared to Case I.
3. Case III (Large overlap): in this case, the obtained clusters from the j-th set of features, j=1,…,3, are overlapped and less compact. Figure 13 shows the three cases for the three sets of features. As long as we are moving from Case I to Case III, the clusters identified start overlapping and become less compact.  The performances of the two approaches can be more precisely compared by calculating the misclassification rates in the three test cases by using the information on the real classes to which the data belong. The misclassification rates for the three cases using the two approaches are reported in Table 4. Furthermore, as the clusters of the sets of features are overlapped and spread (Case III), the performance of the proposed approach decreases compared to Case I, as expected. In conclusion, we can state that the proposed approach is superior to CSPA-METIS, for this particular dataset.

The real case study
The proposed approach has been applied to a real industrial case concerning N=149 real shut-down multidimensional transients of a NPP turbine. The generic i-th transient is a multidimensional transient in a Z=70 dimensional signal space with a time horizon of N p =4500 time steps (2.5 hours).
The objective is to partition the N=149 multidimensional transients into M ("a priori" unknown) dissimilar groups, such that transients belonging to the same group are more similar than those belonging to the other groups. Engineering and experts judgment suggest a set of H=2 base clusterings: 1. Clustering of data representative of the turbine condition (j=1): seven signals of the turbine shaft vibrations have been considered (taken from sensors located at different stages of the turbine, whose detailed characteristics cannot be provided, due to confidentiality reasons), since vibration data contains signatures which, if properly interpreted, can reveal the operational condition of the turbine (Betta, Liguori, Paolillo, & Pietrosanto, 2002;Baraldi et al. 2013a 2. Clustering of data representative of the environmental and operational conditions that can influence the turbine behavior (j=2): the values of turbine shaft speed, vacuum and structural temperature signals have been considered (Baraldi et al. 2013b) (taken from different locations of the turbine, whose details cannot be disseminated, due to confidentiality reasons). The optimum numbers of clusters is found to be 2 6 opt C  .
The base clusterings results have been aggregated in a matrix Y with a size of 149x2 and the proposed approach has been applied following the steps illustrated in Section 3. The optimum number of clusters C* in the final consensus clustering is selected according to the Silhouette values for different numbers of clusters C candidate that span in the interval [5,30], where the lower bound (5) is the minimum between 1 opt C and 2 opt C , and the upper bound (30) is the number of the largest combination of the two base clusters (i.e., 5x6).
It is important to point out that neither a too large nor a too small number of clusters can be considered as a valuable result from the practical point of view of linking turbine conditions with environmental and operational conditions: a large number of clusters makes the explanation of the turbine conditions too vague, whereas a small number is at risk of poor specification of the obtained clusters. In this analysis, the optimum number of clusters C* in the final consensus clustering is found to be C* =14, at which the Silhouette measure is maximized (star in Figure 15): this is a good compromise between small and large numbers of clusters. Figure 15 shows, indeed, that the Silhouette values for small and large numbers of C* are much worse than for C* =14, due to the dissimilarity of the data (inappropriately) assigned to the same clusters. Results of the application to the real case study are shown in Figure 16, where the N=149 transients are plotted in chronological order on the horizontal axis along with the j=1 base clustering results (the vertical axis) and the j=2 base clustering results represented by six different markers (square, diamond, star, triangle, circle, and dot).

C C C C
Since the transients are numbered in increasing order with respect to their "calendar" occurrence, it has been possible to infer from the experts that the functional behavior of the turbine is different in the four clusters because of major maintenance interventions that have been undertaken at the specific calendar times and have resulted in radical changes of the turbine behaviour.

Conclusions
In this work, an approach to build a consensus clustering of individual base clusterings is proposed, based on Spectral Clustering and Silhouette validity index. First, the base clustering results are summarized in a co-association matrix by pairwise similarity computation. Then, a Spectral Clustering technique, embedding the unsupervised K-Means algorithm, is applied to the matrix of similarity values so that the clusters are formed by the most similar data. The optimum number of clusters is selected among several candidates based on the morphology of the obtained clusters, measured by the Silhouette validity index that gives reason of the similarity of data belonging to the same cluster and the dissimilarity with those in the other clusters.
The proposed approach has been successfully applied to an artificial case study "properly" designed to reproduce the signal trend behavior of a

Appendices Appendix A.1 Unsupervised Spectral clustering
Spectral Clustering technique uses the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions (Baraldi et al. 2012; Baraldi et al. 2013c). In this work, the similarity matrix S of size NxN is computed by Clusterbased Similarity Partition Algorithm (CSPA). The Spectral Clustering technique entails four steps (Baraldi et al. 2013a): Step 1: Normalized Laplacian Matrix. Starting from the similarity matrix S , the degree matrix D is calculated, whose entries d 1 , d 2 ,…, d N are: 1 , 1, 2,..., Step 2: Eigenvalues and eigenvectors of first C eigenvalues are such that they are very small whereas λ C+1 is relatively large (Ng, Jordan, & Weiss, 2001;Von Luxburg, 2007;Zhao & Liu, 2007).
Step 3: Number of clusters. The number of clusters is set equal to C, according to the Eigengap heuristic theory (Mohar, 1997).
Step 4: Feature extraction. The relevant information on the structure of the matrix S is obtained by considering the eigenvectors 1 2 , ,..., N u u u associated to the C smallest eigenvalues of its laplacian matrix rs L . The square matrix S is transformed into a matrix U of size [N, C], in which the C columns of U are the eigenvectors (Von Luxburg, 2007).

Appendix A.2 Silhouette validity index
To evaluate the optimal number of clusters * C among several clusters candidates, Silhouette validity index has been adopted. The silhouette value for the i-th datum, i=1,…,N, is a measure of how similar/dissimilar that datum is to others in its own cluster and to the other clusters, respectively.
The silhouette value for the i-th datum S i is defined as (Rousseeuw, 1987): where a i is the average distance from the i-th datum to the others in the same cluster, and b i is the minimum average distance from the i-th datum to the others in a different cluster, minimized over clusters.
The mean of the silhouette values for the m-th cluster C m is called the cluster mean silhouette and is