This paper presents a mixed method that combines unsupervised learning with human expert input to analyze telemetry data from long-duration robotic space missions. Our goal is to develop more automated methods for detecting anomalies in time series data. Once anomalies are identified by unsupervised learning, we apply feature selection followed by expert input to derive the knowledge required to build on-line detectors. These detectors can be used in later phases of the current mission and in future missions to improve operations and overall mission safety. Although the primary focus of this paper is on developing data-driven anomaly detection methods, we also present a computational platform for data mining and analytics that can operate on historical data off-line as well as on incoming telemetry data on-line.
anomaly detection, unsupervised learning, data-driven methods, time series data