An Introductory Approach to Time-Series Data Preparation and Analysis

Published Oct 26, 2023
Edward Baumann, Charles Hsu, Hayley Buba, Taylor Cox

Abstract

Machine learning (ML) and artificial intelligence (AI) have widespread applications and have revolutionized many industries, driven by mature sensor technology and large-scale data collection efforts. A key task for effective ML/AI operations is extracting and identifying data that is both useful and usable, so that complex interrelationships can be identified and problems solved efficiently. Usefulness is the value and meaning of the data within the desired model; usability is the ease with which the data can be used in a model. Complex supervised and unsupervised ML models, once the domain of cutting-edge scientists and academics, can now be invoked as basic function calls in public-domain packages for Python, R, MATLAB, and other languages. These functions still require effective data preprocessing to overcome the unpredictable effects of real-world data quality (e.g., missing data, environmental noise, and signals sampled at different rates that must be synchronized), yet their ease of use means they are often called with little or no understanding of the underlying math or of efficient ways to work through the data set. The approachability of these packages lets users dive into complex problem sets with little advance preparation. The resulting lack of understanding, however, will inevitably cause problems, skew results, or force the user onto a less efficient path to a similar answer. Each package provides relatively simple examples built on specific public data sets, but few provide the background knowledge and comprehensive methods needed to build inputs for extensive and effective time-series data modeling. Typically, the complex nature of time-series data requires an in-depth understanding of signal analysis and subject-matter expertise before the data can be used in ML/AI predictive models.
This paper gives the reader an overview of the problems associated with time-series data modeling, proposes a common set of preprocessing steps, demonstrates a taxonomy for classifying time-series data, provides introductory reasoning about the underlying process, and discusses the models that would benefit from such a methodology. The goal is to equip readers who are not knowledge-domain experts with updated, approachable techniques for deciding which features to focus on when preparing their time-series data.
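
Two of the data-quality problems named in the abstract, missing samples and streams recorded at different sampling rates, can be illustrated with a minimal sketch. This is not the authors' method: the function names (`drop_missing`, `interpolate_at`, `align`), the sample values, and the choice of linear interpolation are all assumptions made for illustration. The sketch also assumes each series is sorted by time with distinct timestamps, and it does not address noise.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

# Hypothetical sample type: (timestamp in seconds, value or None if missing).
Sample = Tuple[float, Optional[float]]
Clean = Tuple[float, float]


def drop_missing(samples: Sequence[Sample]) -> List[Clean]:
    """Keep only samples that carry a value; gaps are filled by interpolation."""
    return [(t, v) for t, v in samples if v is not None]


def interpolate_at(samples: Sequence[Clean], t: float) -> float:
    """Linearly interpolate a sorted series at time t, clamping at both ends."""
    times = [ts for ts, _ in samples]
    if t <= times[0]:
        return samples[0][1]
    if t >= times[-1]:
        return samples[-1][1]
    i = bisect_left(times, t)  # first index whose timestamp is >= t
    t0, v0 = samples[i - 1]
    t1, v1 = samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align(series_a: Sequence[Sample], series_b: Sequence[Sample],
          step: float) -> List[Tuple[float, float, float]]:
    """Resample two series onto one grid spanning their overlapping time range."""
    a, b = drop_missing(series_a), drop_missing(series_b)
    start = max(a[0][0], b[0][0])
    end = min(a[-1][0], b[-1][0])
    n = int((end - start) / step) + 1
    grid = [start + k * step for k in range(n)]
    return [(t, interpolate_at(a, t), interpolate_at(b, t)) for t in grid]


# Illustrative data: a 2 Hz stream with one missing sample and a 1 Hz stream.
fast = [(0.0, 1.0), (0.5, None), (1.0, 3.0), (1.5, 4.0), (2.0, 5.0)]
slow = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0)]
aligned = align(fast, slow, step=0.5)
```

Each row of `aligned` holds a shared timestamp and one value per stream, which is the shape most modeling functions expect. Linear interpolation is only one possible choice; whether it is appropriate depends on the signal and on how long the gaps are.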

How to Cite

Baumann, E., Hsu, C., Buba, H., & Cox, T. (2023). An Introductory Approach to Time-Series Data Preparation and Analysis. Annual Conference of the PHM Society, 15(1). https://doi.org/10.36001/phmconf.2023.v15i1.3561

Keywords

Machine Learning (ML) / Artificial Intelligence (AI), Supervised and Unsupervised ML, Data preprocessing, time series data, knowledge domain, probability distribution, feature extraction and selection, data preparation

Section
Poster Presentations