DAGGER: Data AuGmentation GEneRative Framework for Time-Series Data in Data-Driven Smart Manufacturing Systems



Published Oct 26, 2023
Nicholas Hemleben, Daniel Ospina-Acero, David Blank, Andrew VanFossen, Frank Zahiri, Mrinal Kumar


As industries transition into the Industry 4.0 paradigm, the relevance and interest in concepts like
Digital Twin (DT) are at an all-time high. DTs offer direct avenues for industries to make more
accurate predictions, rational decisions, and informed plans, ultimately reducing costs, increasing
performance and productivity. Adequate operation of DTs in the context of smart manufacturing relies
on an evolving data-set relating to the real-life object or process, and a means of dynamically updating
the computational model to better conform to the data. This reliance on data is made more explicit when
physics-based computational models are unavailable or difficult to obtain in practice, as is the
case in most modern manufacturing scenarios. For data-based model surrogates to "adequately" represent
the underlying physics, the number of training data points must keep pace with the number of degrees of
freedom in the model, which can be on the order of thousands. However, in niche industrial scenarios
like the one in manufacturing applications, the availability of data is limited (on the order of a few
hundred data points, at best), mainly because a manual measuring process typically must take place for
a few of the relevant quantities, e.g., level of wear of a tool. In other words, notwithstanding the
popular notion of big data, there is still a stark shortage of ground-truth data when examining, for
instance, a complex system's path to failure. In this work we present a framework to alleviate this
problem via modern machine learning tools, where we show a robust, efficient and reliable pathway to
augment the available data to train the data-based computational models.

Small sample size is a key limitation on machine learning performance, particularly with
very high dimensional data. Current efforts for synthetic data generation typically involve either
Generative Adversarial Networks (GANs) or Variational AutoEncoders (VAEs). These, however, are
closely tied to image processing and synthesis, and are generally not suited for generating sensor
data, which is the type of data that manufacturing applications produce. Additionally, GAN
models are susceptible to mode collapse, training instability, and high computational costs when used
for high dimensional data creation. Alternatively, the encoding of VAEs greatly reduces dimensional
complexity of data and can effectively regularize the latent space, but often produces poor
representational synthetic samples. Our proposed method thus incorporates the learned latent space
from an AutoEncoder (AE) architecture into the training of the generation network in a GAN. The
advantages of such a scheme are twofold: (i) the latent space representation created
by the AE reduces the complexity of the distribution the generator must learn, allowing for quicker
discriminator convergence, and (ii) the structure in the sensor data is better
captured in the transition from the original space to the latent space. Through time statistics (up to
the fifth moment), ARIMA coefficients and Fourier series coefficients, we compare the synthetic data
from our proposed AE+GAN model with the original sensor data. We also show that the performance of
our proposed method is at least comparable with that of the Riemannian Hamiltonian VAE, which is a
recently published data augmentation framework specifically designed to handle very small high
dimensional data sets.
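The data flow of the proposed scheme (encode the sensor data into a lower-dimensional latent space, generate new samples there, and decode them back to the data space) can be sketched as follows. This is a minimal structural illustration and not the authors' implementation: PCA via SVD stands in for the trained AE, a fitted Gaussian stands in for the trained GAN generator, and the toy data set and latent dimension of 8 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sensor" data set: 200 series of length 64 (the small-sample regime).
t = np.linspace(0, 1, 64)
X = np.stack([np.sin(2 * np.pi * (3 + rng.normal(0, 0.2)) * t)
              + 0.1 * rng.normal(size=64) for _ in range(200)])

# --- Stand-in AutoEncoder: PCA via SVD (encoder = project, decoder = lift) ---
mu = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
k = 8                                   # latent dimension (assumption)
encode = lambda x: (x - mu) @ Vt[:k].T
decode = lambda z: z @ Vt[:k] + mu

Z = encode(X)                           # latent representation, shape (200, 8)

# --- Stand-in generator: Gaussian fitted in latent space ---
# (DAGGER trains a GAN here; a Gaussian fit keeps the sketch self-contained.)
z_mean, z_cov = Z.mean(axis=0), np.cov(Z.T)
Z_synth = rng.multivariate_normal(z_mean, z_cov, size=200)

X_synth = decode(Z_synth)               # synthetic series back in data space
print(X_synth.shape)                    # (200, 64)
```

The point of the structure is the one made above: the generator only has to learn an 8-dimensional distribution rather than a 64-dimensional one.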
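The three comparison statistics named above (time-domain moments up to the fifth, ARIMA coefficients, and Fourier series coefficients) can each be computed in a few lines of NumPy. A hedged sketch: the Yule-Walker AR fit below is a simplified stand-in for a full ARIMA fit, and the series lengths and model orders are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def standardized_moments(x, k_max=5):
    """Mean, standard deviation, then standardized moments E[z^k] for k = 3..k_max."""
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma
    return np.array([mu, sigma] + [(z ** k).mean() for k in range(3, k_max + 1)])

def ar_coefficients(x, order=4):
    """AR(order) coefficients via the Yule-Walker equations (stand-in for ARIMA)."""
    x = x - x.mean()
    n = len(x)
    acov = np.array([np.dot(x[:n - lag], x[lag:]) / n for lag in range(order + 1)])
    R = np.array([[acov[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, acov[1:order + 1])

def fourier_magnitudes(x, n_coeffs=10):
    """Magnitudes of the leading Fourier coefficients."""
    return np.abs(np.fft.rfft(x))[:n_coeffs]

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 256)
real = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.normal(size=256)
synth = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.normal(size=256)

# Similar series should give similar fingerprints under all three summaries.
for f in (standardized_moments, ar_coefficients, fourier_magnitudes):
    print(f.__name__, np.round(np.abs(f(real) - f(synth)), 2))
```

In practice these fingerprints would be computed for the original and the AE+GAN-generated series and compared distributionally, not per-pair as in this toy example.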

How to Cite

Hemleben, N., Ospina-Acero, D., Blank, D., VanFossen, A., Zahiri, F., & Kumar, M. (2023). DAGGER: Data AuGmentation GEneRative Framework for Time-Series Data in Data-Driven Smart Manufacturing Systems. Annual Conference of the PHM Society, 15(1). https://doi.org/10.36001/phmconf.2023.v15i1.3483



Keywords: Synthetic data, Generative modeling, Generative adversarial networks, Variational autoencoders, Digital Twins

References

Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In International conference on machine learning (pp. 214–223).

Åström, K., & Murray, R. (2010). Feedback systems: An introduction for scientists and engineers. Princeton University Press.

Bank, D., Koenigstein, N., & Giryes, R. (2020). Autoencoders. arXiv preprint arXiv:2003.05991.

Chadebec, C., & Allassonnière, S. (2021). Data augmentation with variational autoencoders and manifold sampling. In Deep generative models, and data augmentation, labelling, and imperfections (pp. 184–192). Springer.

Chadebec, C., Mantoux, C., & Allassonnière, S. (2020). Geometry-aware Hamiltonian variational auto-encoder. arXiv preprint arXiv:2010.11518.

Chadebec, C., Thibeau-Sutre, E., Burgos, N., & Allassonnière, S. (2022). Data augmentation in high dimensional low sample size setting using a geometry-based variational autoencoder. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53–65.

Demir, S., Mincev, K., Kok, K., & Paterakis, N. G. (2021). Data augmentation for time series regression: Applying transformations, autoencoders and adversarial networks to electricity price forecasting. Applied Energy,
304, 117695.

Diez-Olivan, A., Del Ser, J., Galar, D., & Sierra, B. (2019). Data fusion and machine learning for industrial prognosis: Trends and perspectives towards industry 4.0. Information Fusion, 50, 92–111.

Doersch, C. (2021). Tutorial on variational autoencoders.

Figueira, A., & Vaz, B. (2022). Survey on synthetic data generation, evaluation methods and GANs. Mathematics, 10(15). doi: 10.3390/math10152733

Goodfellow, I. (2016). NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Weinberger (Eds.), Advances in neural information processing systems (Vol. 27). Curran Associates, Inc.

Gui, J., Sun, Z., Wen, Y., Tao, D., & Ye, J. (2021). A review on generative adversarial networks: Algorithms, theory, and applications. IEEE Transactions on Knowledge and Data Engineering.

Hutter, F., Lücke, J., & Schmidt-Thieme, L. (2015). Beyond manual tuning of hyperparameters. KI-Künstliche Intelligenz, 29(4), 329–337.

Iglesias, G., Talavera, E., González-Prieto, Á., Mozo, A., & Gómez-Canaval, S. (2023). Data augmentation techniques in time series domain: a survey and taxonomy. Neural Computing and Applications, 35(14), 10123–10145.

Khanuja, H. K., & Agarkar, A. A. (2023). Towards gan challenges and its optimal solutions. Generative Adversarial Networks and Deep Learning: Theory and Applications.

Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In International conference on machine learning (pp. 1278–1286).

Shao, H., Yao, S., Sun, D., Zhang, A., Liu, S., Liu, D., & Abdelzaher, T. (2020). ControlVAE: Controllable variational autoencoder. In International conference on machine learning (pp. 8655–8664).

Shmelkov, K., Schmid, C., & Alahari, K. (2018). How good is my GAN? In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 213–229).

Smith, K. E., & Smith, A. O. (2020). Conditional GAN for time-series generation.

Teubert, C. (2022). Milling wear data set. Retrieved from https://data.nasa.gov/Raw-Data/Milling-Wear/vjv9-9f3x (Dataset)

Wright, L., & Davidson, S. (2020). How to tell the difference between a model and a digital twin. Advanced Modeling and Simulation in Engineering Sciences, 7(1), 1–13.

Yang, Z., Li, Y., & Zhou, G. (2023). TS-GAN: Time-series GAN for sensor-based health data augmentation. ACM Transactions on Computing for Healthcare, 4(2), 1–21.