Unsupervised Anomaly Detection for Hard Drives



Published Jun 29, 2021
Enrico Barelli, Ennio Ottaviani


In the age of smart sensors and Industry 4.0, continuous monitoring of
machinery produces enormous amounts of data. As a result, datacenters are now
a very important asset not only for large-scale cloud providers but also for
medium to large enterprises that choose to store in-house the ever-increasing
data collected during business operations.

An efficient method for maintaining the large number of hard drives
housed in datacenters is critical to ensuring availability in a cost-effective
manner. Since 2013, Backblaze (https://www.backblaze.com/) has published
statistics and datasets that let researchers gain insight into hard drive
performance and failures. In this paper, more than 2.5 million records,
following hard drives' S.M.A.R.T. readings for over a year, will be analyzed.

The objective of this paper is to show that it is possible to build a completely
unsupervised pipeline that produces an anomaly score highly correlated with
hard drive time to failure (TTF), so that a decision to replace a drive can be
made before failure with minimal waste due to false alarms. Favorable
comparisons with state-of-the-art supervised classifiers will be presented.

A brief example of how such a pipeline can be
extended to data streams and continuous sensor monitoring will be given.

How to Cite

Barelli, E., & Ottaviani, E. (2021). Unsupervised Anomaly Detection for Hard Drives. PHM Society European Conference, 6(1), 7. https://doi.org/10.36001/phme.2021.v6i1.2795



predictive maintenance, data mining, unsupervised learning

Technical Papers