Ensemble classifiers for drift detection and monitoring in dynamical environments
Detecting and monitoring changes during the learning process is an important area of research in many industrial applications. The challenge is to diagnose and analyze these changes so that the accuracy of the learning model is preserved. Recently, ensemble classifiers have achieved good results when dealing with concept drift. This paper presents two ensemble learning algorithms, BagEDIST and BoostEDIST, which combine Online Bagging and Online Boosting, respectively, with the drift detection method EDIST. EDIST is a new drift detection method that monitors the distance between two consecutive classification errors. The idea behind this combination is to develop an ensemble learning algorithm that explicitly handles concept drift by describing the location, speed, and severity of each drift. Moreover, this paper presents a new drift diversity measure to study the diversity of the base classifiers and how they cope with concept drift. In various experiments, this new measure gave a clearer view of the ensemble’s behavior when dealing with concept drift.
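As a rough illustration of the two ingredients named in the abstract, the following Python sketch pairs the standard Online Bagging update (each base learner trains on every incoming example k times, with k drawn from a Poisson(1) distribution) with a simple record of the distance between consecutive classification errors. It is not the authors' BagEDIST or EDIST: the abstract does not give EDIST's statistical test or thresholds, so none are reproduced here. The class names `MajorityClass`, `ErrorDistanceMonitor`, and `OnlineBagging` are hypothetical, and the base learner is a deliberately trivial stand-in.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class MajorityClass:
    """Toy incremental base learner (hypothetical stand-in):
    predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class ErrorDistanceMonitor:
    """Records how many examples separate consecutive errors.
    Assumption: only the raw distances are stored; no drift test is applied."""

    def __init__(self):
        self.since_last_error = 0
        self.distances = []

    def update(self, is_error):
        self.since_last_error += 1
        if is_error:
            self.distances.append(self.since_last_error)
            self.since_last_error = 0


class OnlineBagging:
    """Online Bagging with an attached error-distance monitor."""

    def __init__(self, n_learners=5, seed=0):
        self.rng = random.Random(seed)
        self.learners = [MajorityClass() for _ in range(n_learners)]
        self.monitor = ErrorDistanceMonitor()

    def predict(self, x):
        votes = Counter(learner.predict(x) for learner in self.learners)
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        # Test-then-train: score the ensemble before it sees the label.
        self.monitor.update(self.predict(x) != y)
        for learner in self.learners:
            # Each learner sees the example k ~ Poisson(1) times.
            for _ in range(poisson1(self.rng)):
                learner.learn(x, y)


if __name__ == "__main__":
    ensemble = OnlineBagging(n_learners=5, seed=42)
    # Synthetic stream with an abrupt change: label 0, then label 1.
    stream = [(None, 0)] * 50 + [(None, 1)] * 50
    for x, y in stream:
        ensemble.learn(x, y)
    print(ensemble.monitor.distances)
```

In this sketch, a run of short distances after the label switch is the kind of signal a distance-based detector could react to. How such distances are turned into a drift decision, and how location, speed, and severity are derived from them, is left to the paper itself.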
classification, Drift detection and monitoring, Non-stationary environments
The Prognostics and Health Management Society advocates open access to scientific data and uses a Creative Commons license for publishing and distributing any papers. A Creative Commons license does not relinquish the author’s copyright; rather, it allows authors to share some of their rights with any member of the public under certain conditions while enjoying full legal protection. By submitting an article to the International Conference of the Prognostics and Health Management Society, the authors agree to be bound by the associated terms and conditions, including the following:
As the author, you retain the copyright to your Work. By submitting your Work, you are granting anybody the right to copy, distribute and transmit your Work and to adapt your Work with proper attribution under the terms of the Creative Commons Attribution 3.0 United States license. You assign rights to the Prognostics and Health Management Society to publish and disseminate your Work through electronic and print media if it is accepted for publication. A license note citing the Creative Commons Attribution 3.0 United States License as shown below needs to be placed in the footnote on the first page of the article.
First Author et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.