An LSTM-Based Online Prediction Method for Building Electric Load During COVID-19
Hao Tu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Accurate prediction of electric load is critical to the optimal control and operation of buildings. It provides the opportunity to reduce building energy consumption and to implement advanced functionalities such as demand response in the context of the Smart Grid. However, buildings are nonstationary, and it is important to consider the underlying concept changes that affect the load pattern. In this paper, we present an online learning method for predicting building electric load during concept changes such as COVID-19. The proposed method is based on an online Long Short-Term Memory (LSTM) recurrent neural network. To speed up learning during concept changes and improve prediction accuracy, an ensemble of multiple models with different learning rates is used. The learning rates are updated in real time to best adapt to the new concept while retaining the learned information needed for prediction.


INTRODUCTION
Buildings consume a large portion of electricity today. Accurate prediction of building electric load can help the building management system schedule building operation more efficiently. With the advance of the Smart Grid concept (Tu et al., 2020), buildings can interact with other grid components and become active players that help support grid operation, for example by providing frequency regulation service (Goddard, Klose, & Backhaus, 2014; Beil, Hiskens, & Backhaus, 2016). If a building is equipped with photovoltaic generation and battery energy storage systems, it can inject power back into the grid when the energy price is high and recharge the battery storage when the energy cost is low (Y. Wang, Wang, Chu, Pota, & Gadh, 2016). Other researchers have studied buildings in microgrids (Guan, Xu, & Jia, 2010), including energy management strategies that maintain self-sufficient operation when the building microgrid is disconnected from the main grid. An accurate method for predicting building electric load is critical to realizing all of these advanced functionalities.
Several building modeling methods have been proposed in the literature to predict buildings' energy use. They can be broadly categorized into white-box, grey-box, and black-box approaches (Li & Wen, 2014). White-box approaches run detailed physical models of the buildings and predict energy use from the underlying laws of physics. However, these approaches require building properties such as geometry and wall materials as inputs, and such properties are generally hard to obtain. Grey-box approaches use simplified models to simulate the energy behavior of buildings. For example, a resistor-capacitor network can be used to model a building's thermal behavior (S. Wang & Xu, 2006). For grey-box approaches, identifying the parameters of the simplified model is critical to prediction accuracy and requires careful calibration. Black-box approaches use data-driven methods to make predictions. Instead of constructing physical models of buildings, they use historical data to find energy consumption patterns and make predictions based on the recognized patterns. For example, an auto-regressive model with exogenous inputs such as temperature, solar irradiance, and wind speed has been proposed (Yun, Luck, Mago, & Cho, 2012). Recently, machine learning based methods have been widely studied for predicting building energy use. For example, a radial basis function neural network (Mai, Chung, Wu, & Huang, 2014) or a Long Short-Term Memory (LSTM) network (Marino, Amarasinghe, & Manic, 2016) can be used to predict building energy demand.
One major challenge for machine learning based methods is concept change. Concept change refers to the "changes in the conditional distribution of output given the input" (Gama, Žliobaitė, Bifet, Pechenizkiy, & Bouchachia, 2014). More specifically, machine learning based methods predict future energy consumption from historical data, which assumes that future energy use will follow the pattern learned from that data. However, this assumption is not always valid: a building's energy use pattern may vary over time for many reasons. When the learned pattern no longer reflects the current energy use pattern, the predictions can be erroneous.
One way of dealing with concept change is online learning, which retrains the machine learning model with new incoming data at every time step. Online learning can adapt to the new concept, and its adaptation speed has a direct impact on prediction accuracy. Dynamic Weighted Majority (DWM) was proposed to speed up adaptation and improve prediction accuracy (Kolter & Maloof, 2007). DWM maintains an ensemble of multiple models, and the final prediction of the ensemble is the weighted average of all models. The weight of a particular model depends on its history of prediction accuracy. Another approach to adjusting the adaptation speed is to use an adaptive learning rate in online training (Guo et al., 2016; Yang, Pan, & Tao, 2017). Both papers aim to reduce the adverse effect of outliers by tuning down the learning rate when an outlier is detected. Saurav et al. (2018) proposed an online learning method for anomaly detection and concept change using multi-step prediction. Their method, with local normalization, can quickly adapt to certain types of changes, such as changes in mean value and frequency, but fails to address concept changes in general.
This paper proposes an LSTM-based online prediction method. The proposed method maintains an ensemble of multiple models with different learning rates, and the learning rates adapt according to the prediction accuracy of the models in the ensemble. The proposed method is tested on a building electric load dataset collected during COVID-19, and the results show that it can quickly adapt to new concepts and make predictions with improved accuracy.

Long Short-Term Memory and Single-Step Prediction
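LSTM is a type of recurrent neural network that can learn long-term dependencies from data, which makes it particularly suitable for applications such as time series prediction. More detailed information about LSTM can be found in (Hochreiter & Schmidhuber, 1997). The input to an LSTM is a historical sequence $\{x_1, x_2, \ldots, x_k\}$, and each recurrent step $t$ computes

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)\\
\tilde{C}_t &= \tanh(W_C x_t + U_C h_{t-1} + b_C)\\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t\\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$

where $W_\bullet$, $U_\bullet$, and $b_\bullet$ are the trainable parameters of the LSTM cell; $\sigma$ is the sigmoid activation; and $*$ denotes element-wise multiplication. The input for the $t$-th ($t \le k$) recurrent step includes the external input $x_t$ as well as the carry state $C_{t-1}$ and hidden state $h_{t-1}$ from the previous recurrent step, as shown in Figure 1. The update of the carry state $C_t$ is governed by the input gate $i_t$ and forget gate $f_t$. The output (which is also the hidden state) $h_t$ of the recurrent step is calculated from the carry state $C_t$ and output gate $o_t$. The hidden states $h_t$ ($1 \le t \le k$) can be used as the input to the next LSTM layer or to calculate the final output of the model.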
For single-step time series prediction, the historical sequence $\{x_k, x_{k-1}, \ldots, x_{k-l}\}$ is used to predict the value at the next time step, $\tilde{y}_{k+1}$. After the LSTM layers, a dense layer is added to make the target prediction from the output of the last LSTM recurrent step. Thus, the final prediction is given as

$$\tilde{y}_{k+1} = W_d h_k + b_d,$$

where $W_d$ and $b_d$ are the trainable parameters of the dense layer.
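The cell equations and the dense output can be made concrete with a minimal NumPy sketch. It assumes the standard gate parameterization (input weights W, recurrent weights U, one bias b per gate) and zero initial states, uses a single LSTM layer for brevity rather than the stacked layers used later, and the names `init_params`, `lstm_step`, and `predict_next` are hypothetical, not the paper's implementation. The window is passed oldest row first, so the last processed vector is the most recent input $x_k$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, rng, scale=0.1):
    """Random W_g (n_hidden x n_in), U_g (n_hidden x n_hidden), zero b_g for g in i, f, o, c."""
    p = {}
    for g in "ifoc":
        p[f"W_{g}"] = rng.normal(scale=scale, size=(n_hidden, n_in))
        p[f"U_{g}"] = rng.normal(scale=scale, size=(n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One recurrent step: gates i, f, o, candidate state, then C_t and h_t."""
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    c = f * c_prev + i * c_tilde        # carry state (element-wise products)
    h = o * np.tanh(c)                  # hidden state = step output
    return h, c

def predict_next(seq, p, W_d, b_d):
    """Run the cell over seq (oldest row first) and apply the dense layer to the last h."""
    n_hidden = p["b_i"].shape[0]
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for x_t in seq:
        h, c = lstm_step(x_t, h, c, p)
    return float(W_d @ h + b_d)         # y~_{k+1} = W_d h_k + b_d

rng = np.random.default_rng(0)
params = init_params(n_in=34, n_hidden=16, rng=rng)
window = rng.normal(size=(24, 34))      # 24 feature vectors, oldest first
print(predict_next(window, params, W_d=rng.normal(size=16), b_d=0.0))
```

With all weights at zero every gate equals 0.5 and the candidate state is 0, so the hidden state stays at 0 and the prediction collapses to the dense bias $b_d$; this is a quick sanity check when wiring up the equations.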

LSTM for Building Electric Load Prediction
For building load prediction, the target $\tilde{y}_{k+1}$ is simply the electric load for the next time step, and several features can be identified as inputs to the LSTM model.
1. As a significant portion of the electric load is consumed by the building's heating, ventilation, and air conditioning (HVAC) systems, the building electric load is closely related to the outside air temperature (OAT), $T_{oat}$. Thus, the current OAT is used as an input feature.
2. The building electric load varies with the time of day. Generally, the building load peaks around 2 PM, while the load at night is very low. This feature is a vector $H \in \{0, 1\}^{24}$ representing the hour of the day with one-hot encoding. For example, 2 AM is encoded as a 24-dimensional vector whose third element is 1 and all other elements are zero.
3. The building electric load varies with the day of the week. For example, the load on weekdays is generally higher than on weekends. This feature is a vector $D \in \{0, 1\}^{7}$ representing the day of the week with one-hot encoding. For example, Monday is encoded as a 7-dimensional vector whose first element is 1 and all other elements are zero.
4. The building electric load is significantly lower on holidays than on workdays. A binary feature $G$ denotes whether the day is a holiday, i.e., $G = 1$ for a holiday and $G = 0$ otherwise.
5. The above features capture some critical information that affects the building electric load. However, other information, such as building occupancy, solar irradiance, and wind speed, is difficult to obtain and thus difficult to include explicitly as features, although it can influence the building's power consumption. To capture its influence, the current electric load $y_k$ is used as an input feature to predict the electric load for the next time step, $\tilde{y}_{k+1}$.
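The calendar features in items 2 to 4 can be sketched with Python's standard `datetime` module. This assumes index 0 of H is the hour starting at midnight and index 0 of D is Monday (`datetime.weekday()` returns 0 for Monday), which matches the 2 AM and Monday examples above. The `holidays` set is a hypothetical caller-supplied input, since the paper does not list which holidays were used.

```python
from datetime import date, datetime

def calendar_features(ts: datetime, holidays: set) -> tuple:
    """Return (H, D, G) for timestamp ts.

    H: 24-element one-hot hour of day (midnight -> index 0, so 2 AM -> third element).
    D: 7-element one-hot day of week (weekday(): Monday == 0 -> first element).
    G: 1 if ts falls on a date contained in `holidays`, else 0.
    """
    H = [0] * 24
    H[ts.hour] = 1
    D = [0] * 7
    D[ts.weekday()] = 1
    G = 1 if ts.date() in holidays else 0
    return H, D, G

# 2 AM on Monday, 3.16.2020, with no holidays supplied.
H, D, G = calendar_features(datetime(2020, 3, 16, 2), holidays=set())
print(H.index(1), D.index(1), G)  # 2 0 0

# Hypothetical holiday set that marks 5.25.2020 as a holiday.
print(calendar_features(datetime(2020, 5, 25, 14), holidays={date(2020, 5, 25)})[2])  # 1
```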
For each time step $k$, all features for that time step are concatenated into one feature vector $x_k = \{y_k, T_{oat,k}, H_k, D_k, G_k\} \in \mathbb{R}^{34}$ (1 + 1 + 24 + 7 + 1 = 34 elements). The LSTM layer takes a sequence of feature vectors as input. The sequence length is set to 24 in this paper, assuming that the load for the next hour can be predicted from the information of the last 24 hours. Thus, the input to the LSTM is the sequence $\{x_k, x_{k-1}, \ldots, x_{k-23}\}$ and the output is the predicted electric load for the next time step, $\tilde{y}_{k+1}$.
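The concatenation and the 24-step windowing can be sketched with NumPy. The function names and the (time, feature) array layout are assumptions made for illustration. Each window is stored oldest first, so its last row is $x_k$ and its target is the next-hour load $y_{k+1}$.

```python
import numpy as np

SEQ_LEN = 24  # hours of history per input sequence

def build_feature_vector(y_k, t_oat_k, H_k, D_k, G_k):
    """x_k = [y_k, T_oat,k, H_k (24), D_k (7), G_k] -> 34 values."""
    x = np.concatenate(([y_k, t_oat_k], H_k, D_k, [G_k])).astype(float)
    if x.shape != (34,):
        raise ValueError(f"expected 34 features, got {x.shape[0]}")
    return x

def make_windows(X, loads, seq_len=SEQ_LEN):
    """Slice time-ordered features X (T, 34) into supervised pairs.

    inputs[j]  = X[k - seq_len + 1 : k + 1]   (x_{k-23}, ..., x_k, oldest first)
    targets[j] = loads[k + 1]                 (the next-hour load y_{k+1})
    for k = seq_len - 1, ..., T - 2.
    """
    T = X.shape[0]
    if T <= seq_len:
        raise ValueError("need more than seq_len time steps")
    inputs = np.stack([X[k - seq_len + 1 : k + 1] for k in range(seq_len - 1, T - 1)])
    targets = np.asarray(loads[seq_len:T], dtype=float)
    return inputs, targets

# 30 hours of dummy features give 30 - 24 = 6 training pairs.
X = np.arange(30 * 34, dtype=float).reshape(30, 34)
inputs, targets = make_windows(X, np.arange(30, dtype=float))
print(inputs.shape, targets.shape)  # (6, 24, 34) (6,)
```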

ONLINE LEARNING
Algorithm 1 (partially recovered from the source): while $y_{k+1}$ is not available, wait; for $i = 1 \to 3$ do ...; IdxBestLearner ← index of the learner with the lowest error err(·).

The online learning process for LSTM is as follows. At each time step $k$, the model makes a prediction $\tilde{y}_{k+1}$ from the input $\{x_k, x_{k-1}, \ldots, x_{k-23}\}$ and the model parameters $W_k$. When the ground truth $y_{k+1}$ arrives at time step $k+1$, a loss function is used to evaluate the error between the prediction $\tilde{y}_{k+1}$ and the actual value $y_{k+1}$. The parameters $W_k$ are then updated by gradient descent,

$$W_{k+1} = W_k - \alpha \nabla_{W_k} \mathrm{loss}(\tilde{y}_{k+1}, y_{k+1}),$$

where loss is the selected loss function and $\alpha$ is the learning rate. With the incoming ground truth $y_{k+1}$, the feature vector for time step $k+1$, $x_{k+1} = \{y_{k+1}, T_{oat,k+1}, H_{k+1}, D_{k+1}, G_{k+1}\}$, and the new input sequence $\{x_{k+1}, x_k, \ldots, x_{k-22}\}$ can be constructed. The new input sequence is fed to the updated model with parameters $W_{k+1}$ to make the prediction $\tilde{y}_{k+2}$ for the next time step.
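The predict, wait, update cycle can be illustrated with a minimal NumPy sketch. To keep it short and runnable, a linear model stands in for the LSTM and the loss is assumed to be half the squared error (the text leaves the loss function unspecified); `online_run` is a hypothetical name. Each row of `X` stands for the flattened input available at step $k$, and `y_next[k]` is the ground truth $y_{k+1}$ that arrives one step later.

```python
import numpy as np

def online_run(X, y_next, lr):
    """Online predict-then-update loop with a linear stand-in for the LSTM.

    X[k]      : flattened input available at step k (stands in for x_{k-23..k})
    y_next[k] : ground truth y_{k+1}, revealed after the prediction is made
    Returns the predictions made before each update and the final weights.
    """
    w = np.zeros(X.shape[1])                 # W_0
    preds = np.empty(len(X))
    for k in range(len(X)):
        preds[k] = w @ X[k]                  # predict y~_{k+1} with W_k
        # ... the real system waits here until y_{k+1} is available ...
        err = preds[k] - y_next[k]
        grad = err * X[k]                    # gradient of 0.5 * err**2 w.r.t. w
        w = w - lr * grad                    # W_{k+1} = W_k - alpha * grad
    return preds, w

# Constant target 5 with a single constant feature: predictions approach 5.
X = np.ones((6, 1))
preds, w = online_run(X, np.full(6, 5.0), lr=0.5)
print(preds[:4].tolist())  # [0.0, 2.5, 3.75, 4.375]
```

Each prediction is recorded before the corresponding update, which is what makes the reported errors genuine out-of-sample errors in the online setting.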

Online Learning with Adaptive Learning Rate
The learning rate $\alpha$ is critical to online learning algorithms. On the one hand, a small learning rate results in slow learning and thus requires more time steps to adapt to the new concept, and predictions made during the adaptation can suffer from poor accuracy. On the other hand, if the learning rate is too large, the network weights change significantly at each time step. Since the measurement at each time step may contain noise, the noise propagates to the network weights through the update and can degrade prediction accuracy. As it is difficult to determine the best learning rate beforehand, we propose to maintain three LSTM models (referred to as learners) with different learning rates $\alpha_1 < \alpha_2 < \alpha_3$ in a model ensemble. The learner with the smallest learning rate $\alpha_1$ is referred to as the slow learner, and the one with the largest learning rate $\alpha_3$ as the fast learner. The learner in the middle is referred to as the average learner. At each time step, each learner makes a prediction, and the final prediction is the average of the predictions made by all learners.
When the ground truth arrives, the prediction made by each learner is compared with it. If the fast learner has the lowest prediction error of the three learners, the learning rates of all learners are increased by a preset value $\delta\alpha$. If the slow learner has the lowest error, the learning rates of all learners are decreased by $\delta\alpha$. If the average learner has the lowest error, the learning rates are unchanged for this time step. Then, every learner in the ensemble is replaced by the learner with the lowest prediction error: all learners except the one with the lowest error are removed, and that learner is duplicated twice. As a result, all learners are identical (each is a copy of the learner with the lowest prediction error at the last time step). Finally, the learners are trained on the new sample with the updated learning rates and make predictions for the next time step. Again, the final prediction is the average of the predictions of all learners.
The predicting and updating procedure is summarized in Algorithm 1.
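The adaptive-rate update can be sketched in Python. This is an illustration, not Algorithm 1 itself: a linear SGD learner (`LinearLearner`, hypothetical) stands in for each LSTM so the example runs without a deep learning framework, learners are ranked by absolute error, ties go to the slower learner because `np.argmin` returns the first minimum, and no lower bound is placed on the learning rates since the text does not mention one.

```python
import copy
import numpy as np

class LinearLearner:
    """Hypothetical stand-in for one LSTM learner: linear model trained online by SGD."""

    def __init__(self, n_features: int, lr: float):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, x: np.ndarray) -> float:
        return float(self.w @ x + self.b)

    def train_step(self, x: np.ndarray, y: float) -> None:
        # One SGD step on 0.5 * (prediction - y)^2.
        err = self.predict(x) - y
        self.w = self.w - self.lr * err * x
        self.b = self.b - self.lr * err


def ensemble_predict(learners, x) -> float:
    """Final prediction: plain average of the learners' predictions."""
    return sum(learner.predict(x) for learner in learners) / len(learners)


def ensemble_update(learners, x, y, delta_alpha):
    """Apply the adaptive-rate rule once the ground truth y for input x arrives.

    `learners` is ordered [slow, average, fast], i.e. alpha_1 < alpha_2 < alpha_3.
    Returns a new list of learners, already trained on (x, y).
    """
    # Errors of the predictions each learner made for this sample (not yet trained on it).
    errors = [abs(learner.predict(x) - y) for learner in learners]
    best = int(np.argmin(errors))          # ties resolve to the slower learner
    rates = [learner.lr for learner in learners]
    if best == len(learners) - 1:          # fast learner best: increase all rates
        rates = [r + delta_alpha for r in rates]
    elif best == 0:                        # slow learner best: decrease all rates
        rates = [r - delta_alpha for r in rates]
    # Otherwise the average learner is best and the rates stay unchanged.

    # Keep only the best learner and copy it into every slot, each with its slot's rate.
    new_learners = []
    for r in rates:
        clone = copy.deepcopy(learners[best])
        clone.lr = r
        new_learners.append(clone)

    # Train every copy on the new sample with its updated learning rate.
    for learner in new_learners:
        learner.train_step(x, y)
    return new_learners


# Fast learner already fits the sample, so all rates rise by delta_alpha.
learners = [LinearLearner(2, a) for a in (0.01, 0.012, 0.014)]
learners[2].w = np.array([1.0, 2.0])
learners = ensemble_update(learners, np.array([1.0, 0.5]), 2.0, delta_alpha=0.0002)
print([round(learner.lr, 4) for learner in learners])  # [0.0102, 0.0122, 0.0142]
```

Because the copies share weights but not learning rates, the ensemble diverges again after each training step, which is what lets the next comparison tell whether a faster or slower rate would have helped.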

Data Analysis
Electric load data were collected from an office building in California, USA, from 6.30.2019 to 6.17.2020. As shown in Figure 2, the data can be divided into three phases. The pre-COVID-19 phase runs from 6.30.2019 to 3.15.2020, during which the building operates normally and its electric load follows a regular pattern. The COVID-19 phase runs from 3.16.2020 to 5.24.2020, during which the building is closed and only some essential departments were operating. The electric load drops significantly during the COVID-19 phase, and its pattern differs from that of the pre-COVID-19 phase. The post-COVID-19 phase runs from 5.25.2020 to 6.17.2020, after the building has reopened. The building electric load increases during this phase; however, its pattern matches neither that of the COVID-19 phase nor that of the pre-COVID-19 phase.
Based on the above discussion, two concept changes are identified. The first happens on 3.16.2020, when the building is closed. The second happens on 5.25.2020, when the building is reopened.
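When reproducing the evaluation, the two concept-change dates can be turned into a phase label for each day. This is a small illustrative helper with hypothetical names; each boundary date is treated as the first day of the new phase, following the phase definitions above.

```python
from datetime import date

# Concept-change dates from the data description; each is the first day of a new phase.
BUILDING_CLOSED = date(2020, 3, 16)
BUILDING_REOPENED = date(2020, 5, 25)

def phase_of(day: date) -> str:
    """Label a calendar day with its operating phase."""
    if day < BUILDING_CLOSED:
        return "pre-COVID-19"
    if day < BUILDING_REOPENED:
        return "COVID-19"
    return "post-COVID-19"

# The first concept change happens between these two days.
print(phase_of(date(2020, 3, 15)), phase_of(date(2020, 3, 16)))  # pre-COVID-19 COVID-19
```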

Baselines
Two baselines are compared with the proposed method.
1. Off-line LSTM (Off-LSTM): this is an LSTM model without online training. The data from 6.30.2019 to 3.5.2020 are used for training.
2. Online LSTM with fixed learning rate (On-LSTM-FLR): this method takes the trained off-line LSTM model and continues to train it online with a fixed learning rate.
Similar to On-LSTM-FLR, the proposed online LSTM with adaptive learning rate takes the trained off-line LSTM model and applies the updating and training rules proposed in Section 3. The proposed method and the two baselines use the same network structure: two stacked LSTM layers, the first with 32 neurons and the second with 16 neurons. The recurrent dropout rate is set to 50% for both LSTM layers, and a dropout layer with a 20% dropout rate is inserted between the two LSTM layers. The initial learning rates for the proposed method are set to $\alpha_1 = 0.01$, $\alpha_2 = 0.012$, and $\alpha_3 = 0.014$, and the learning rate increment is set to $\delta\alpha = 0.0002$.
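The reported settings can be collected in one place, and a quick parameter count shows how small the network is. The dictionary keys are hypothetical names. The count assumes the standard LSTM parameterization (input weights, recurrent weights, and one bias vector for each of the four gate blocks) and a single-output dense layer on the last hidden state; dropout adds no trainable parameters. The total works out to 8,576 + 3,136 + 17 = 11,729.

```python
# Network and online-learning settings reported in the text (key names are hypothetical).
CONFIG = {
    "seq_len": 24,                         # hours per input sequence
    "n_features": 34,                      # width of x_k
    "lstm_units": (32, 16),                # two stacked LSTM layers
    "recurrent_dropout": 0.5,              # both LSTM layers
    "dropout_between": 0.2,                # dropout layer between the LSTM layers
    "initial_lrs": (0.01, 0.012, 0.014),   # alpha_1 < alpha_2 < alpha_3
    "delta_alpha": 0.0002,
}

def lstm_params(n_in: int, units: int) -> int:
    # Four gate blocks, each with input weights (units x n_in),
    # recurrent weights (units x units), and a bias vector (units).
    return 4 * (units * (n_in + units) + units)

def count_params(cfg: dict) -> int:
    total, n_in = 0, cfg["n_features"]
    for units in cfg["lstm_units"]:
        total += lstm_params(n_in, units)
        n_in = units                       # next layer reads this layer's hidden states
    return total + n_in + 1                # dense layer: one weight per hidden unit + bias

print(count_params(CONFIG))  # 11729
```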
The performance metric used to evaluate the models is the mean absolute error (MAE),

$$\mathrm{MAE} = \frac{1}{N} \sum_{k=N_0}^{N_0+N-1} \left| y_k - \tilde{y}_k \right|,$$

where $N_0$ is the starting time step and $N$ is the number of predictions. As the proposed method aims to improve prediction accuracy during concept changes, $N$ is set to 168 (one week of hourly data) so that the change in performance over time can be evaluated. The models are tested on fourteen weeks across the three phases shown in Figure 2. The results are summarized in Table 1. Results for 3.30.2020 to 5.3.2020 are omitted because of space limitations; they show a similar trend, with the proposed method having the lowest MAE and Off-LSTM the highest.
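The weekly evaluation can be sketched with NumPy. `mae` averages $|y_k - \tilde{y}_k|$ over the $N$ predictions starting at step $N_0$, and `weekly_mae` (a hypothetical helper) splits a series into consecutive full 168-hour weeks. The arrays are assumed to be aligned so that `y_pred[k]` is the prediction of `y[k]`; a trailing partial week is dropped.

```python
import numpy as np

def mae(y, y_pred, n0, n):
    """MAE over predictions k = n0, ..., n0 + n - 1."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y[n0:n0 + n] - y_pred[n0:n0 + n])))

def weekly_mae(y, y_pred, n0=0, hours_per_week=168):
    """One MAE per full week starting at n0 (a trailing partial week is ignored)."""
    n_weeks = (len(y) - n0) // hours_per_week
    return [mae(y, y_pred, n0 + w * hours_per_week, hours_per_week) for w in range(n_weeks)]

# Two full weeks with constant errors of 1 and 2; the last 64 hours are ignored.
y = np.zeros(400)
y_pred = np.concatenate([np.ones(168), np.full(168, 2.0), np.full(64, 3.0)])
print(weekly_mae(y, y_pred))  # [1.0, 2.0]
```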

The learning rate evolution of the proposed method from 3.6.2020 to 6.16.2020 is shown in Figure 6.

Pre-COVID-19 Phase
Week-long data from 3.9.2020 to 3.15.2020 were used to test the performance of the three models during the pre-COVID-19 phase. As shown in Figure 3, Off-LSTM and the proposed method give similar performance on this test set, while On-LSTM-FLR performs worse. Since the load during this week still follows the pattern of the building's normal operation, Off-LSTM, using its learned off-line model, predicts the load with the lowest MAE. For the online algorithms (On-LSTM-FLR and the proposed method), the models are updated with new samples at each time step. Random noise in the new samples can disturb the learned weights and degrade the prediction. Because the proposed method uses an adaptive learning rate, the learning rate is reduced during this phase to limit the negative influence of the noise. This test shows, first, that the off-line LSTM model is properly trained to predict the building load when there is no concept change and, second, that the proposed method outperforms On-LSTM-FLR when there is no concept change.

COVID-19 Phase
During COVID-19, the building was closed, so its electric load dropped significantly. The load pattern also differs from that of the pre-COVID-19 phase. The test results for two weeks during the COVID-19 phase are presented.
The results for the week from 3.16.2020 to 3.22.2020 are shown in Figure 4. This is the first week after the building was closed and corresponds to the first concept change.
The proposed method shows the lowest MAE among the three methods, demonstrating its ability to adapt to the new concept while rejecting the influence of noise. Off-LSTM outperforms On-LSTM-FLR, but the difference in MAE is smaller than during the pre-COVID-19 phase. This may be because the new concept (the new electric load pattern) of the COVID-19 phase had not yet stabilized in the first week, so the pattern in this week still bears some similarity to that of the pre-COVID-19 phase.
The results for the week from 5.11.2020 to 5.17.2020 are shown in Figure 5. This is the ninth week after the building was closed. The proposed method shows the best performance, while On-LSTM-FLR outperforms Off-LSTM for this week. In Figure 4a, the prediction made by Off-LSTM has peaks in the mornings. This clearly shows that Off-LSTM still predicts the load using the old concept, which leads to the worst performance of the three methods.

Post-COVID-19 Phase
In the post-COVID-19 phase, the building has reopened and its electric load increases. However, the load pattern differs from those of both the pre-COVID-19 and the COVID-19 phases. The test results for two weeks during the post-COVID-19 phase are presented.
The results for the week from 5.25.2020 to 5.31.2020 are shown in Figure 7. This is the first week after the building reopened and corresponds to the second concept change. The proposed method shows the lowest MAE among the three methods, and Off-LSTM has the largest MAE. All three methods fail to predict the load peak in the afternoon of May 25. Furthermore, they fail to predict the load peaks at 23:00 on May 25 and 26. These patterns had never been seen before in the data.
The results for the week from 6.8.2020 to 6.14.2020 are shown in Figure 8. The proposed method again shows the lowest MAE among the three methods.

CONCLUSION
This paper proposed an online LSTM-based method for predicting building electric load during concept changes such as COVID-19. The proposed method uses a model ensemble that maintains three learners with different learning rates, and the learning rates are adjusted online to find the best learning speed during the concept change. Experimental results show that the proposed method can quickly adapt the model to the concept changes during COVID-19 and reduce prediction errors.