A hybrid model for short term real-time electricity price forecasting in smart grid

Background: With the rapid growth of the power market, real-time electricity pricing has become a trend in the smart grid because it enables customers to moderate their power consumption. Accurate forecasts of the real-time price (RTP) strongly influence customer behavior, for example by allowing better scheduling of the operating times of domestic appliances to maximize benefit. In this paper, an innovative hybrid RTP forecasting model that considers both the linear and non-linear behaviors within the input data is proposed to forecast short-term electricity prices in the smart grid. Results: The effectiveness of the proposed hybrid forecasting model is verified by numerical results in terms of forecasting performance evaluations. The results demonstrate that our approach forecasts RTP with high accuracy: the mean absolute percentage error (MAPE) is approximately 3.5%, and the model significantly outperforms existing models. Conclusion: Based on the achieved results, we conclude that the proposed hybrid model is an accurate and efficient tool for short-term RTP forecasting and is potentially applicable to a variety of forecasting tasks.


Background
Real-time price (RTP), also referred to as dynamic tariff or spot price, was first introduced in the 1980s [1] and is now tentatively applied to power systems in many countries, including the US and Australia. The real-time price tariff is an inexorable trend in the reform of next-generation power systems [2,3]. Unlike regulated markets, in which companies determine prices independently, electricity prices in a deregulated market depend significantly on the supply-demand relationship. Generally, RTP offers higher prices during peak load demand periods and lower prices during off-peak periods [4,5]. Considering the manufacturing cost at different load levels, the dynamic tariff is a potential load management method for properly allocating the incremental prices of electricity consumption to the time of delivery, thus ensuring overall economic rationality [6].
In addition, the RTP tariff is broadly utilized as a basic control signal to support demand response management (DRM), which is an excellent long-term solution for improving energy efficiency and reducing wastage [7,8]. On the one hand, the RTP tariff benefits the power grid because it offers specific price signals that lead participants to spread their power usage over different times, which alleviates the load burden on the grid, especially at peak demand times. On the other hand, such a tariff encourages consumption through price reductions during periods of abundance and gives customers multiple choices for when to consume electricity. Through DRM, participants in the electricity market can shift the operating time of electrical devices, automatically or manually, away from high-price periods and benefit from low-price periods, thus reducing energy usage and lowering their electricity bills [5][9][10][11]. Therefore, research on the RTP tariff has been of interest to researchers, production companies, investors, independent market operators and large industrial consumers in recent years [12,13].
Moreover, the real-time price is instantaneous by nature. Thus, electricity consumers and power suppliers in this competitive electricity market need to forecast RTP in advance in order to schedule their operations and control price risks. Over the last two decades, much research has been conducted on RTP forecasting. In summary, the existing methods can be classified into two main categories: machine learning based methods such as the support vector machine (SVM) and the artificial neural network (ANN) [14][15][16], and statistical time series based methods such as the auto regressive integrated moving average (ARIMA) model and the generalized auto regressive conditional heteroskedasticity (GARCH) model [17,18].
Specifically, in [15], the authors proposed methods including hybrid networks of the self-organized map (SOM) and the support vector machine (SVM) to predict the short-term electricity price. With the trained network, the hourly electricity prices can be predicted one day ahead. To confirm its feasibility, the proposed model was trained and tested on historical energy price data from the New England electricity market. In addition, [16] presented a sensitivity analysis of similar days (SD) parameters to raise the accuracy of an ANN model, together with an SD-based short-term price forecasting model. A large amount of data was used to train the network. The model was tested in the Pennsylvania-New Jersey-Maryland (PJM) electricity market, and the results showed that the mean absolute percentage error (MAPE) was around 11%. Furthermore, in [17], the authors introduced a method to predict next-day electricity prices based on the ARIMA methodology, which is used to analyze time series problems. The ARIMA model was tested in the California electricity market, and more than 30 days of historical data samples were required to train the model.
However, the shared limitation of the studies mentioned above is that a large amount of historical RTP data is required to train the model, and insufficient training data causes considerable estimation errors. Hence, our research in this paper mainly concentrates on building an effective estimation model that forecasts electricity prices in the smart grid with high accuracy by using limited sets of historical data. To evaluate the performance of the methods, numerical error measures such as the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE) and mean absolute percentage error (MAPE) are used in this work.
The main contributions of this work can be summarized as follows.
(1) A hybrid RTP forecasting model, which consolidates a least squares (LS) fitting model, a grey prediction (GP) model and an artificial neural network (ANN), is proposed. The LS fitting model considers the linear behavior of the time series data and the GP model considers the non-linear behavior. The ANN model is an optional forecasting procedure used for error optimization.
(2) Less historical RTP data is required, which improves practicability. Since both the LS and GP models can be established on the basis of a small number of data sets, the proposed hybrid forecasting model is easy to deploy and more practical than the previous methods. The results indicate that our method is an accurate and efficient tool for forecasting the day-ahead RTP and that it significantly outperforms the previous methods.
To the best of our knowledge, this is the first work to combine the above components for RTP forecasting.

Method
This section introduces the methodology which includes the architecture of the proposed forecast strategy and the specific description of the proposed hybrid forecasting model in this work.

Architecture of the proposed forecasting strategy
Considering time scales, RTP forecasting is classified into ultra-short term, short term, medium term and long term [19]. Ultra-short-term forecasting looks from several minutes to 1 h ahead. Short-term forecasting covers values from 1 h to several hours ahead. Forecasting from a few hours to 1 week ahead is defined as medium term, and anything beyond that is long term. In this work, we focus on day-ahead (24 h) RTP forecasting with a resolution of 0.5 h, which we treat as short-term forecasting. Figure 1 shows the historical RTP data over 5 historical day samples provided by the Australian Energy Market Operator (AEMO) [20]. The time-series electricity prices vary with load demand across time periods. The historical RTP samples in Fig. 1 show that electricity prices exhibit a prominent regularity that consists of both linear and non-linear information. Accordingly, the linear and non-linear properties of the time series data have to be incorporated into the forecasting model. Therefore, the proposed forecasting model can be formulated as:

$$P_t = L_t + N_t + E^*_t \tag{1}$$

where $P_t$ is the forecasted RTP at time $t$, and $L_t$ and $N_t$ represent the estimations of the linear and non-linear behaviors of the input data, respectively. Additionally, $E^*_t$, an optional forecasting component, denotes the error optimization procedure.
To present the architecture of the proposed hybrid forecasting model, Fig. 2 illustrates the flow chart of forecasting day-ahead real-time electricity prices based on several days of historical RTP data. Specifically, the historical data is input as the basis for establishing the model. Then, the linear behavior of the data is estimated by using the LS fitting model. Afterwards, the GP model is applied to estimate the non-linear behavior within the data. Next, the spot error rate (ER) of the initial forecasting result determines whether the ANN based error optimization procedure needs to be executed at this stage: the ANN model is executed to improve the forecasting accuracy if the spot ER exceeds the maximum tolerable ER. Finally, the forecasted day-ahead RTP produced by the integrated model is the desired output in this study.
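As a minimal sketch of the decision step in Fig. 2, the following Python fragment flags the time slots whose spot error rate exceeds a tolerance and combines the forecasting components slot by slot. The relative-error definition of the spot ER, the 10% default tolerance `max_er`, the additive combination of the components and all function names are illustrative assumptions rather than details given in the paper.

```python
def spot_error_rate(forecast, actual):
    """Relative error |forecast - actual| / |actual| at one time slot (assumed definition)."""
    return abs(forecast - actual) / abs(actual)


def slots_needing_optimization(forecasts, actuals, max_er=0.10):
    """Return the indices of time slots whose spot error rate exceeds max_er."""
    return [
        t
        for t, (f, a) in enumerate(zip(forecasts, actuals))
        if spot_error_rate(f, a) > max_er
    ]


def hybrid_forecast(linear, nonlinear, correction=None):
    """Combine the components per slot: linear + non-linear (+ optional correction)."""
    if correction is None:
        # The error optimization term is optional, so it defaults to zero.
        correction = [0.0] * len(linear)
    return [l + n + e for l, n, e in zip(linear, nonlinear, correction)]
```

In this sketch the correction term would only be supplied for the slots returned by `slots_needing_optimization`, which mirrors the optional role of the ANN stage.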
In the next subsections, the relevant forecasting components of the hybrid model are described in detail.

Least squares fitting model for linear behavior forecasting
Fig. 2 The flow chart of forecasting day-ahead real-time electricity prices

Based on Eq. (1), the LS fitting model is employed to obtain the linear behavior $L_t$ within the input data. Least squares fitting is a standard approach in regression analysis for approximating the solution of overdetermined systems, and it is one of the common fitting algorithms [21][22][23][24]. At the stage of linear behavior forecasting, the LS fitting model can be used to build a fitting function that expresses the mainstream variation among the historical data. Assume the input data set $H$ consists of $N$ days of historical RTP data, indexed by $n = 1, \ldots, N$. $H$ can be formulated as:

$$H = \{D_1, D_2, \ldots, D_N\} \tag{2}$$

The historical RTP of a day can be treated as a number of discrete values at a fixed interval. In this study, a time interval of 0.5 h is adopted, which means that 48 fixed values ($t = 1, \ldots, 48$) are included in an individual day sample. Hence, $D_n$ is represented as:

$$D_n = \{D_{n,1}, D_{n,2}, \ldots, D_{n,48}\} \tag{3}$$

In addition, the fitting function $L(t)$ is taken to model the mainstream variation in the linear behavior estimation. The general formats of the fitting function include Fourier, Gaussian, polynomial and sum of sine, which can be formulated in Eqs. (4)-(7), respectively, as follows.
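Fourier format:

$$L(t) = a_0 + \sum_{i=1}^{d} \left[ a_i \cos(\omega_i t) + b_i \sin(\omega_i t) \right] \tag{4}$$

Gaussian format:

$$L(t) = \sum_{i=1}^{d} a_i \exp\left[ -\left( \frac{t - b_i}{c_i} \right)^2 \right] \tag{5}$$

Polynomial format:

$$L(t) = \sum_{i=0}^{d} p_i t^i \tag{6}$$

Sum of sine format:

$$L(t) = \sum_{i=1}^{d} a_i \sin(b_i t + c_i) \tag{7}$$

where $d \in \mathbb{N}^+$ is the degree of the adopted function. Additionally, $a_i$, $b_i$, $c_i$, $\omega_i$ and $p_i$ are undetermined constant parameters of the model. Although all of the listed fitting function formats are effective in modeling the linear behavior within the data, the Fourier format is adopted in this study because of its better fitting performance. Therefore, the objective at this stage can be formulated as determining a group of appropriate parameters ($a_i$, $b_i$ and $\omega_i$) that minimizes the total square errors $J$. The objective function is presented in Eq. (8).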
$$\arg\min_{a_i, b_i, \omega_i} J = \sum_{n=1}^{N} \sum_{t=1}^{48} \left[ D_{n,t} - L(t) \right]^2 \tag{8}$$

On the one hand, a higher fitting degree $d$ leads to a better estimation performance while $J$ is in a reasonable range. On the other hand, it increases the computational complexity and the CPU cost. Therefore, selecting an appropriate fitting degree is significant and may lead to a better linear behavior estimation. Table 1 presents the total square errors $J$ for fitting degrees ranging from 1 to 7. When $d \in [1, 3]$, $J$ decreases quickly as the fitting degree increases. However, $J$ becomes stable when $d > 3$. For example, based on the results in Table 1, the difference between the cases $d = 5$ and $d = 6$ is only $|J_{d=5} - J_{d=6}| = |2.995 - 2.981| = 0.014$. Although different fitting degrees can work for the linear behavior estimation at this stage, $d = 4$ is taken as a proper fitting degree in this study, considering estimation accuracy and efficiency and the need to avoid overfitting. Afterwards, the relevant parameters can be determined as in Table 2.
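The trade-off between the fitting degree and the total square error can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the fitting procedure used in the study: the frequencies are fixed at harmonics of one cycle per day (so $\omega_i = i \cdot 2\pi/48$ is not fitted), time slots are indexed from 0, and the data come from a hypothetical `synthetic_day` generator. With fixed harmonics on a uniform grid, the least squares coefficients reduce to simple projections.

```python
import math


def fit_fourier(day_samples, degree):
    """Least-squares Fourier fit with harmonics of one cycle per day.

    day_samples: list of days, each a list of T prices on the same grid.
    Returns (a, b, omega): a[0] is the constant term, b[0] is unused.
    """
    T = len(day_samples[0])
    omega = 2.0 * math.pi / T  # assumption: fixed fundamental, not fitted
    # Every day shares the same time grid, so the least-squares fit to all
    # days has the same coefficients as the fit to their slot-wise mean.
    mean = [sum(day[t] for day in day_samples) / len(day_samples)
            for t in range(T)]
    a, b = [sum(mean) / T], [0.0]
    # On a uniform grid over one full period the basis is orthogonal, so
    # each least-squares coefficient is a projection (valid for degree < T/2).
    for i in range(1, degree + 1):
        a.append(2.0 / T * sum(mean[t] * math.cos(i * omega * t) for t in range(T)))
        b.append(2.0 / T * sum(mean[t] * math.sin(i * omega * t) for t in range(T)))
    return a, b, omega


def evaluate(a, b, omega, t):
    """Evaluate the fitted function L(t)."""
    return a[0] + sum(a[i] * math.cos(i * omega * t) + b[i] * math.sin(i * omega * t)
                      for i in range(1, len(a)))


def total_square_error(day_samples, a, b, omega):
    """Total square error J summed over all days and time slots."""
    return sum((day[t] - evaluate(a, b, omega, t)) ** 2
               for day in day_samples for t in range(len(day)))


def synthetic_day(offset, T=48):
    """Hypothetical price curve: a degree-2 Fourier series plus an offset."""
    w = 2.0 * math.pi / T
    return [30.0 + offset + 5.0 * math.cos(w * t) - 3.0 * math.sin(2 * w * t)
            for t in range(T)]


if __name__ == "__main__":
    days = [synthetic_day(-1.0), synthetic_day(1.0)]
    for d in range(1, 5):
        a, b, omega = fit_fourier(days, d)
        print(d, round(total_square_error(days, a, b, omega), 6))
```

On this synthetic data, J stops decreasing once the degree reaches the true order of the curve, which is the same plateau behavior that motivates choosing a moderate degree.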

Grey prediction model for non-linear behavior forecasting
The second stage in the proposed hybrid RTP forecasting model is to estimate $N_t$, which denotes the non-linear behavior within the input data. The non-linear information within the data is contained in the forecasting errors that remain after applying the LS fitting model. Thus, the non-linear behavior within the historical data at time $t$ can be expressed as in Eq. (9), and these initial records are used to estimate the next record $(D_{N+1,t} - L_t)$ by using the GP model.

$$\{D_{1,t} - L_t,\; D_{2,t} - L_t,\; \ldots,\; D_{N,t} - L_t\} \tag{9}$$
The GP model, or GM(1,1), was first proposed to deal with data in grey systems. It is able to analyze systems that involve insufficient information and unapparent relationships [25][26][27]. Hence, the GP model is often used to predict data in non-linear systems based on limited information. It transforms irregular discrete sequences and reveals the potential regularities within them. Transforming the sequences weakens their stochastic and random properties, thereby turning irregular sequences into regular ones [28][29][30]. Since only a few non-linear data points derived from the LS fitting model are used, it is quite appropriate to employ the GP model to estimate the non-linear behavior within the input data at this stage. The GP model is established by using generalized series data. The primitive sequence data is defined as $X^{(0)}$ and can be presented as:

$$X^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(N)\} \tag{10}$$

where $x^{(0)}(n) = D_{n,t} - L_t$ and $x^{(0)}(n) \geq 0$. If any $x^{(0)}(n) < 0$ in the primitive sequence, all the entries in the sequence have to be adjusted until $\forall x^{(0)}(n) \geq 0$. Afterwards, the first accumulated generating data $X^{(1)}$ can be obtained as in Eq. (11).

$$X^{(1)} = \{x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(N)\} \tag{11}$$
where

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, N \tag{12}$$

In addition, $Z^{(1)}$, which is determined by $X^{(1)}$, is defined as the background factors and can be presented as:

$$Z^{(1)} = \{z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(N)\} \tag{13}$$

where

$$z^{(1)}(k) = \frac{1}{2}\left[ x^{(1)}(k) + x^{(1)}(k-1) \right], \quad k = 2, 3, \ldots, N \tag{14}$$

Fig. 3 Examples of $X^{(0)}$ and $X^{(1)}$. Case t = 26 is adopted as an example. a Example data in $X^{(0)}$; b Example data in $X^{(1)}$

For example, when $t = 26$ in our case, the initial sequence (after data preprocessing) is $X^{(0)} = \{6.5, 7.9, 11.1, 11.4, 13.5\}$, as shown in Fig. 3a. Then the first accumulated generating data can be calculated as $X^{(1)} = \{6.5, 14.4, 25.5, 36.9, 50.4\}$, as shown in Fig. 3b. There is no prominent regularity among the numbers in $X^{(0)}$. However, after the first accumulated generating operation (AGO) on $X^{(0)}$, the new sequence $X^{(1)}$ has a quasi-exponential property (otherwise, a second AGO would be executed on $X^{(1)}$). Therefore, $X^{(1)}$ can be regarded as satisfying the first-order ordinary differential equation shown in Eq. (15).

$$\frac{dx^{(1)}}{dt} + a x^{(1)} = u \tag{15}$$

where $a$ is treated as a development coefficient that describes the increasing speed of the numbers in $X^{(0)}$ and $u$ is an endogenous control coefficient of the system. The parameters $U = [a, u]^T$ can be determined by Eq. (16).

$$U = [a, u]^T = (B^T B)^{-1} B^T Y \tag{16}$$
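The accumulated generating operation for the t = 26 example can be checked with a few lines of Python. The function names are illustrative, and the background values assume the usual GM(1,1) convention of averaging consecutive accumulated terms.

```python
from itertools import accumulate


def ago(x0):
    """First accumulated generating operation: running sums of x0."""
    return list(accumulate(x0))


def background_values(x1):
    """Mean of each pair of consecutive accumulated terms (assumed GM(1,1) convention)."""
    return [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]


x0 = [6.5, 7.9, 11.1, 11.4, 13.5]     # example sequence from the text
x1 = ago(x0)                           # 6.5, 14.4, 25.5, 36.9, 50.4 up to rounding
z1 = background_values(x1)             # one value fewer than x1
```

The accumulated sequence grows smoothly even though the raw values do not, which is the property the grey model exploits.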
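where $Y$ is an $(n-1) \times 1$ matrix and $B$ is an $(n-1) \times 2$ matrix. $Y$ and $B$ can be presented as:

$$Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}, \quad B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}$$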
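Accordingly, Eq. (15) can be solved by using the obtained parameters $a$ and $u$, so that the forecasting formula of $\hat{X}^{(1)}$ can be written as shown in Eq. (17).

$$\hat{x}^{(1)}(n+1) = \left[ x^{(0)}(1) - \frac{u}{a} \right] e^{-an} + \frac{u}{a} \tag{17}$$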
The forecast of the primitive sequence is then recovered by differencing the accumulated forecasts:

$$\hat{x}^{(0)}(n+1) = \hat{x}^{(1)}(n+1) - \hat{x}^{(1)}(n) \tag{18}$$

According to Eq. (18), when $n = N$, $\hat{x}^{(0)}(n+1)$ is the target forecasting value. After that, the GP model can be applied repeatedly to obtain forecasting values at every time slot $t$. Furthermore, the forecasting result is the combination of the linear behavior and the non-linear behavior within the input data, as shown in Fig. 4. In general, the variation of the RTP obtained by using the LS model + GP model is in line with the actual RTP. The result indicates that the error rates are lower than 10% at most time slots (... 5%). Nonetheless, the error rates are unexpectedly higher than 10% during the time period 6:30-8:30. These unexpected errors may be caused by the defects of the current forecasting models. Since limited data sets are used to improve the practicality of the current models, the random error increases when there are great differences among the input data sets. Therefore, to improve the forecasting accuracy at specific time slots based on the initial forecasting result, the ANN model based error optimization procedure is required.
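The grey prediction steps described in this subsection can be put together in a short Python sketch. It assumes the standard GM(1,1) formulation (mean background values, least squares estimation of $a$ and $u$, exponential response, and differencing to recover the primitive forecast); the function name `gm11_forecast` is illustrative, and the sketch does not handle the degenerate case $a = 0$.

```python
import math


def gm11_forecast(x0):
    """One-step-ahead GM(1,1) forecast for a non-negative sequence x0.

    Returns (forecast, a, u) where a is the development coefficient and
    u is the control coefficient.
    """
    n = len(x0)
    # Accumulated generating operation: running sums of the primitive sequence.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: mean of consecutive accumulated terms.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least squares for y = -a * z + u, solved in closed form (2 unknowns).
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    u = (szz * sy - sz * szy) / det

    def x1_hat(k):
        """Fitted accumulated value at position k + 1 (time response function)."""
        return (x0[0] - u / a) * math.exp(-a * k) + u / a

    # Difference of consecutive accumulated forecasts gives the next raw value.
    return x1_hat(n) - x1_hat(n - 1), a, u


if __name__ == "__main__":
    forecast, a, u = gm11_forecast([6.5, 7.9, 11.1, 11.4, 13.5])
    print(forecast, a, u)
```

Applied to the t = 26 example sequence, the sketch yields a negative development coefficient, consistent with a growing sequence, and a next value above the last observation.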

Artificial neural network for error optimization
The artificial neural network (ANN) is also a non-linear modeling approach in which no prior knowledge of the relationship between input and output is needed [31]. It has produced good results for forecasting problems [16]. To establish the model, only sufficient data is required to learn the connection between inputs and outputs. The main parameters of the ANN model are the number of input vectors, the number of layers and the number of neurons in each layer [32][33][34]. However, large and sudden spikes in the input data reduce the accuracy of the ANN output. In this study, the back propagation (BP) algorithm is utilized to train the ANN model. Figure 5 shows the topology of a typical 3-layer back propagation neural network [35,36]. A 3-layer back propagation neural network is a typical multiple-layer network that includes an input layer (LA), a hidden layer (LB) and an output layer (LC). There are no connections between nodes that belong to the same layer. LA has m nodes that correspond to the m inputs of the network. LC consists of n nodes that correspond to the n outputs of the network. The number of nodes in LB can be varied to fit the task.
Define $W_{ir}$ as the connection weight between node $a_i$ of the LA layer and node $b_r$ of the LB layer. Similarly, let $V_{rj}$ be the connection weight between node $b_r$ of the LB layer and node $c_j$ of the LC layer. Set $T_r$ and $\theta_j$ as the bias of node $b_r$ of the LB layer and the bias of node $c_j$ of the LC layer, respectively. Then the output function of the LB layer node is:

$$b_r = f\left( \sum_{i=1}^{m} W_{ir} a_i + T_r \right) \tag{19}$$

The output function of the LC layer node is:

$$c_j = f\left( \sum_{r} V_{rj} b_r + \theta_j \right) \tag{20}$$

where $f(x)$ is a sigmoid function, which can be expressed as:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{21}$$

In addition, the BP learning algorithm, which is a typical error-correction learning algorithm, is used to learn and store knowledge in the 3-layer back propagation neural network. The learning procedure is as follows.
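(1) Initialize the variables $W_{ir}$, $T_r$, $V_{rj}$ and $\theta_j$ with small random values.
• Input the values of $A^{(k)}$ at layer LA, then calculate $b_r$ and $c_j$ by Eqs. (19) and (20).
• Calculate the deviation between the desired value $c_j^{(k)}$ and the calculated value $c_j$ of the layer LC nodes and let

$$d_j = c_j (1 - c_j) \left( c_j^{(k)} - c_j \right) \tag{22}$$

• Back propagate the errors to the layer LB nodes and let

$$e_r = b_r (1 - b_r) \sum_{j=1}^{n} V_{rj} d_j \tag{23}$$

• Adjust the connection weights $V_{rj}$ and the bias of the layer LC nodes $\theta_j$:

$$V_{rj} = V_{rj} + \alpha \cdot b_r \cdot d_j + \beta \cdot \Delta V_{rj} \tag{24}$$

$$\theta_j = \theta_j + \alpha \cdot d_j + \beta \cdot \Delta \theta_j \tag{25}$$

where $\Delta V_{rj}$ and $\Delta \theta_j$ are the adjusting values of the previous learning loop, $\alpha$ is the learning ratio with $0 < \alpha < 1$, and $\beta$ is the momentum factor.
• Adjust the connection weights $W_{ir}$ and the bias of the layer LB nodes $T_r$:

$$W_{ir} = W_{ir} + \alpha \cdot a_i \cdot e_r + \beta \cdot \Delta W_{ir} \tag{26}$$

$$T_r = T_r + \alpha \cdot e_r + \beta \cdot \Delta T_r \tag{27}$$

where $\Delta W_{ir}$ and $\Delta T_r$ are the adjusting values of the previous learning loop.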
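The learning procedure above can be sketched in plain Python as follows. This is an illustrative implementation under assumptions, not the network used in the study: it uses a single hidden layer, adds the biases inside the sigmoid, takes the output error signal as $c_j(1 - c_j)$ times the difference between target and output, and the class name, layer sizes, seed and default values of `alpha` and `beta` are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerBP:
    """3-layer network (LA -> LB -> LC) trained by BP with momentum."""

    def __init__(self, m, hidden, n, alpha=0.5, beta=0.3, seed=0):
        rng = random.Random(seed)

        def small():
            return rng.uniform(-0.5, 0.5)  # small random initial values

        self.W = [[small() for _ in range(hidden)] for _ in range(m)]  # W[i][r]
        self.T = [small() for _ in range(hidden)]                      # LB biases
        self.V = [[small() for _ in range(n)] for _ in range(hidden)]  # V[r][j]
        self.theta = [small() for _ in range(n)]                       # LC biases
        self.alpha, self.beta = alpha, beta
        # Adjustments from the previous learning loop (momentum terms).
        self.dW = [[0.0] * hidden for _ in range(m)]
        self.dT = [0.0] * hidden
        self.dV = [[0.0] * n for _ in range(hidden)]
        self.dtheta = [0.0] * n

    def forward(self, a):
        """Return the LB outputs b and the LC outputs c for input a."""
        b = [sigmoid(sum(self.W[i][r] * a[i] for i in range(len(a))) + self.T[r])
             for r in range(len(self.T))]
        c = [sigmoid(sum(self.V[r][j] * b[r] for r in range(len(b))) + self.theta[j])
             for j in range(len(self.theta))]
        return b, c

    def train_step(self, a, target):
        """One learning loop on one pattern; returns the error before the update."""
        b, c = self.forward(a)
        # Output-layer error signal, then back propagation to the hidden layer.
        d = [c[j] * (1.0 - c[j]) * (target[j] - c[j]) for j in range(len(c))]
        e = [b[r] * (1.0 - b[r]) * sum(self.V[r][j] * d[j] for j in range(len(d)))
             for r in range(len(b))]
        for r in range(len(b)):
            for j in range(len(d)):
                self.dV[r][j] = self.alpha * b[r] * d[j] + self.beta * self.dV[r][j]
                self.V[r][j] += self.dV[r][j]
        for j in range(len(d)):
            self.dtheta[j] = self.alpha * d[j] + self.beta * self.dtheta[j]
            self.theta[j] += self.dtheta[j]
        for i in range(len(a)):
            for r in range(len(e)):
                self.dW[i][r] = self.alpha * a[i] * e[r] + self.beta * self.dW[i][r]
                self.W[i][r] += self.dW[i][r]
        for r in range(len(e)):
            self.dT[r] = self.alpha * e[r] + self.beta * self.dT[r]
            self.T[r] += self.dT[r]
        return 0.5 * sum((target[j] - c[j]) ** 2 for j in range(len(c)))
```

Calling `train_step` repeatedly on the same pattern, for example `ThreeLayerBP(2, 3, 1).train_step([0.2, 0.7], [0.9])`, should drive the returned error toward zero; a practical setup would loop over many patterns and scale prices into the sigmoid's output range.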
In accordance with the analysis in the previous sections, the ANN model is used at this stage to further improve the accuracy of the RTP forecasting in particular time slots, such as 6:30-8:30 as shown in Fig. 4. In this case, 2 hidden layers with 20 and 40 neurons are designed and 10 days of historical data are adopted. In the next section, a number of simulations are carried out to demonstrate the effectiveness of the proposed hybrid model, and the forecasting quality is evaluated in terms of several evaluation criteria.

Results
This section presents the real-time electricity price forecasting results obtained with the proposed hybrid forecasting model. Limited data sets (5 days) of historical RTP in Australia with a time interval of 0.5 h are adopted. The achieved results are also compared with previous methods (e.g., the ARIMA model and the independent BP-ANN model).
In addition, a number of evaluation criteria including MAE, MSE, RMSE and MAPE [37][38][39] are used to evaluate the forecasting quality. Figure 6 compares the RTP forecasting results of the proposed hybrid model with those of some typical models. According to the obtained results, all three compared models, i.e., the hybrid model, the ARIMA model and the BP-ANN model, are able to forecast RTP in advance, and the forecasted price variations are generally in line with the observed actual RTP. Additionally, the RTP forecast by the hybrid model is slightly better than that of the other two models. Moreover, comparing the RTP variations in Figs. 4 and 6, it is apparent that the forecasting errors are significantly reduced in the time period 6:30-8:30, owing to the contribution of the BP-ANN model in the error optimization procedure.
In addition, Table 3 compares the RTP forecasting quality of the hybrid model with that of different state-of-the-art models. Based on the achieved results, the proposed hybrid model performs best overall in the forecasting quality evaluation, which confirms the advantages of our approach. Specifically, the MAE, MSE, RMSE and MAPE of the hybrid model are 1.06, 1.72, 1.31 and 3.38%, respectively, which are the lowest among all models. In contrast, the ARIMA model performs the worst in the MAE evaluation and the independent LS model performs the worst in the MSE, RMSE and MAPE evaluations.
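For reference, the four evaluation criteria can be computed as in the following Python sketch, which assumes their standard textbook definitions (MAPE expressed in percent and taken relative to the actual value). The function names are illustrative.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean square error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)
```

Because RMSE is the square root of MSE, the reported values of 1.72 and 1.31 for the hybrid model are mutually consistent.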

Discussion
The hybrid model analyzes the input data in terms of linear behavior, non-linear behavior and error optimization. The advantage of our approach is that the hybrid model is more robust when forecasting from insufficient data than traditional models such as ARIMA, which needs a large amount of historical data for training. The RTP forecasting quality evaluation results in Table 3 also indicate that the ARIMA model is not effective when the input data is limited. In addition, the individual LS model did not perform well in this case, because the LS model only extracts the mainstream variation within the input data. Therefore, as expected, using an LS model independently to forecast the RTP leads to considerable errors. More interestingly, the LS model combined with the GP model performs slightly worse than the independent GP model in the overall evaluation. This is because the combined model (LS model + GP model) does not perform well in a specific time period, i.e., 6:30 to 8:30 in this case, so that the overall errors increase significantly, although it has higher forecasting accuracy than the GP model in other time periods. Given a group of historical data, several forecasting models can be used, and each may complete the forecasting task with a different accuracy. After a great number of tests, we found that the forecasting performance depends crucially on both the selection of an appropriate model and the correlations within the data. A forecasting model may work well on one group of data but be ineffective on another. Therefore, a hybrid forecasting model is generally more efficient than an independent model.
The best performance metrics are marked in bold