Recurrent Neural Networks: Time-series Forecast

Introduction: Time-series Prediction

This project focuses on applying the feedforward neural networks to typical machine learning tasks, such as time-series prediction. For time-series prediction, we first investigate one benchmark data set, namely, the Santa Fe dataset. Then, we analyze a global temperature data set.

A time series is a sequence of observations, ordered in time. Forecasting involves training a model on historical data and using them to predict future observations. A simple example is a linear auto-regressive model. The linear auto-regressive (AR) model of a time-series \(Z_t\) with \(t = 1, 2, ..., \infty\) is given by:

\(\hat{z_t} = a_1 z_{t-1} + a_2 z_{t-2}+...+a_p z_{t-p}\)

with \(a_i \in \mathbf{R}\) for \(t = 1, 2, ...,p\) and \(p\) p being the model lag. The prediction for a certain time t is equal to a weighted sum of the previous values up to a certain lag \(p\). At the same time, the nonlinear variant (NAR) is described as:

\(\hat{z_t} = f(z_{t-1}, z_{t-2}, z_{t-1},..., z_{t-p})\)

Santa Fe Dataset

The Santa Fe dataset is obtained from a chaotic laser which can be described as a nonlinear dynamical system. Given are 1000 training data points. The aim is to predict the next 100 points (it is forbidden to include these points in the training set!). The training data stored in lasertrain.dat and the test data contained in laserpred.dat are shown in Figure 1.

Figure 1: Training and test set

Energy Consumption Forecast

A grid search was performed to assess which hyperparameters yield the best performance in terms of Mean Squared Error. The grid search considered combinations of training algorithms, lag values ranging from 15 up to 70, and architectures with different number of layers and hidden units. The best solution for a single layer was with a lag of 20 and 15 neurons, resulting in an MSE of 701.12. Figure 2.1.1 displays the prediction of the network over the test set. The overall best solution corresponds to a network with 30 neurons in the first layer and 20 in the second, with a lag of 45. The approximation to the test values is adequate as can be seen in Figure 2.1.2. with an MSE of 224. Both networks were train with the Levenberg-Marquardt optimization algorithm and 50 epochs.

Figure 2.1: Predicted forecast with 1 layer

Figure 2.2: Predicted forecast with 2 layers

Figure 3: Visualization of the training, validation, and test sets of the Santa Fe forecast data

The most common way to construct a validation set is to sample a subset within the training set or to cross validate with a certain number of folds. This approach is not suitable for time-series given that sub-setting random data points for validation might result in a situation where the validation data points are from earlier in the time series than the training set, hence during validation we would be predicting the past with future data, which does not represent the situation we face in practice. Furthermore, data points in a time series are dependent, hence it is not reasonable to predict a random point with an ANN trained from an ordered series. Instead, a validation set for a time series needs to be sampled as the most recent sequence in the training time series. Nevertheless, this approach might not be fruitful if the validation portion of our time series has a particular pattern which is not representative of the whole series, in which case the optimal parameters will be selected to minimize the error for the non-representative validation pattern, ending up with a poor performance for test data and forthcoming data. Considering the data structure depicted in Figure 2.1.3, we notice that the validation portion cannot consider all the timeseries nuances e.g., the final section of the test set, approximately Date 1050 onwards is completely different with respect to the validation portion of the time series.

After conducting the same grid search as before, but using the continuous validation set this time as in Figure 2.1.3, the accuracy of the network decreased which supports the idea that performance won’t be enhanced if the validation set is not representative of the complete timeseries.

Climate Change Data Set

Climate change is one of the biggest threats of our age. In this section, we work on a global climate change prediction data set using timeseries methods. Specifically, we will use the non-linear auto-regression (NAR) for this case. The data can be downloaded from [1]. Note that this dataset is repackaged from a newer compilation put together by the Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. In this dataset, we only use the global land temperature in GlobalLandTemperaturesByCity.csv, which includes:

• Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures;

• LandAverageTemperature: global average land temperature in celsius;

The historical average temperatures for Aberdeen and Abu Dhabi were select from 1850 up to 2015. After conducting a grid search (Table 1 and Table 2), for a set of combination of number of hidden units and lag sizes, the performance over the test set in both cities is adequate. Figures 2.1. and 2.2 display forecasts that are accurate with respect to the test set. Each combination was conducted 5 times and the found MSE were averaged to attain a more stable solution. A closer grid search, that iterates with less gap through the combination of hyper-parameters that already displayed good solutions in the original search could lead to find a better solution. Testing other optimization algorithms might also increase the forecast accuracy.

Figure 2.1:Temperature Forecast in Aberdeen

Figure 2.2: Temperature Forecast in Abu Dhabi

Table 1: MSE Grid Search Abu Dhabi
	Hidden Units
Lag	30	50	70
25	3151	1086	729
35	335	831	506
45	50	171	327
55	121	151	358
65	47	178	214
75	78	175	192
85	94	114	125
95	77	69	61
105	130	102	51
115	39	13	8
125	28	19	14

Table 2: MSE Grid Search Aberdeen
	Hidden Units
Lag	30	50	70
25	2964	5060	4876
35	4622	3123	2460
45	1577	2110	2031
55	1702	1436	1055
65	1095	784	661
75	595	604	466
85	641	367	437
95	299	222	270
105	322	45	78
115	76	109	9
125	16	34	21

By looking at the structure of the two cities’ time series, we notice that they share the same pattern except by the range in which they oscillate, which in the case of Abu Dhabi is given by higher temperatures. For Abu Dhabi and Aberdeen, we obtained the same optimal hyper-parameters, and their respective performance is extremely similar. This similarity suggests that the tuned hyper-parameters of one city might be accurate to forecast the temperature of other cities if they share a similar pattern throughout the time series.