Thursday, September 3, 2020

How to create an ACF correlogram using excel


As we have already studied in > this post, the autocorrelation function is a ratio of the covariance between the range of n values of time series separated by lag k to that of the covariance between range of n values, separated by k=0 that is the same series, which is also the variance of the sample data series.
Thus, the Autocorrelation function (ACF) at lag k≥0, is denoted as rk = sk/ s0
Where, sk is the auto covariance of the time series at lag k≥0, and is denoted as,

The variance of the time series can be denoted as r0
The plot of rk against k is known as a Correlogram.
Thus, the autocovariance as defined above is slightly different from the usual definition of covariance between two variable series {Y1, ... , Yn+k) and {Yk+1 , ……. , Yn). Firstly, we divide by the n, instead of n-k and we subtract the overall average instead of the averages of {Y1, ... , Yn+k) and {Yk+1 , ……. , Yn) respectively. For the values of n, which are greater than k, the difference will be small.
For calculation of sk and rk for an example range in the AAPL dataset in the autocorrelation blogpost  – 4, let’s observe the following snapshot of the excel screen:



The variance of the time series can be denoted as r0

The plot of rk against k is known as a Correlogram.
 


Thus, the autocovariance as defined above is slightly different from the usual definition of covariance between two variable series {Y1, ... , Yn+k) and {Yk+1 , ……. , Yn). Firstly, we divide by the n, instead of n-k and we subtract the overall average instead of the averages of {Y1, ... , Yn+k) and {Yk+1 , ……. , Yn) respectively. For the values of n, which are greater than k, the difference will be small. 


For calculation of sk and rk for an example range in the AAPL dataset in the autocorrelation blogpost  – 4, let’s observe the following snapshot of the excel screen:
 


Where, we want to calculate and plot the autocorrelation for the AAPL close time series for Lag k=1 to 10, where n=365 and the range of values for AAPL is c2 to c366 in excel file.  

In the above example, the values required for autocorrelation are calculated using the above formula for example for k=1, as: 



Here,
S1 = COVARIANCE.S($C$2:OFFSET($C$2,COUNT($C$2:$C$365)-D2,0,1,1), OFFSET($C$2,D2,0,1,1):$C$366)

Wherein, we calculate the covariance of two series from the data namely one starting at the first data point in cell c2 to last but one data point in the series, which is in cell c365. In order to reference the two series for covariance, they are referenced as follows:
First series: ($C$2:OFFSET($C$2,COUNT($C$2:$C$365)-D2,0,1,1)

Where, c2 is the starting data point and offset function returns the reference to the desired cell in the following format:
OFFSET (reference, rows, cols, [height],[width]),

which returns a reference to a range that is a given number of rows and columns from a given reference.

In the above example, OFFSET function counts the value from the reference cell C2, to a count of c2 to c365 rows less the number of lags, which in this case is D2=1, col=1, as the range is in the same column as the reference, and the height and width of the target range is 1 and 1 respectively, as it is a single cell being referenced as the last cell of the covariance range.
Second series: OFFSET ($C$2, D2, 0, 1, 1):$C$366)
Where, the starting value of the range is the data-point in the one cell, next to the first cell in the column, or at a distance of k=1 cell from the starting cell, to the last cell, which is c366.
Thus, the covariance is calculated for the two series and the output displayed in column E.
Note the dollar signs fix the starting or reference values, so that the values do not change when the formula is copied down to subsequent cells for further values of k.
Similarly, the two ranges for covariance can be found for lags k=2, 3, 4 ….10 just by varying the value in column D from D2 to D3…..D11.
Further, the autocorrelation rk between the two series for lags k=1 to 25 is calculated, simply by replacing the COVARINACE.P ( ) function by CORREL ( ) and the output displayed in column F.
The mean of the series is calculated as mean = average ($c$2:$c$366) and output displayed in column G.
The dollar sign ensures the value to remain fixed on all cells, as it will be used as an input
s0 or the covariance of the series at lag k=0, meaning thereby the variance of the sample series is calculated as VAR.P($C$2:$C$366) and displayed as a constant value range in column H.
Further, the squared deviations of the series values are calculated and displayed in column I as devsq ($C$2:$C$366).
The column J contains constant values of n = 365 for the entire sample series.
Further, there is an alternative method to calculate both sk and thus rk
The column K contains values of sk or the covariance of the two k lagged series from the sample data calculated as the sum of the products of respective series for the respective lag divided by n=365 as follows:
(SUMPRODUCT($C$2:OFFSET($C$2,COUNT($C$2:$C$365)-D2,0,1,1)-G2, OFFSET($C$2,D2,0,1,1):$C$366-G2))/J2
The value of Autocorrelation function or ACF or rk is calculated as rk = sk/ s0
or K2/H2 for lag k=1, K3/H3 for lag k=2 and so on and so forth…upto k=10
In order to draw the correlogram for ACF, the rk values are plotted against k for k=1 to 10 as follows:



 

The same can also be similarly plotted for k=50 as follows:
 

In the next post we will discuss as to how to plot a PACF function in a correlogram in excel.



 












2 comments:

You may post your valuable comments here