As we have already studied in > this post, the autocorrelation function is a ratio of
the covariance between the range of n values of time series separated by lag k
to that of the covariance between range of n values, separated by k=0 that is
the same series, which is also the variance of the sample data series.
Thus,
the Autocorrelation function (ACF) at lag k≥0, is denoted as rk =
sk/ s0
Where, sk
is the auto covariance of the time series at lag k≥0, and is denoted
as,
The
variance of the time series can be denoted as r0
The plot
of rk against k is known as a Correlogram.
Thus, the
autocovariance as defined above is slightly different from the usual definition
of covariance between two variable series {Y1, ... , Yn+k)
and {Yk+1 , ……. , Yn). Firstly, we divide by the n, instead
of n-k and we subtract the overall average instead of the averages of {Y1,
... , Yn+k) and {Yk+1 , ……. , Yn)
respectively. For the values of n, which are greater than k, the
difference will be small.
For
calculation of sk and rk for an example
range in the AAPL dataset in the autocorrelation blogpost – 4, let’s observe the following snapshot of
the excel screen:
The
variance of the time series can be denoted as r0
The plot
of rk against k is known as a Correlogram.
Thus, the
autocovariance as defined above is slightly different from the usual definition
of covariance between two variable series {Y1, ... , Yn+k)
and {Yk+1 , ……. , Yn). Firstly, we divide by the n, instead
of n-k and we subtract the overall average instead of the averages of {Y1,
... , Yn+k) and {Yk+1 , ……. , Yn)
respectively. For the values of n, which are greater than k, the
difference will be small.
For
calculation of sk and rk for an example
range in the AAPL dataset in the autocorrelation blogpost – 4, let’s observe the following snapshot of
the excel screen:
Where, we want to calculate and plot the autocorrelation for the AAPL close time series for Lag k=1 to 10, where n=365 and the range of values for AAPL is c2 to c366 in excel file.
In the
above example, the values required for autocorrelation are calculated using the
above formula for example for k=1, as:
Here,
S1
= COVARIANCE.S($C$2:OFFSET($C$2,COUNT($C$2:$C$365)-D2,0,1,1), OFFSET($C$2,D2,0,1,1):$C$366)
Wherein,
we calculate the covariance of two series from the data namely one starting at
the first data point in cell c2 to last but one data point in the series, which
is in cell c365. In order to reference the two series for covariance, they are
referenced as follows:
First series: ($C$2:OFFSET($C$2,COUNT($C$2:$C$365)-D2,0,1,1)
Where, c2
is the starting data point and offset function returns the reference to the desired
cell in the following format:
OFFSET (reference,
rows, cols, [height],[width]),
which returns a
reference to a range that is a given number of rows and columns from a given
reference.
In the above example,
OFFSET function counts the value from the reference cell C2, to a count of c2
to c365 rows less the number of lags, which in this case is D2=1, col=1, as the
range is in the same column as the reference, and the height and width of the
target range is 1 and 1 respectively, as it is a single cell being referenced
as the last cell of the covariance range.
Second series: OFFSET ($C$2,
D2, 0, 1, 1):$C$366)
Where, the starting
value of the range is the data-point in the one cell, next to the first cell in
the column, or at a distance of k=1 cell from the starting cell, to the last
cell, which is c366.
Thus, the
covariance is calculated for the two series and the output displayed in column
E.
Note the
dollar signs fix the starting or reference values, so that the values do not
change when the formula is copied down to subsequent cells for further values
of k.
Similarly,
the two ranges for covariance can be found for lags k=2, 3, 4 ….10 just by
varying the value in column D from D2 to D3…..D11.
Further,
the autocorrelation rk between the two series for lags k=1 to
25 is calculated, simply by replacing the COVARINACE.P ( ) function by CORREL (
) and the output displayed in column F.
The mean
of the series is calculated as mean = average ($c$2:$c$366) and output
displayed in column G.
The
dollar sign ensures the value to remain fixed on all cells, as it will be used
as an input
s0
or the covariance of the series at lag k=0, meaning thereby the variance of the
sample series is calculated as VAR.P($C$2:$C$366) and displayed as a constant
value range in column H.
Further,
the squared deviations of the series values are calculated and displayed in
column I as devsq ($C$2:$C$366).
The
column J contains constant values of n = 365 for the entire sample series.
Further,
there is an alternative method to calculate both sk and thus rk
The
column K contains values of sk or the covariance of the two k
lagged series from the sample data calculated as the sum of the products of
respective series for the respective lag divided by n=365 as follows:
(SUMPRODUCT($C$2:OFFSET($C$2,COUNT($C$2:$C$365)-D2,0,1,1)-G2,
OFFSET($C$2,D2,0,1,1):$C$366-G2))/J2
The value
of Autocorrelation function or ACF or rk is calculated as rk
= sk/ s0
or K2/H2
for lag k=1, K3/H3 for lag k=2 and so on and so forth…upto k=10
In order
to draw the correlogram for ACF, the rk values are plotted against k for k=1 to
10 as follows:
The same
can also be similarly plotted for k=50 as follows:
In the next post we will discuss as to how to plot a PACF function in a correlogram in excel.
Thank you, again! Duncan
ReplyDeleteWelcome again!
Delete