Ronald L. Williams, ISB No. 3034
Williams Bradbury, P.C.
1015 W. Hays St.
Boise, ID 83702
Telephone: (208) 344-6633
Email: ron@williamsbradbury.com
Attorneys for Intermountain Gas Company
BEFORE THE IDAHO PUBLIC UTILITIES COMMISSION
IN THE MATTER OF THE APPLICATION OF
INTERMOUNTAIN GAS COMPANY FOR
THE AUTHORITY TO CHANGE ITS RATES
AND CHARGES FOR NATURAL GAS
SERVICE TO NATURAL GAS CUSTOMERS
IN THE STATE OF IDAHO
Case No. INT-G-16-02
REBUTTAL TESTIMONY OF PHIL FRY
FOR INTERMOUNTAIN GAS COMPANY
February 15, 2017
Q. Please state your name, position and business address.
A. My name is Phillip C. Fry. I am a tenured professor in the Information
Technology and Supply Chain Management Department in the College of
Business and Economics at Boise State University. My business address is
COBE-Boise State University, 1925 University Dr., Boise, ID 83725.
Q. Would you please describe your educational and professional background?
A. I have a Ph.D. in Quantitative Business Analysis from Louisiana State University.
I have been employed at Boise State University continuously since 1988 in the
College of Business & Economics, where I teach a variety of courses including
business statistics. I am a co-author of the textbook Business Statistics: A
Decision-Making Approach, 10th edition, published by Pearson Education. I was
also a co-author of the same textbook for the 5th through 9th editions. A copy of
my curriculum vitae is attached as Exhibit 41.
Q. What is the purpose of your rebuttal testimony?
A. I am providing rebuttal testimony related to the statistical models used by the
Company for weather normalization, as well as commenting on the statistical
relevance of the models proposed by Dr. Morrison.
Q. In preparing and offering this rebuttal testimony, did you work alone or did
you work in collaboration with anyone else?
A. This rebuttal testimony and the analysis behind it were a collaborative effort
between myself and Dr. Patrick Shannon, who also holds a Ph.D. Dr. Shannon is a
co-author of the textbook Business Statistics: A Decision-Making Approach,
10th edition, published by Pearson Education, as well as the co-author on the 1st
Fry, Reb. I
Intermountain Gas Company
through the 9th editions of the same textbook. Dr. Shannon's curriculum vitae is
attached as Exhibit 42 to this testimony. To the extent that some of the analysis
was conducted by Dr. Shannon, or that some of this testimony was written by Dr.
Shannon, I have reviewed that work, concur in it, and adopt it as my own.
Q. Do you and Dr. Shannon agree with Dr. Morrison's statement that "a
peculiar feature of the Company's model is its use of different weather
coefficients for different months"? (Morrison Di, line 22, page 19).
A. No, we do not believe that it is "peculiar". It seems reasonable to expect that
monthly effects exist in natural gas usage. If these effects do exist, then explicitly
incorporating different coefficients for different months is a reasonable modeling
approach in regression analysis. The statistical tests performed on the Company's
models reveal that the coefficients for the different months are statistically
significant.
Q. How would you respond to Dr. Morrison's statement that the use of
autoregressive terms in a predictive weather normalization model is
inappropriate? (Morrison Di, line 7, page 20).
A. Dr. Morrison states that including autoregressive terms is an "obvious problem"
that can cause a model to become "grossly unstable" when used to make
predictions beyond the time period of the data used to create the model. We
disagree and believe Dr. Morrison either does not fully understand the concept of
stability in the statistical sense, or is misusing the term. Instability, in statistical
vernacular, is a term that is generally used to describe the sensitivity of the
estimated regression coefficients when a new sample of data is used to estimate
the regression model. One potential cause of this instability is multicollinearity,
which occurs when the independent variables are highly correlated with each
other. Another cause of instability occurs when higher order terms, such as cubic
terms, are used in the regression model. The use of higher order terms can cause
the matrix of independent variables to become ill-conditioned, which can lead to
instability in estimating the regression coefficients. The Company did not use
higher order terms in its regression equations and there is no evidence of serious
multicollinearity in the Company's model. Consequently, it is my opinion, and the
opinion of Dr. Shannon, that the Company's weather normalization model is not
"grossly unstable".
Q. What about Dr. Morrison's statement that the autoregressive terms make
the model grossly unstable when used to make predictions beyond the time
period of the data used to create the model?
A. We also disagree with this statement. Errors in forecasts are composed of three
components: intrinsic error (i.e., noise in the data), parameter estimation error
(errors in estimating the model's coefficients), and model error (i.e., making the
wrong assumptions about how the future will resemble the past). We have
statistical measures for the first two error sources, but not for the third. However,
we know that for any regression model the further into the future the model
forecasts, the greater the overall forecast error becomes. So when predictions are
made outside the mass of data used to develop the model, forecast errors become
more pronounced. This is true for any model, but it is especially true of
polynomial models, such as those developed by Dr. Morrison.
Q. Why did the Company include an autoregressive term in its models?
A. The use of an autoregressive term in the Company's models is a direct result of
the time series data used to develop the regression model. Data that are ordered
with respect to time (e.g., hourly, daily, weekly, etc.) are referred to as time series
data. Residuals from time series regressions are often correlated; that is, adjacent
errors tend to move in the same direction. This is referred to as autocorrelation. If
untreated, autocorrelation corrupts the optimal properties of ordinary least squares
regression. Including an autoregressive error correction is an appropriate
statistical remedy in the presence of autocorrelated errors. Because autocorrelated
errors are common in time-series regression analysis, statistical procedures (e.g.,
Cochrane-Orcutt, Hildreth-Lu, and others) have been developed and included in
commercial statistical software packages to deal with this issue.
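The correction described here can be sketched numerically. The following is a minimal, single-pass Cochrane-Orcutt-style illustration on simulated data (all numbers and variable names are invented for illustration; this is not the Company's EViews procedure or data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated monthly usage driven by heating degree days (HDD),
# with AR(1) errors, a stand-in for the time series data at issue.
n = 120
hdd = rng.uniform(0, 1200, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + rng.normal(0.0, 5.0)  # autocorrelated errors
usage = 50.0 + 0.3 * hdd + e

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: fit by OLS, then estimate rho from the lag-1 residual correlation.
X = np.column_stack([np.ones(n), hdd])
beta = ols(X, usage)
resid = usage - X @ beta
rho = (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])

# Step 2: quasi-difference the data with rho and re-fit (one pass).
y_star = usage[1:] - rho * usage[:-1]
X_star = np.column_stack([(1 - rho) * np.ones(n - 1), hdd[1:] - rho * hdd[:-1]])
beta_co = ols(X_star, y_star)

print(f"estimated rho: {rho:.2f}")
print(f"HDD slope, OLS: {beta[1]:.3f}; Cochrane-Orcutt: {beta_co[1]:.3f}")
```

Full Cochrane-Orcutt implementations iterate this quasi-differencing step to convergence; the commercial packages mentioned above automate the test-and-correct cycle.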
Q. Dr. Morrison states that the Company's models were created using
autoregressive terms, but that these terms were not included in the
calculation of monthly consumption and that their omission underestimates
the Company's monthly consumption estimates. Is this correct?
A. No, Dr. Morrison is wrong on this point. The Company's model includes an
autoregressive term to correct for autocorrelated errors. The statistical software
used by the Company, EViews, performs a statistical test to determine if
autocorrelation is present. If it is present, the software includes an estimate of the
autoregressive term and incorporates it into the forecast automatically.
Q. Do you agree with Dr. Morrison's statement that the use of
autoregressive terms in weather normalization violates regression's
fundamental independence assumption? (Morrison Di, pages 20-21, lines 25-1).
A. Absolutely not. The independence assumption relates to the statistical property
that the error terms are not correlated. This assumption is not violated because an
autoregressive term is used. This assumption is violated in this case due to the
nature of the time series data employed in estimating the model. The use of the
autoregressive term is, in fact, a remedy for this violation, rather than a violation
itself. Adjustments for autocorrelation (also referred to as serial correlation) are
discussed in the textbook The Statistical Sleuth, by Ramsey and Schafer, which is
Dr. Morrison's statistical reference book of preference. (Morrison Di, lines 23-24,
page 21). More extensive explanations for dealing with autocorrelated error terms
can be found in other and more widely used textbooks on econometrics, such as
those by Johnston, Kmenta, Judge, Greene, and Wooldridge.
Q. Please comment on Dr. Morrison's statement at page 21 of his testimony that
the relationship between Heating Degree Days and consumption is "clearly
non-linear".
A. We disagree that the relationship shown in the scatterplot recommends the use of
a nonlinear regression model. We believe that from a practical perspective the
scatterplot shows a strong positive linear relationship. One would not expect when
modeling real consumption data to find a perfect linear relationship. However,
any departures from perfect linearity in this instance are minimal. A straight line
would fit that relationship quite well, especially in the center mass of the
observations. Furthermore, we do not see a pronounced curvature (e.g., a U
shape) or roller coaster pattern in the scatterplot that would recommend the need
for a polynomial model with quadratic and cubic terms.
Q. Why would you not fit a polynomial model to this data?
A. The idea behind regression analysis is to estimate conditional means of the
dependent variable (e.g., consumption) given the explanatory variables. They are
not expected to go through all the data points. We can always improve the fit to
the data by adding more terms. However, this may lead to overfitting the sample
data. In such a case we begin to fit the noise more than the signal. Fitting a cubic
polynomial model implicitly assumes there is some reason why consumption
would vary as a 3rd degree polynomial of heating degree days. We do not believe
that such an assumption can be supported by the scatterplot.
Q. Are there other reasons to avoid using a polynomial regression model when
relationships are highly linear?
A. Yes. Polynomial models create predictor variables from other variables. For
example, the variable X is used to create the variable X³. Creating variables in this
way can introduce multicollinearity in the model, which causes some estimation
challenges. Furthermore, using polynomial terms makes the interpretation of the
estimated coefficients less intuitive and more difficult. In linear regression
models, the coefficients represent the mean change in the dependent variable for
a one-unit change in a predictor, while holding the other predictor variables in the
model constant. This helps us understand the effect of one independent variable
on the dependent variable apart from all the other predictors in the model.
However, in regression models with
polynomial terms we cannot hold just one predictor variable constant because
other predictor variables have been created from it and will change when the
predictor variable X changes. Furthermore, because there are higher-order terms,
the effect of a change in the independent variable is influenced by the value we
set for that independent variable (i.e., it depends on where we start to make the
change in the independent variable). This makes it more challenging to isolate the
effects that predictor has on the dependent variable. So, unless there is a strong
curvature, U-shape, or roller coaster pattern in the plot, a linear model is generally
preferred to a polynomial one.
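The preference for a straight line over a polynomial can be illustrated with a small simulation (invented numbers; a sketch of the general point rather than an analysis of the actual consumption data). A cubic always fits the sample at least as well as a line, yet can diverge sharply once extrapolated beyond the data:

```python
import numpy as np

rng = np.random.default_rng(1)

# A nearly linear relationship with modest noise, echoing the kind of
# scatterplot discussed above (values are illustrative only).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.5, x.size)

lin = np.polyfit(x, y, 1)  # straight-line fit
cub = np.polyfit(x, y, 3)  # cubic polynomial fit

# In-sample, the cubic's residual sum of squares can only be lower...
rss_lin = np.sum((y - np.polyval(lin, x)) ** 2)
rss_cub = np.sum((y - np.polyval(cub, x)) ** 2)
print(f"in-sample RSS  linear: {rss_lin:.2f}  cubic: {rss_cub:.2f}")

# ...but outside the range of the data, the cubic term dominates.
x_new = 20.0
true_val = 2.0 + 1.5 * x_new
pred_lin = np.polyval(lin, x_new)
pred_cub = np.polyval(cub, x_new)
print(f"at x = 20  true: {true_val:.1f}  linear: {pred_lin:.1f}  "
      f"cubic: {pred_cub:.1f}")
```

The small in-sample improvement from the cubic is exactly the "fitting the noise" phenomenon described above; the extrapolation at x = 20 shows why it matters for forecasting.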
Q. What approach to model building did Dr. Morrison use for his regression
models?
A. Dr. Morrison states that he used a backward/mixed stepwise regression procedure
to develop models of per-customer consumption for the Company's RS-1 and RS-
2 classes and for each of the separate GS-1 subclasses.
Q. What is backward stepwise regression?
A. Stepwise regression is an automated procedure carried out by a computer
program. While there can be slight variations in the way different software
packages perform stepwise regression, the backward stepwise technique is
performed using a regression model which begins by containing all the potential
independent variables provided by the analyst. Independent variables which
make no statistical contribution to the model are then removed from the
regression one at a time. The procedure terminates when all remaining
independent variables in the model are judged to be statistically significant.
Backward stepwise is essentially a variable shrinkage technique.
Q. Can there be shortcomings to using an automated procedure such as stepwise?
A. Yes.
Q. Would you briefly identify potential shortcomings that could arise from
using a stepwise procedure?
A. Yes. The following are some of the shortcomings that can arise when using a
stepwise procedure:
• The significance levels on the statistics for selected models violate the
standard statistical assumptions because the model has been selected rather
than tested within a fixed model.
• Models that are created using stepwise techniques may be oversimplifications
of the real models of the data (see Roecker, Ellen B. (1991), Prediction Error
and its Estimation for Subset-Selected Models, Technometrics, 33, 459-468).
• One of the main issues with stepwise regression is that it searches a large
space of possible models. Consequently, it is prone to overfitting the data.
This means stepwise models will often fit the sample better than new (out-of-
sample) data.
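For concreteness, the backward-elimination idea can be sketched as follows. This is a generic, bare-bones illustration on simulated data, not a reproduction of the stepwise procedure Dr. Morrison used, and the 0.05 significance threshold is an assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Three candidate predictors; only the first actually drives y.
n = 200
X_full = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X_full[:, 0] + rng.normal(0.0, 1.0, n)

def ols_pvalues(X, y):
    """Two-sided t-test p-values for the OLS slope coefficients."""
    Xd = np.column_stack([np.ones(len(y)), X])  # add intercept
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ beta
    dof = len(y) - Xd.shape[1]
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    t = beta / se
    return 2.0 * stats.t.sf(np.abs(t), dof)[1:]  # drop intercept p-value

# Backward elimination: repeatedly drop the least significant predictor
# until every remaining one clears the threshold.
keep = [0, 1, 2]
while keep:
    p = ols_pvalues(X_full[:, keep], y)
    worst = int(np.argmax(p))
    if p[worst] <= 0.05:
        break
    keep.pop(worst)

print("retained predictors:", keep)
```

Only the genuinely related predictor is expected to survive; noise predictors are usually, but not always, eliminated, which is precisely the selection risk discussed above.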
Q. Have any statisticians called into question the use of stepwise regression
techniques?
A. Yes. Frank E. Harrell, Jr., Chair of Biostatistics at Vanderbilt University, one of
the leading scholars on this subject and author of Regression Modeling Strategies
(2001) published by Springer-Verlag, has stated that among its various
shortcomings, stepwise regression procedures can:
• Cause severe biases in the resulting multivariable model fits while losing
valuable predictive information from deleting marginally significant variables.
• Result in R² values that are biased too high compared to the population.
• Produce test statistics that do not have the correct distribution.
• Produce regression coefficients that are biased.
• Make variable selection in the presence of multicollinearity arbitrary.
• Relieve the analyst from thinking about the problems of multicollinearity and
of forming and testing hypotheses more generally.
Q. What does it mean that stepwise procedures are prone to overfitting the
data?
A. Because stepwise procedures can search over a large number of potential
variables without regard to how those variables truly relate to the research
question at hand, they are susceptible to selecting independent variables that have
a high risk of overfitting the estimated model to random features in the data. That
is, they are susceptible to fitting the noise rather than the signal in the data. This is
due to the fact that the analyst is letting the computer mechanically select the
regression predictor variables without consideration for the research question
being studied. Thus, it becomes possible for a variable, such as a cubic term, to be
included in the estimated model because it fits sample data well, even when the
term makes no practical sense from a business or research perspective. If the
inclusion of the term results because it closely fits the sample data, rather than
because it makes sense in terms of the problem at hand, there is a risk that the
estimated model is overfitting the data.
Q. What are the consequences of overfitting a model, or "curve fitting"?
A. Overfitting can result in a model that fits the sample well but does a very poor job
of describing or predicting outside the range of sample data. Consequently, such
models fail when applied to new sets of data, as occurs when they are used in
forecasting applications. We often are unaware of this, however, because models
that are the result of a curve fitting approach are rarely tested using new data.
Q. Are Dr. Morrison's models flawed because they are curve fitted?
A. Yes. Dr. Morrison states that he has used a cubic term (heating degree days
cubed) as a variable in his regression models. However, that is not a variable
that an analyst would consider as being a driver for therms used, because heating
degree days cubed does not provide an intuitive, business, or economic
interpretation for the issue under study. For example, the graph that Dr. Morrison
says indicates a nonlinear relationship does not suggest the need for a cubic term
to model the relationship. However, if such a term were available to a computer,
then an automated procedure, such as a stepwise regression algorithm, could
select it only because it more closely fits the sample data, even when it has no real
economic or business meaning to the analysis at hand. Because the algorithm,
rather than the analyst, is choosing variables automatically to achieve a goal, it
does not consider the theoretical relevance of the variables it fits. Consequently,
the algorithm may choose variables that fit the noise in the set of data.
Furthermore, if the algorithm selects a term that has no practical significance it
could eliminate a term that had practical significance, but was marginally
statistically inferior to a selected variable. Overfitting is especially likely to occur
when a model is selected from a set of potential models such as would be the case
in a stepwise regression procedure.
Q. Do you have other concerns with Dr. Morrison's use of a cubic term in a
regression model?
A. Yes. Potentially, such a term could create some instability in the estimation
process. That is, a cubic term could cause numerical overflow, or round-off
errors, depending on the statistical software being used. This could result in
estimation errors for the coefficients. Furthermore, a cubic term could produce an
influential observation that significantly influences the regression coefficients
estimates. Finally, when higher degree polynomial models, such as those with a
cubic term are used, extrapolation, or forecasting, beyond the range of data is
highly unreliable.
Q. How could a cubic term produce an influential observation?
A. An influential observation is one that has a large effect on the estimated
regression coefficients. Large values become far larger when cubed, and
values less than one become much smaller. Using a cubic term could
produce values that are much more extreme than the typical data values, thus
resulting in an influential observation.
Q. Are there any other potential problems with using a cubic term that is
developed from another predictor variable?
A. Yes, there are. A potential cause of multicollinearity is when predictor variables
are used to create quadratic and cubic terms. This is referred to as structural
multicollinearity.
Q. What is multicollinearity?
A. Multicollinearity in regression analysis is a condition where certain predictor
variables are correlated with other predictor variables or are correlated with a
linear combination of predictor variables in the model. Regression analysis
assumes that the independent variables are neither correlated with each other nor
with a linear combination of other predictor variables.
Q. Why might Dr. Morrison's models suffer from multicollinearity?
A. Dr. Morrison used the predictor Heating Degree Days (HDD) to create two other
predictors: Heating Degree Days squared and Heating Degree Days cubed.
Structural multicollinearity occurs when a predictor is used to create other
predictors. Because two of the variables have been created from another predictor
variable, there is a high degree of correlation between the predictors. In other
words, if we know the value of the predictor variable HDD we can perfectly
predict the values of the other predictor variables HDD² and HDD³.
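The structural multicollinearity described in this answer is easy to demonstrate numerically. The sketch below uses invented HDD values; only the squared-and-cubed construction mirrors the models under discussion. The variance inflation factor (VIF) is computed from the R² of regressing each predictor on the others:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical monthly heating degree day values (illustrative only),
# plus squared and cubed terms constructed from them.
hdd = rng.uniform(100.0, 1100.0, 60)
X = np.column_stack([hdd, hdd ** 2, hdd ** 3])

# The constructed terms are nearly perfectly correlated with HDD itself.
corr = np.corrcoef(X, rowvar=False)
print(f"corr(HDD, HDD^2): {corr[0, 1]:.3f}")
print(f"corr(HDD, HDD^3): {corr[0, 2]:.3f}")

def vif(X, j):
    """Variance inflation factor for column j: 1 / (1 - R^2), where R^2
    comes from regressing column j on the remaining columns."""
    y = X[:, j]
    Z = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

for j, name in enumerate(["HDD", "HDD^2", "HDD^3"]):
    print(f"VIF {name}: {vif(X, j):.1f}")
```

VIFs above roughly 10 are commonly read as signaling serious multicollinearity; polynomial terms built from a single predictor typically produce VIFs far beyond that.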
Q. What are the consequences of multicollinearity?
A. There can be several, but one consequence is that multicollinearity can cause
estimated regression coefficients to have an unexpected sign (e.g., a negative sign
when a positive sign is expected). Dr. Morrison's RS-1 regression model, which
contains the terms HDD, HDD², and HDD³, clearly suffers from this
multicollinearity problem of having unexpected signs on a regression coefficient.
Q. Why do you say that Dr. Morrison's models have an unexpected sign?
A. All the HDD variables in his RS-1 model are highly positively correlated with the
dependent variable. The THDD3 term has a positive correlation of 0.928 with the
dependent variable (see correlation matrix below).
Correlation: RS-1, TYr, THDD, THDD2, THDD3

          RS-1     TYr      THDD     THDD2
TYr      -0.041
THDD      0.990    0.005
THDD2     0.978    0.072    0.961
THDD3     0.928    0.022    0.900    0.983
The positive correlation indicates that the two variables tend to move in the same
direction; that is, as one variable increases, the other variable generally
increases. In this case, as heating degree days increase, therm usage would
increase. Consequently, one would expect to see a positive sign for the estimated
regression coefficient on each of the HDD terms, including the cubic term.
Q. What is the sign on the cubic term in Dr. Morrison's RS-1 model?
A. The estimated regression coefficient is negative. The value is -26.75 (see the
computed output below):
Coefficients

Term       Coef     SE Coef   T-Value   P-Value   VIF
Constant   44.319   0.221     200.48    0.000
TYr        -1.575   0.223     -7.01     0.000     1.01
THDD       7.78     1.93      4.03      0.000     75.97
THDD2      54.92    4.64      11.84     0.000     437.42
THDD3      -26.75   2.93      -9.14     0.000     174.28

Again, the interpretation of the regression coefficients is made more complex by
the quadratic and cubic terms. We expect that increases in HDD would increase
consumption, but the negative term on the cubic coefficient suppresses this effect.
In addition, the inclusion of quadratic and cubic terms, which are highly
correlated with heating degree days, will tend to impact the value of the
regression coefficient on HDD, making it difficult to determine the actual impact
that HDD has on consumption.
Q. Is there a similar problem in Dr. Morrison's RS-2 model?
A. Yes. Again, the estimated coefficient on the cubic term is negative, but the cubic
term is positively related to the dependent variable.
Q. What other consequences might result from multicollinearity?
A. Multicollinearity can also produce regression estimates that are not statistically
significant, even though the independent variables are significantly correlated
with the dependent variable.
Q. Has this occurred with any models developed by Dr. Morrison?
A. Yes. The GS-20 model developed by Dr. Morrison is shown in the computer
output below.
Model Summary

S         R-sq     R-sq(adj)   R-sq(pred)
1332.63   78.89%   78.31%      77.36%

Coefficients

Term       Coef    SE Coef   T-Value   P-Value
Constant   5501    108       50.96     0.000
TYr        279     109       2.55      0.012
THDD       790     944       0.84      0.404
THDD2      2376    2268      1.05      0.296
THDD3      -644    1433      -0.45     0.654
The T-Values for the heating degree variables indicate that those coefficients are
not statistically significant. The THDD variable becomes insignificant due to
the presence of THDD2 and THDD3, which are attempting to explain the
same variation in the dependent variable as THDD. Thus, the statistical impact of
the THDD is diminished due to the inclusion of the quadratic and cubic variables.
Q. How does this compare to the GS model developed by the Company?
A. The Company's estimated GS model is shown in the table below:
Variable    Coefficient   Std. Error   t-Statistic   Prob.

C           156.4242      4.046121     38.66029      0.0000
JAN65       0.455898      0.010226     44.58060      0.0000
FEB65       0.444788      0.011479     38.74695      0.0000
MAR65       0.394486      0.015224     25.91217      0.0000
APR65       0.266841      0.009867     27.04308      0.0000
MAY65       0.139923      0.013030     10.73825      0.0000
OCT65       0.226815      0.042276     5.365158      0.0000
NOV65       0.317030      0.018106     17.50950      0.0000
DEC65       0.425533      0.011871     35.84741      0.0000
TR-W90      -0.358425     0.098681     -3.632153     0.0004
SEP         22.86495      5.490874     4.164173      0.0001
AR(1)       0.445873      0.081063     5.500312      0.0000

R-squared             0.992613
Adjusted R-squared    0.991953
Mean dependent var    324.9130
S.D. dependent var    179.4880
In this estimated model all coefficients are statistically significant and because a
linear model was used, the interpretation of the regression coefficients is both
straightforward and intuitive.
Q. Could you briefly summarize why you believe the models developed by Dr.
Morrison could be construed to be a curve fitting approach to model
building?
A. Recall that in layman's terms, "curve fitting" refers to the use of independent
variables in a regression model without regard to their theoretical or business
relevance. When terms are included because they simply increase the measure of
fit, R², in the estimated regression equation, we are susceptible to curve fitting.
An automated regression estimation technique can result in curve fitting when
variables that have no theoretical relevance enter the model. The analyst is
relieved from having to consider whether a variable that is in the model makes
economic sense because the automated procedure just selects it. It is hard to
imagine there is an economic, or practical, justification for the use of a cubic
term in his models. However, such a term could enter and remain in a model that
was developed using a mechanical variable selection process.
Q. How does the Company's approach to regression overcome the shortcomings
associated with automated estimation procedures such as stepwise, as
advocated by Dr. Morrison?
A. The Company developed its models from a research perspective. Initially,
variables that made economic sense in modeling the situation being studied were
identified. Then, the Company applied standard statistical techniques such as
correlation analysis to determine potential independent variables. Finally, linear
regression models using those potential variables were estimated and tested for
their significance.
Q. What other issues exist with respect to the models proposed by Dr.
Morrison?
A. The data used by Company witness Blattner and by Dr. Morrison are time
series data. One issue that time-series data can exhibit is autocorrelated errors,
which was mentioned earlier. The Company has dealt with this issue by explicitly
correcting for the autocorrelated error terms using an autoregressive procedure.
Dr. Morrison's model has not dealt with autocorrelated errors.
Q. How do you know that the models proposed by Dr. Morrison exhibit
autocorrelated errors?
A. I computed a Durbin-Watson statistic for each model Dr. Morrison proposed.
Q. What is a Durbin-Watson statistic?
A. A Durbin-Watson statistic is calculated from the regression model's residuals. In
regression we assume that the errors are independent. The Durbin-Watson test
statistic is used to test this assumption.
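As a concrete sketch (with simulated residuals, for illustration only): the statistic is the sum of squared successive differences of the residuals divided by their sum of squares. Values near 2 are consistent with independent errors, while values well below 2 point to positive autocorrelation:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive residual
    differences divided by the residual sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(4)

# Independent residuals: the statistic should sit near 2.
e_indep = rng.normal(size=500)
print(f"independent errors:    DW = {durbin_watson(e_indep):.2f}")

# Positively autocorrelated AR(1) residuals with rho = 0.6: the
# statistic falls toward roughly 2 * (1 - rho), i.e. about 0.8 here.
e_ar = np.zeros(500)
for t in range(1, 500):
    e_ar[t] = 0.6 * e_ar[t - 1] + rng.normal()
print(f"autocorrelated errors: DW = {durbin_watson(e_ar):.2f}")
```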
Q. What were the results of these tests for the models developed by Dr.
Morrison?
A. The following table shows the calculated Durbin-Watson test statistic for each
model developed by Dr. Morrison.
Model    Durbin-Watson Statistic    Significant at 1%
RS-1     1.2955                     YES
RS-2     1.3461                     YES
GS-10    1.8445                     INCONCLUSIVE
GS-11    1.140                      YES
GS-20    1.007                      YES
The Durbin-Watson test statistics show strong evidence of positively correlated
error terms in all of Dr. Morrison's models but the GS-10. In that model the test
for positive autocorrelation was inconclusive. The presence of these positively
correlated error terms does not conform with the assumptions concerning the
distribution and independence of error terms that underlie regression
analysis.
Q. What should be done when there is positive correlation in the error terms?
A. One remedy would be to introduce an autoregressive error correction, as was done
with the Company's models. The Company's models were tested for the
presence of autocorrelated error terms and if such errors were detected an
estimation procedure to correct for its effects was used.
Q. Is there anything else related to Dr. Morrison's models you wish to comment
on?
A. Yes, I examined the residuals from his models. One assumption of regression
analysis is that the residuals (i.e., error terms) from the regression follow a normal
distribution.
Q. How did you examine the residuals from Dr. Morrison's models?
A. I produced probability plots and Anderson-Darling statistics for each model
estimated by Dr. Morrison. If the residuals are normally distributed, then their
probability plot will be a straight line and the P-Value for the Anderson-Darling
statistic will be large (e.g., > 0.05).
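That check can be sketched with SciPy's Anderson-Darling implementation on simulated residuals (the series below are invented; SciPy reports tabulated critical values rather than the exact P-Values quoted from the statistical package used in the analysis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Two illustrative residual series: one normal, one clearly non-normal.
resid_normal = rng.normal(0.0, 1.0, 150)
resid_skewed = rng.exponential(1.0, 150) - 1.0

results = {}
for name, r in [("normal", resid_normal), ("skewed", resid_skewed)]:
    # Anderson-Darling test for normality. For dist='norm' the reported
    # significance levels are [15, 10, 5, 2.5, 1] percent, so index 2 of
    # critical_values is the 5% critical value.
    out = stats.anderson(r, dist='norm')
    reject = bool(out.statistic > out.critical_values[2])
    results[name] = (out.statistic, reject)
    print(f"{name:7s} A^2 = {out.statistic:.3f}  "
          f"reject normality at 5%: {reject}")
```

A straight-line normal probability plot and a small Anderson-Darling statistic tell the same story; a large statistic, as with the skewed series here, corresponds to the departures from the line described below.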
Q. What did you conclude from the residual analysis?
A. The probability plots (shown below) indicate that the residuals do not follow a
normal distribution. This conclusion is verified by the fact that the Anderson-
Darling statistic P-Values are very small, all less than 0.05, indicating that the
distribution of the residuals for each regression model does not follow a normal
distribution. This is a violation of the linear regression assumptions.
[Normal probability plots of the residuals from each of Dr. Morrison's models.
Each plot reports an Anderson-Darling statistic with a P-Value < 0.005.]
2
J
4
5
6
7
8
9
a.
A.
Q. Is there anything else you would like to add about the differences between
the Company's and the Staff's models?
A. Yes. The Company produced forecasts using its RS-1 and RS-2 models as well
as forecasts using the models that Dr. Morrison developed.
Q. Did Dr. Morrison provide a comparison of his models to actual
consumption?
A. No. In Company Production Request No. 19 to Staff, the Company asked that
Staff "provide a comparison of Staff's monthly consumption forecasts for 2016
compared with actual consumption". Dr. Morrison's response was, "The
Company did not provide consumption information by class for all months of
2016, so it is impossible to provide the comparison requested by the Company".
On January 23rd, the Company sent an email to Staff noting that the 2016
customer and consumption data for January through June of 2016 had been
provided to Staff in response to Production Request Nos. 113 and 114, "PR 113 &
114 2016 Billing Data.xls", and again asked that Staff perform the requested
comparison using the data from Production Request Nos. 113 and 114. Dr.
Morrison responded to that email request by stating, "I did not perform the
analysis you have requested." Because Dr. Morrison did not provide the
comparison, the Company calculated a comparison between Staff's models and
actuals for the years 2012 through 2016.
Q. Before you explain that comparison and why it is important, what did you
conclude from the results of the comparison?
A. The comparison of the RS-1 and RS-2 models showed that Dr. Morrison's models
would not have predicted usage in each of the years 2012 through 2016 as well as
the Company's models did for each of those same years.
Q. Why is performing this comparative analysis important?
A. Once a model is deemed to be statistically significant, the only real measure of a
forecasting model's worth is how well it actually forecasts. A comparison of the
Company and Staff models in terms of their relative abilities to forecast total
therms provides added justification for the preference of one model over another.
Q. How was such a comparison conducted?
A. The comparison was conducted by "back-casting" both Dr. Morrison's and the
Company's models. In essence, a comparison of the RS-1 and RS-2 models was
made looking back at how well they forecast the years from 2012-2015 as well as
how they performed in the test year of 2016. Both the Company's and Dr.
Morrison's models were applied to the actual heating degree days and trend
variables, where applicable, each month to calculate the forecast usage. The
forecasts were then compared to actual usage values. Next, commonly used
measures of forecast accuracy were computed and the comparative results are
summarized below:
RS-1 Forecast Accuracy Comparison

                       2012      2013      2014      2015      2016   2012-2016
MAD
  Company backcast    83,898    76,687    68,351   127,448   129,799     97,236
  Staff backcast      90,189    80,943   104,485   164,343   115,754    111,143
MAPE
  Company backcast     5.21%     6.88%     5.44%    12.79%     8.04%      7.67%
  Staff backcast       8.66%     8.69%    12.19%    19.71%    15.53%     12.96%
Bias
  Company backcast   [values illegible in source]
  Staff backcast     [values illegible in source]

RS-2 Forecast Accuracy Comparison

                       2012      2013      2014      2015      2016   2012-2016
MAD
  Company backcast   476,961   262,564   345,853   410,642   534,584    406,121
  Staff backcast   1,030,642   406,944   544,813   724,414   902,087    721,780
MAPE
  Company backcast     3.60%     2.61%     3.09%     4.05%     4.50%      3.57%
  Staff backcast       9.09%     4.54%     5.67%     7.47%     8.63%      7.08%
Bias
  Company backcast   [values illegible in source]
  Staff backcast     [values illegible in source]
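In code, the back-casting procedure described above amounts to the following sketch: apply the fitted model to the actual heating degree days observed each month and compare the resulting back-cast values to actual usage. All coefficients and data values below are hypothetical, not the Company's model parameters or billing data.

```python
# Back-cast sketch for a model of the form:
#   usage = intercept + hdd_coef * HDD
# applied to the *actual* heating degree days each month.
# Every number here is hypothetical, for illustration only.

intercept, hdd_coef = 20_000.0, 950.0          # hypothetical fitted coefficients
actual_hdd = [1100, 900, 700, 400, 150, 50]    # hypothetical actual HDD, Jan-Jun
actual_usage = [1_080_000, 880_000, 690_000, 405_000, 165_000, 70_000]

# "Forecast" each month from the actual HDD, then measure the errors.
backcast = [intercept + hdd_coef * hdd for hdd in actual_hdd]
errors = [a - f for a, f in zip(actual_usage, backcast)]

print(backcast[0])  # 1065000.0 for the hypothetical January
print(errors[0])    # 15000.0 (positive: model under-forecast that month)
```

The monthly errors produced this way are the raw material for the MAD, MAPE, and Bias figures in the tables.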
Q. Would you please explain the tables?
A. The MAD refers to the mean absolute deviation and is a measure of forecast
accuracy. The MAD is the average of the absolute value of the forecast errors.
The smaller the MAD, the less the forecast value deviates from the true value in
an absolute sense on average, so smaller MAD values are preferred. The MAPE
is the mean absolute percentage error, which is an alternative measure of forecast
error. It measures the average of the absolute value of the forecast errors as a
percentage of the actual demand value. Smaller MAPE values are preferred.
Finally, the bias is a measure of the tendency of the forecast model to over- or
under-forecast. Positive measures of bias indicate that the forecast is less than the
actual (i.e., an under-forecast).
In terms of these measures of accuracy, the Company's models generally
outperformed the models proposed by Staff. With the exception of the MAD
calculation for the RS-1 model for 2016 and the Bias calculation for RS-2 in 2015,
the Company's forecasts were superior to Dr. Morrison's forecasts in every year
with respect to MAD, MAPE and Bias.
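The three accuracy measures just defined can be written directly from their definitions. A minimal sketch follows; the actual and forecast values are illustrative only, not figures from the tables.

```python
def mad(actual, forecast):
    """Mean absolute deviation: the average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error: the average of |error| as a
    percentage of the actual value."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def bias(actual, forecast):
    """Average of (actual - forecast); positive values mean the
    model under-forecasts on average."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)

actual = [100.0, 200.0, 400.0]     # illustrative usage values
forecast = [110.0, 180.0, 400.0]   # illustrative forecasts

print(mad(actual, forecast))    # (10 + 20 + 0) / 3 = 10.0
print(mape(actual, forecast))   # (10% + 10% + 0%) / 3, about 6.67
print(bias(actual, forecast))   # (-10 + 20 + 0) / 3, about 3.33
```

Smaller MAD and MAPE indicate a more accurate model, and a bias close to zero indicates no systematic tendency to over- or under-forecast, which is how the tables above are read.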
Q. Because much of your testimony is so highly technical, can you provide a
brief summary, in layman's terms, regarding the Company's models?
A. Yes. Dr. Shannon and I reviewed the Company's approach to developing its
forecast. It is our opinion that:
• The Company's models were developed using variables that are
theoretically relevant to modeling demand.
• The Company's models are statistically significant.
• The estimated coefficients in the Company's models are statistically
significant with the expected signs.
• The Company's models corrected for autocorrelated error terms.
• The Company's models provided good forecast accuracy.
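On the autocorrelation point in the list above, one common diagnostic for first-order autocorrelation in regression residuals is the Durbin-Watson statistic. The sketch below is illustrative of that general diagnostic, not a description of the Company's specific correction procedure.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order autocorrelation.

    Values near 2 suggest no autocorrelation; values near 0 suggest
    positive autocorrelation, and values near 4 suggest negative
    autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating residuals are an extreme case of negative autocorrelation;
# identical residuals are an extreme case of positive autocorrelation.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0
```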
Q. Do you have any final thoughts regarding the Company's models?
A. Yes. As stated in the letter filed with this Case as Exhibit No. 18, it is Dr.
Shannon's and my opinion that the methods the Company used in preparing its
three statistical regression models are appropriate and are based on sound
statistical methodology. We believe that the methods used by Intermountain are
appropriate for weather normalization and that Intermountain's approach follows
the methodology authorized by the IPUC in Case U-1034-134.
Q. Does this conclude your rebuttal testimony?
A. Yes, thank you, it does.