Ronald L. Williams, ISB No. 3034
Williams Bradbury, P.C.
1015 W. Hays St.
Boise, ID 83702
Telephone: (208) 344-6633
Email: ron@williamsbradbury.com
Attorneys for Intermountain Gas Company

BEFORE THE IDAHO PUBLIC UTILITIES COMMISSION

IN THE MATTER OF THE APPLICATION OF INTERMOUNTAIN GAS COMPANY FOR THE AUTHORITY TO CHANGE ITS RATES AND CHARGES FOR NATURAL GAS SERVICE TO NATURAL GAS CUSTOMERS IN THE STATE OF IDAHO

Case No. INT-G-16-02

REBUTTAL TESTIMONY OF PHIL FRY
FOR INTERMOUNTAIN GAS COMPANY

February 15, 2017

Q. Please state your name, position and business address.

A. My name is Phillip C. Fry. I am a tenured professor in the Information Technology and Supply Chain Management Department in the College of Business and Economics at Boise State University. My business address is COBE, Boise State University, 1925 University Dr., Boise, ID 83725.

Q. Would you please describe your educational and professional background?

A. I have a Ph.D. in Quantitative Business Analysis from Louisiana State University. I have been employed at Boise State University continuously since 1988 in the College of Business & Economics, where I teach a variety of courses including business statistics. I am a co-author of the textbook Business Statistics: A Decision-Making Approach, 10th edition, published by Pearson Education. I was also a co-author of the same textbook for the 5th through 9th editions. A copy of my curriculum vitae is attached as Exhibit 41.

Q. What is the purpose of your rebuttal testimony?

A. I am providing rebuttal testimony related to the statistical models used by the Company for weather normalization, as well as commenting on the statistical relevance of the models proposed by Dr. Morrison.

Q. In preparing and offering this rebuttal testimony, did you work alone or did you work in collaboration with anyone else?
A. This rebuttal testimony and the analysis behind it were a collaborative effort between myself and Dr. Patrick Shannon, who also holds a Ph.D. Dr. Shannon is a co-author of the textbook Business Statistics: A Decision-Making Approach, 10th edition, published by Pearson Education, as well as a co-author of the 1st through the 9th editions of the same textbook. Dr. Shannon's curriculum vitae is attached as Exhibit 42 to this testimony. To the extent that some of the analysis was conducted by Dr. Shannon, or that some of this testimony was written by Dr. Shannon, I have reviewed that work, concur in it, and adopt it as my own.

Q. Do you and Dr. Shannon agree with Dr. Morrison's statement that "a peculiar feature of the Company's model is its use of different weather coefficients for different months"? (Morrison Di, line 22, page 19).

A. No, we do not believe that it is "peculiar". It seems reasonable to expect that monthly effects exist in natural gas usage. If these effects do exist, then explicitly incorporating different coefficients for different months is a reasonable modeling approach in regression analysis. The statistical tests performed on the Company's models reveal that the coefficients for the different months are statistically significant.

Q. How would you respond to Dr. Morrison's statement that the use of autoregressive terms in a predictive weather normalization model is inappropriate? (Morrison Di, line 7, page 20).

A. Dr. Morrison states that including autoregressive terms is an "obvious problem" that can cause a model to become "grossly unstable" when used to make predictions beyond the time period of the data used to create the model. We disagree, and believe Dr. Morrison either does not fully understand the concept of stability in the statistical sense or is misusing the term.
Instability, in statistical vernacular, is a term generally used to describe the sensitivity of the estimated regression coefficients when a new sample of data is used to estimate the regression model. One potential cause of this instability is multicollinearity, which occurs when the independent variables are highly correlated with each other. Another cause of instability arises when higher-order terms, such as cubic terms, are used in the regression model. The use of higher-order terms can cause the matrix of independent variables to become ill-conditioned, which can lead to instability in estimating the regression coefficients. The Company did not use higher-order terms in its regression equations, and there is no evidence of serious multicollinearity in the Company's model. Consequently, it is my opinion, and the opinion of Dr. Shannon, that the Company's weather normalization model is not "grossly unstable".

Q. What about Dr. Morrison's statement that the autoregressive terms make the model grossly unstable when used to make predictions beyond the time period of the data used to create the model?

A. We also disagree with this statement. Errors in forecasts are comprised of three components: intrinsic error (i.e., noise in the data), parameter estimation error (errors in estimating the model's coefficients), and model error (i.e., making the wrong assumptions about how the future will resemble the past). We have statistical measures for the first two error sources, but not for the third. However, we know that for any regression model, the further into the future the model forecasts, the greater the overall forecast error becomes. So when predictions are made outside the mass of data used to develop the model, forecast errors become more pronounced. This is true for any model, but it is especially true of polynomial models, such as those developed by Dr. Morrison.

Q. Why did the Company include an autoregressive term in its models?

A. The use of an autoregressive term in the Company's models is a direct result of the time series data used to develop the regression model. Data that are ordered with respect to time (e.g., hourly, daily, weekly, etc.) are referred to as time series data. Residuals from time series regressions are often correlated; that is, adjacent errors tend to move in the same direction. This is referred to as autocorrelation. If untreated, autocorrelation corrupts the optimal properties of ordinary least squares regression. Including an autoregressive error correction is an appropriate statistical remedy in the presence of autocorrelated errors. Because autocorrelated errors are common in time-series regression analysis, statistical procedures (e.g., Cochrane-Orcutt, Hildreth-Lu, and others) have been developed and included in commercial statistical software packages to deal with this issue.

Q. Dr. Morrison states that the Company's models were created using autoregressive terms, but that these terms were not included in the calculation of monthly consumption and that their omission underestimates the Company's monthly consumption estimates. Is this correct?

A. No, Dr. Morrison is wrong on this point. The Company's model includes an autoregressive term to correct for autocorrelated errors. The statistical software used by the Company, EViews, performs a statistical test to determine if autocorrelation is present. If it is present, the software includes an estimate of the autoregressive term and incorporates it into the forecast automatically.

Q. Do you agree with Dr. Morrison's statement that the use of autoregressive terms in weather normalization violates regression's fundamental independence assumption? (Morrison Di, pages 20-21, lines 25-1).
A. Absolutely not. The independence assumption relates to the statistical property that the error terms are not correlated. This assumption is not violated because an autoregressive term is used; it is violated in this case due to the nature of the time series data employed in estimating the model. The use of the autoregressive term is, in fact, a remedy for this violation, rather than a violation itself. Adjustments for autocorrelation (also referred to as serial correlation) are discussed in the textbook The Statistical Sleuth, by Ramsey and Schafer, which is Dr. Morrison's statistical reference book of preference. (Morrison Di, lines 23-24, page 21). More extensive explanations for dealing with autocorrelated error terms can be found in other and more widely used textbooks on econometrics, such as those by Johnston, Kmenta, Judge, Greene, and Wooldridge.

Q. Please comment on Dr. Morrison's statement at page 21 of his testimony that the relationship between Heating Degree Days and consumption is "clearly non-linear".

A. We disagree that the relationship shown in the scatterplot recommends the use of a nonlinear regression model. We believe that from a practical perspective the scatterplot shows a strong positive linear relationship. One would not expect, when modeling real consumption data, to find a perfect linear relationship. However, any departures from perfect linearity in this instance are minimal. A straight line would fit that relationship quite well, especially in the center mass of the observations. Furthermore, we do not see a pronounced curvature (e.g., a U shape) or roller coaster pattern in the scatterplot that would recommend the need for a polynomial model with quadratic and cubic terms.

Q. Why would you not fit a polynomial model to this data?
A. The idea behind regression analysis is to estimate conditional means of the dependent variable (e.g., consumption) given the explanatory variables. The estimates are not expected to go through all the data points. We can always improve the fit to the data by adding more terms. However, this may lead to overfitting the sample data; in such a case we begin to fit the noise more than the signal. Fitting a cubic polynomial model implicitly assumes there is some reason why consumption would vary as a 3rd-degree polynomial of heating degree days. We do not believe that such an assumption can be supported by the scatterplot.

Q. Are there other reasons to avoid using a polynomial regression model when relationships are highly linear?

A. Yes. Polynomial models create predictor variables from other variables. For example, the variable X is used to create a squared or cubed term. Creating variables in this way can introduce multicollinearity into the model, which causes some estimation challenges. Furthermore, using polynomial terms makes the interpretation of the estimated coefficients less intuitive and more difficult. In linear regression models, the coefficients represent the mean change in the dependent variable while holding the other predictor variables in the model constant. This helps us understand the effect of one independent variable on the dependent variable separately from all the other predictors in the model. However, in regression models with polynomial terms we cannot hold just one predictor variable constant, because other predictor variables have been created from it and will change when the predictor variable X changes. Furthermore, because there are higher-order terms, the effect of a change in the independent variable is influenced by the value we set for that independent variable (i.e., it depends on where we start to make the change in the independent variable).
This makes it more challenging to isolate the effects that predictor has on the dependent variable. So, unless there is a strong curvature, U-shape, or roller coaster pattern in the plot, a linear model is generally preferred to a polynomial one.

Q. What approach to model building did Dr. Morrison use for his regression models?

A. Dr. Morrison states that he used a backward/mixed stepwise regression procedure to develop models of per-customer consumption for the Company's RS-1 and RS-2 classes and for each of the separate GS-1 subclasses.

Q. What is backward stepwise regression?

A. Stepwise regression is an automated procedure carried out by a computer program. While there can be slight variations in the way different software packages perform stepwise regression, the backward stepwise technique begins with a regression model containing all the potential independent variables provided by the analyst. Independent variables that make no statistical contribution to the model are then removed from the regression one at a time. The procedure terminates when all remaining independent variables in the model are judged to be statistically significant. Backward stepwise is essentially a variable shrinkage technique.

Q. Can there be shortcomings to using an automated procedure such as stepwise?

A. Yes.

Q. Would you briefly identify potential shortcomings that could arise from using a stepwise procedure?

A. Yes. The following are some of the shortcomings that can arise when using a stepwise procedure:

• The significance levels on the statistics for selected models violate the standard statistical assumptions, because the model has been selected rather than tested within a fixed model.

• Models that are created using stepwise techniques may be over-simplifications of the real models of the data. (See Roecker, Ellen B. (1991). Prediction Error and its Estimation for Subset-Selected Models. Technometrics, 33, 459-468.)

• One of the main issues with stepwise regression is that it searches a large space of possible models. Consequently, it is prone to overfitting the data. This means stepwise models will often fit the sample better than new (out-of-sample) data.

Q. Have any statisticians called into question the use of stepwise regression techniques?

A. Yes. Frank E. Harrell, Jr., Chair of Biostatistics at Vanderbilt University, one of the leading scholars on this subject and author of Regression Modeling Strategies (2001), published by Springer-Verlag, has stated that among its various shortcomings, stepwise regression procedures can:

• Cause severe biases in the resulting multivariable model fits, while losing valuable predictive information from deleting marginally significant variables.

• Result in R2 values that are biased too high compared to the population.

• Produce test statistics that do not have the correct distribution.

• Produce regression coefficients that are biased.

• Make variable selection in the presence of multicollinearity arbitrary.

• Relieve the analyst from thinking about the problems of multicollinearity and of forming and testing hypotheses more generally.

Q. What does it mean that stepwise procedures are prone to overfitting the data?

A. Because stepwise procedures can search over a large number of potential variables without regard to how those variables truly relate to the research question at hand, they are susceptible to selecting independent variables that have a high risk of overfitting the estimated model to random features in the data. That is, they are susceptible to fitting the noise rather than the signal in the data.
This is because the analyst is letting the computer mechanically select the regression predictor variables without consideration for the research question being studied. Thus, it becomes possible for a variable, such as a cubic term, to be included in the estimated model because it fits the sample data well, even when the term makes no practical sense from a business or research perspective. If the term is included because it closely fits the sample data, rather than because it makes sense in terms of the problem at hand, there is a risk that the estimated model is overfitting the data.

Q. What are the consequences of overfitting a model, or "curve fitting"?

A. Overfitting can result in a model that fits the sample well but does a very poor job of describing or predicting outside the range of the sample data. Consequently, such models fail when applied to new sets of data, as occurs when they are used in forecasting applications. We are often unaware of this, however, because models that are the result of a curve fitting approach are rarely tested using new data.

Q. Are Dr. Morrison's models flawed because they are curve fitted?

A. Yes. Dr. Morrison states that he has used a cubic term (heating degree days cubed) as a variable in his regression models. However, that is not a variable that an analyst would consider as being a driver for therms used, because heating degree days cubed does not provide an intuitive, business, or economic interpretation for the issue under study. For example, the graph that Dr. Morrison says indicates a nonlinear relationship does not suggest the need for a cubic term to model the relationship. However, if such a term were available to a computer, then an automated procedure, such as a stepwise regression algorithm, could select it simply because it more closely fits the sample data, even when it has no real economic or business meaning to the analysis at hand.
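The overfitting risk described here can be shown with a small synthetic sketch (all numbers are hypothetical, not the Company's or Staff's data): a cubic polynomial fitted to noisy but fundamentally linear data always matches the sample at least as closely as a straight line, yet extrapolates far less reliably.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: consumption is truly linear in heating degree
# days; any curvature a flexible model "finds" is noise.
hdd_train = np.linspace(0, 30, 24)
use_train = 5.0 + 2.0 * hdd_train + rng.normal(0, 4, hdd_train.size)

lin = np.polyfit(hdd_train, use_train, deg=1)   # straight line
cub = np.polyfit(hdd_train, use_train, deg=3)   # cubic polynomial

def sse(coefs, x, y):
    """Sum of squared errors of a fitted polynomial."""
    return np.sum((np.polyval(coefs, x) - y) ** 2)

# In-sample, the cubic can never fit worse: its extra terms absorb noise.
assert sse(cub, hdd_train, use_train) <= sse(lin, hdd_train, use_train)

# Out of sample, beyond the range of the data (as in forecasting),
# any spurious cubic curvature is amplified.
hdd_new = np.linspace(30, 40, 50)        # outside the fitted range
true_mean = 5.0 + 2.0 * hdd_new          # noise-free true relationship
err_lin = np.abs(np.polyval(lin, hdd_new) - true_mean).mean()
err_cub = np.abs(np.polyval(cub, hdd_new) - true_mean).mean()
print(err_lin, err_cub)
```

In typical draws the cubic's extrapolation error is many times the linear model's, mirroring the point above that polynomial models are especially unreliable outside the mass of the data.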
Because the algorithm, rather than the analyst, is choosing variables automatically to achieve a goal, it does not consider the theoretical relevance of the variables it fits. Consequently, the algorithm may choose variables that fit the noise in the set of data. Furthermore, if the algorithm selects a term that has no practical significance, it could eliminate a term that had practical significance but was marginally statistically inferior to a selected variable. Overfitting is especially likely to occur when a model is selected from a set of potential models, as is the case in a stepwise regression procedure.

Q. Do you have other concerns with Dr. Morrison's use of a cubic term in a regression model?

A. Yes. Potentially, such a term could create some instability in the estimation process. That is, a cubic term could cause numerical overflow, or round-off errors, depending on the statistical software being used. This could result in estimation errors for the coefficients. Furthermore, a cubic term could produce an influential observation that significantly affects the estimates of the regression coefficients. Finally, when higher-degree polynomial models, such as those with a cubic term, are used, extrapolation, or forecasting, beyond the range of the data is highly unreliable.

Q. How could a cubic term produce an influential observation?

A. An influential observation is one that has a large effect on the estimated regression coefficients. Large values become significantly larger when cubed, and values less than one become much smaller when cubed. Using a cubic term could therefore produce values that are much more extreme than the typical data values, resulting in an influential observation.

Q. Are there any other potential problems with using a cubic term that is developed from another predictor variable?
A. Yes, there are. A potential cause of multicollinearity is when predictor variables are used to create quadratic and cubic terms. This is referred to as structural multicollinearity.

Q. What is multicollinearity?

A. Multicollinearity in regression analysis is a condition where certain predictor variables are correlated with other predictor variables, or are correlated with a linear combination of predictor variables in the model. Regression analysis assumes that the independent variables are neither correlated with each other nor with a linear combination of other predictor variables.

Q. Why might Dr. Morrison's models suffer from multicollinearity?

A. Dr. Morrison used the predictor Heating Degree Days (HDD) to create two other predictors: Heating Degree Days squared (HDD2) and Heating Degree Days cubed (HDD3). Structural multicollinearity occurs when a predictor is used to create other predictors. Because two of the variables have been created from another predictor variable, there is a high degree of correlation among the predictors. In other words, if we know the value of the predictor variable HDD, we can perfectly predict the values of the other predictor variables HDD2 and HDD3.

Q. What are the consequences of multicollinearity?

A. There can be several, but one consequence is that multicollinearity can cause estimated regression coefficients to have an unexpected sign (e.g., a negative sign when a positive sign is expected). Dr. Morrison's RS-1 regression model, which contains the terms HDD, HDD2, and HDD3, clearly suffers from this multicollinearity problem of having an unexpected sign on a regression coefficient.

Q. Why do you say that Dr. Morrison's models have an unexpected sign?

A. All the HDD variables in his RS-1 model are highly positively correlated with the dependent variable.
The THDD3 term has a positive correlation of 0.928 with the dependent variable. (See the correlation matrix below.)

Correlation: RS-1, TYr, THDD, THDD2, THDD3

           RS-1     TYr    THDD   THDD2
TYr      -0.041
THDD      0.990   0.005
THDD2     0.978   0.072   0.961
THDD3     0.928   0.022   0.900   0.983

The positive correlation indicates that the two variables tend to move in the same direction; that is, as one variable increases, the other variable generally increases. In this case, as heating degree days increase, therm usage would increase. Consequently, one would expect to see a positive sign for the estimated regression coefficient on each of the HDD terms, including the cubic term.

Q. What is the sign on the cubic term in Dr. Morrison's RS-1 model?

A. The estimated regression coefficient is negative. The value is -26.75. (See the computed output below.)

Coefficients

Term        Coef   SE Coef   T-Value   P-Value      VIF
Constant  44.319     0.221    200.48     0.000
TYr       -1.575     0.223     -7.01     0.000     1.01
THDD        7.78      1.93      4.03     0.000    75.97
THDD2      54.92      4.64     11.84     0.000   437.42
THDD3     -26.75      2.93     -9.14     0.000   174.28

Again, the interpretation of the regression coefficients is made more complex by the quadratic and cubic terms. We expect that increases in HDD would increase consumption, but the negative coefficient on the cubic term suppresses this effect. In addition, the inclusion of quadratic and cubic terms, which are highly correlated with heating degree days, will tend to impact the value of the regression coefficient on HDD, making it difficult to determine the actual impact that HDD has on consumption.

Q. Is there a similar problem in Dr. Morrison's RS-2 model?

A. Yes. Again, the estimated coefficient on the cubic term is negative, but the cubic term is positively related to the dependent variable.

Q. What other consequences might result from multicollinearity?
A. Multicollinearity can also produce regression estimates that are not statistically significant, even though the independent variables are significantly correlated with the dependent variable.

Q. Has this occurred with any models developed by Dr. Morrison?

A. Yes. The GS-20 model developed by Dr. Morrison is shown in the computer output below.

Model Summary

       S     R-sq   R-sq(adj)   R-sq(pred)
 1332.63   78.89%      78.31%       77.36%

Coefficients

Term        Coef   SE Coef   T-Value   P-Value
Constant    5501       108     50.96     0.000
TYr          279       109      2.55     0.012
THDD         790       944      0.84     0.404
THDD2       2376      2268      1.05     0.296
THDD3       -644      1433     -0.45     0.654

The T-Values for the heating degree variables indicate that those coefficients are not statistically significant. The THDD variable becomes insignificant due to the presence of THDD2 and THDD3, which are attempting to explain the same variation in the dependent variable as THDD. Thus, the statistical impact of THDD is diminished due to the inclusion of the quadratic and cubic variables.

Q. How does this compare to the GS model developed by the Company?

A. The Company's estimated GS model is shown in the table below:

Variable    Coefficient   Std. Error   t-Statistic     Prob.
C              156.4242     4.046121      38.66029    0.0000
JAN65          0.455898     0.010226      44.58060    0.0000
FEB65          0.444788     0.011479      38.74695    0.0000
MAR65          0.394486     0.015224      25.91217    0.0000
APR65          0.266841     0.009867      27.04308    0.0000
MAY65          0.139923     0.013030      10.73825    0.0000
OCT65          0.226815     0.042276      5.365158    0.0000
NOV65          0.317030     0.018106      17.50950    0.0000
DEC65          0.425533     0.011871      35.84741    0.0000
TR-W90        -0.358425     0.098681     -3.632153    0.0004
SEP           22.86495      5.490874      4.164173    0.0001
AR(1)          0.445873     0.081063      5.500312    0.0000

R-squared            0.992613     Mean dependent var   324.9130
Adjusted R-squared   0.991953     S.D. dependent var   179.4880

In this estimated model all coefficients are statistically significant, and because a linear model was used, the interpretation of the regression coefficients is both straightforward and intuitive.

Q. Could you briefly summarize why you believe the models developed by Dr. Morrison could be construed as a curve fitting approach to model building?

A. Recall that in layman's terms, "curve fitting" refers to the use of independent variables in a regression model without regard to their theoretical or business relevance. When terms are included simply because they increase the measure of fit, R2, in the estimated regression equation, we are susceptible to curve fitting. An automated regression estimation technique can result in curve fitting when variables that have no theoretical relevance enter the model. The analyst is relieved from having to consider whether a variable in the model makes economic sense, because the automated procedure simply selects it. It is hard to imagine an economic, or practical, justification for the use of cubic terms in his models. However, such a term could enter and remain in a model that was developed using a mechanical variable selection process.

Q. How does the Company's approach to regression overcome the shortcomings associated with automated estimation procedures such as stepwise, as advocated by Dr. Morrison?

A. The Company developed its models from a research perspective. Initially, variables that made economic sense in modeling the situation being studied were identified. Then, the Company applied standard statistical techniques, such as correlation analysis, to determine potential independent variables. Finally, linear regression models using those potential variables were estimated and tested for their significance.
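The structural multicollinearity discussed earlier (HDD2 and HDD3 being built directly from HDD) is easy to verify numerically. This sketch uses hypothetical heating-degree-day values, not the filed data, and computes pairwise correlations and variance inflation factors (VIFs):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monthly heating degree days and their powers.
hdd = rng.uniform(5, 35, size=60)
X = np.column_stack([hdd, hdd ** 2, hdd ** 3])

# Powers of the same variable are almost perfectly correlated.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 3))

def vif(X, j):
    """Variance inflation factor: 1 / (1 - R^2) from regressing
    column j on the remaining columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1.0 - np.var(y - A @ beta) / np.var(y)
    return 1.0 / (1.0 - r2)

# A common rule of thumb flags VIF > 10 as serious multicollinearity;
# here every polynomial term is far beyond that threshold.
print([round(vif(X, j), 1) for j in range(X.shape[1])])
```

The same diagnostic applied to any predictor set will reveal structural multicollinearity whenever one column is a deterministic function of another.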
Q. What other issues exist with respect to the models proposed by Dr. Morrison?

A. The data used by the Company witness Blattner and by Dr. Morrison are time series data. One issue that time-series data can exhibit is autocorrelated errors, which was mentioned earlier. The Company has dealt with this issue by explicitly correcting for the autocorrelated error terms using an autoregressive procedure. Dr. Morrison's models have not dealt with autocorrelated errors.

Q. How do you know that the models proposed by Dr. Morrison exhibit autocorrelated errors?

A. I computed a Durbin-Watson statistic for each model Dr. Morrison proposed.

Q. What is a Durbin-Watson statistic?

A. A Durbin-Watson statistic is calculated from the regression model's residuals. In regression we assume that the errors are independent. The Durbin-Watson test statistic is used to test this assumption.

Q. What were the results of these tests for the models developed by Dr. Morrison?

A. The following table shows the calculated Durbin-Watson test statistic for each model developed by Dr. Morrison.

Model    Durbin-Watson Statistic    Significant at 1%
RS-1                      1.2955                  YES
RS-2                      1.3461                  YES
GS-10                     1.8445         INCONCLUSIVE
GS-11                     1.140                   YES
GS-20                     1.007                   YES

The Durbin-Watson test statistics show strong evidence of positively correlated error terms in all of Dr. Morrison's models but the GS-10. In that model the test for positive autocorrelation was inconclusive. The presence of these positively correlated error terms does not conform with the assumptions concerning the distribution and independence of error terms that are made in regression analysis.

Q. What should be done when there is positive correlation in the error terms?

A. One remedy would be to introduce an autoregressive error correction, as was done with the Company's models.
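The diagnose-and-correct sequence just described can be sketched on synthetic data. The single Cochrane-Orcutt step below is an illustrative simplification of what a package such as EViews automates: compute the Durbin-Watson statistic from the OLS residuals, estimate the autocorrelation rho, and refit on quasi-differenced data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic regression with AR(1) errors: e_t = rho * e_{t-1} + noise.
n, rho = 120, 0.7
x = rng.uniform(0, 30, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal(0.0, 1.0)
y = 10.0 + 2.0 * x + e

def ols(x, y):
    """OLS with an intercept; returns coefficients and residuals."""
    A = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, y - A @ beta

def durbin_watson(resid):
    """Roughly 2 for independent errors; well below 2 signals
    positive autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

_, resid = ols(x, y)
dw_before = durbin_watson(resid)

# One Cochrane-Orcutt step: estimate rho from lagged residuals, then
# refit on the quasi-differences y_t - rho*y_{t-1}, x_t - rho*x_{t-1}.
rho_hat = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
_, resid2 = ols(x[1:] - rho_hat * x[:-1], y[1:] - rho_hat * y[:-1])
dw_after = durbin_watson(resid2)

print(round(dw_before, 2), round(dw_after, 2))
```

Before the correction the Durbin-Watson statistic sits well below 2 (strong positive autocorrelation, as in the table above); after the quasi-differencing it moves back toward 2.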
The Company's models were tested for the presence of autocorrelated error terms, and if such errors were detected, an estimation procedure to correct for their effects was used.

Q. Is there anything else related to Dr. Morrison's models you wish to comment on?

A. Yes. I examined the residuals from his models. One assumption of regression analysis is that the residuals (i.e., error terms) from the regression follow a normal distribution.

Q. How did you examine the residuals from Dr. Morrison's models?

A. I produced probability plots and Anderson-Darling statistics for each model estimated by Dr. Morrison. If the residuals are normally distributed, then their probability plot will be a straight line and the P-Value for the Anderson-Darling statistic will be large (e.g., > 0.05).

Q. What did you conclude from the residual analysis?

A. The probability plots (shown below) indicate that the residuals do not follow a normal distribution. This conclusion is verified by the fact that the Anderson-Darling statistic P-Values are very small, all less than 0.05, indicating that the distribution of the residuals for each regression model does not follow a normal distribution. This is a violation of the linear regression assumptions.

[Probability plots of the residuals from each of Dr. Morrison's models. Each panel reports the mean, standard deviation, sample size, and Anderson-Darling statistic, with every P-Value below 0.005.]

Q. Is there anything else you would like to add about the differences between the Company's and the Staff's models?

A. Yes. The Company produced forecasts using its RS-1 and RS-2 models, as well as forecasts using the models that Dr. Morrison developed.

Q. Did Dr. Morrison provide a comparison of his models to actual consumption?

A. No.
In Company Production Request No. 19 to Staff, the Company asked that Staff "provide a comparison of Staff's monthly consumption forecasts for 2016 compared with actual consumption". Dr. Morrison's response was, "The Company did not provide consumption information by class for all months of 2016, so it is impossible to provide the comparison requested by the Company". On January 23rd, the Company sent an email to Staff noting that the 2016 customer and consumption data for January through June of 2016 had been provided to Staff in response to Production Requests No. 113 and 114, "PR 113 & 114 2016 Billing Data.xls", and again asked that Staff perform the requested comparison using the data from Production Requests No. 113 and 114. Dr. Morrison responded to that email request by stating, "I did not perform the analysis you have requested." Because Dr. Morrison did not provide the comparison, the Company calculated a comparison between Staff's models and actuals for the years 2012 through 2016.

Q. Before you explain that comparison and why it is important, what did you conclude from the results of the comparison?

A. The comparison of the RS-1 and RS-2 models showed that Dr. Morrison's models would not have predicted usage in each of the years 2012 through 2016 as well as the Company's models did for each of those same years.

Q. Why is performing this comparative analysis important?

A. Once a model is deemed to be statistically significant, the only real measure of a forecasting model's worth is how well it actually forecasts. A comparison of the Company and Staff models in terms of their relative abilities to forecast total therms provides added justification for the preference of one model over another.

Q. How was such a comparison conducted?

A. The comparison was conducted by "back-casting" both Dr. Morrison's and the Company's models.
In essence, a comparison of the RS-1 and RS-2 models was made looking back at how well they forecast the years from 2012-2015 as well as how they performed in the test year of 2016. Both the Company's and Dr. Morrison's models were applied to the actual heating degree days and, where applicable, the trend variables for each month to calculate the forecast usage. The forecasts were then compared to actual usage values. Next, commonly used measures of forecast accuracy were computed, and the comparative results are summarized below:

RS-1 Forecast Accuracy Comparison

                      2012     2013     2014      2015      2016      2012-2016
MAD
  Company backcast    83,898   76,687   68,351    127,448   129,799   97,236
  Staff backcast      90,189   80,943   104,485   164,343   115,754   111,143
MAPE
  Company backcast    5.21%    6.88%    5.44%     12.79%    8.04%     7.67%
  Staff backcast      8.66%    8.69%    12.19%    19.71%    15.53%    12.96%
Bias
  [The Bias rows are garbled beyond recovery in the source copy.]

RS-2 Forecast Accuracy Comparison

                      2012       2013      2014      2015      2016      2012-2016
MAD
  Company backcast    476,961    262,564   345,853   410,642   534,584   406,121
  Staff backcast      1,030,642  406,944   544,813   724,414   902,087   721,780
MAPE
  Company backcast    3.60%      2.61%     3.09%     4.05%     4.50%     3.57%
  Staff backcast      9.09%      4.54%     5.67%     7.47%     8.63%     7.08%
Bias
  [The Bias rows are garbled beyond recovery in the source copy.]

Q. Would you please explain the tables?
A. The MAD refers to the mean absolute deviation and is a measure of forecast accuracy. The MAD is the average of the absolute value of the forecast errors. The smaller the MAD, the less the forecast value deviates from the true value in
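The back-casting step described above amounts to running a fitted regression over the weather that actually occurred and comparing the fitted values with what customers actually used. A minimal sketch follows; every coefficient and data point below is invented for illustration only and is not a Company or Staff figure.

```python
# Hypothetical fitted model: usage = base load + (therms per heating degree day) x HDD.
# All numbers here are made up for illustration; none come from the testimony.
intercept, hdd_coef = 40.0, 0.5

actual_hdd   = [980, 760, 610, 340, 150, 40]      # actual heating degree days, Jan-Jun
actual_usage = [520, 415, 350, 205, 120, 58]      # actual therms used, Jan-Jun

# "Back-cast": apply the model to the historical weather...
backcast = [intercept + hdd_coef * hdd for hdd in actual_hdd]

# ...then measure how far the model's values fall from actual usage.
errors = [a - f for a, f in zip(actual_usage, backcast)]
print(backcast)  # [530.0, 420.0, 345.0, 210.0, 115.0, 60.0]
print(errors)    # [-10.0, -5.0, 5.0, -5.0, 5.0, -2.0]
```

Summarizing these monthly errors over each year with MAD, MAPE, and Bias yields comparison tables of the kind shown above.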
The MAPE is the mean absolute percentage error, which is an alternative measure of forecast error. It measures the average of the absolute value of the forecast effors as a percentage of the actual demand value. Smaller MAPE values are preferred. Finally, the bias is a measure of the tendency of the forecast model to over or under forecast. Positive measure of bias indicate that the forecast is less than the actual (i.e., under forecast). In terms of these measures of accuracy the Company's models generally outperformed the models proposed by Staff. With the exception of the MAD calculation for the RS-l model for 2016 and the Bias calculation for RS-2 in 2015 the Company's forecasts were superior to Dr. Morrison's forecasts in every year with respect to MAD, MAPE and Bias. Because much of your testimony is so highly technical, can you provide a brief summary, in laymen's terms regarding the Company's models? Yes. Dr. Shannon and I reviewed the Company's approach to developing its forecast. It is our opinionthat: o The Company's models were developed using variables that are theoretically relevant to modeling demand. o The Company's models are statistically significant. o The estimated coeffrcients in the Company's models are statistically significant with the expected signs. o The Company's models corrected for autocorrelated error terms. o The Company's models provided good forecast accuracy. Fry, Reb. 24 Intermountain Gas Company 10 11 t2 13 a. t4 15 A. t6 t7 l8 t9 20 2t 22 23 I 2 J 4 5 6 7 8 9 a. Do you have any final thoughts regarding the Company's models? A. Yes. As stated in the letter filed with this Case as Exhibit No. 18, it is Dr. Shannon's and my opinion that the methods the Company used in preparing its three statistical regression models are appropriate and are based on sound statistical methodology. 
We believe that the methods used by Intermountain are appropriate for weather normalization and that Intermountain's approach follows the methodology authorized by the IPUC in Case U-1034-134.

Q. Does this conclude your rebuttal testimony?
A. Yes, thank you, it does.