Basics of GLM Diagnostics, Validation, and Factor Analysis: Practice Questions and Solutions

The Actuary's Free Study Guide for Exam 5 - Section 82

G. Stolyarov II
This section of sample problems and solutions is Section 82 of The Actuary's Free Study Guide for Exam 5, authored by Mr. Stolyarov.

This section of the study guide is intended to provide practice problems and solutions to accompany the pages of Basic Ratemaking, cited below. Students are encouraged to read these pages before attempting the problems. This study guide is entirely an independent effort by Mr. Stolyarov and is not affiliated with any organization(s) to whose textbooks it refers, nor does it represent such organization(s).

Some of the questions here ask for short written answers based on the reading. This is meant to give the student practice in answering questions of the format that will appear on Exam 5. Students are encouraged to type their own answers first and then to compare these answers with the solutions given here. Please note that the solutions provided here are not necessarily the only possible ones.

Source:
Werner, Geoff and Claudine Modlin. Basic Ratemaking. Casualty Actuarial Society. 2009. Chapter 10, pp. 176-181.

Original Problems and Solutions from The Actuary's Free Study Guide

Problem S5-82-1.

(a) For generalized linear models (GLMs), what does the standard error diagnostic measure?

(b) Suppose that the values of Variable X are placed, in sequence from lowest to highest, on the horizontal axis of a graph, and the indicated relativities for Variable X are placed on the vertical axis. Which of the following results with respect to GLM indications and standard error would be most effective in demonstrating that higher values of Variable X correspond to a higher risk of insurance loss?

(i) The curve connecting the points on the graph indicated by the GLM is positively sloped and has wide bands around it, corresponding to ± 2 standard errors.
(ii) The curve connecting the points on the graph indicated by the GLM is negatively sloped and has wide bands around it, corresponding to ± 2 standard errors.
(iii) The curve connecting the points on the graph indicated by the GLM is positively sloped and has narrow bands around it, corresponding to ± 2 standard errors.
(iv) The curve connecting the points on the graph indicated by the GLM is negatively sloped and has narrow bands around it, corresponding to ± 2 standard errors.

Solution S5-82-1.

(a) The standard error provides a measurement of confidence around the indicated GLM values. ± 2 standard errors from the GLM indication are akin to a 95% confidence interval estimate (Werner and Modlin, p. 176).

(b) The correct answer is (iii) The curve connecting the points on the graph indicated by the GLM is positively sloped and has narrow bands around it, corresponding to ± 2 standard errors. The positive slope indicates that higher values of Variable X correspond to a higher risk of insurance loss, and the narrow bands show that there is a substantial degree of confidence in this result, as the equivalent of a 95% confidence interval is quite narrow.
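The bands described above can be illustrated with a short Python sketch. It assumes a log-link GLM, so that each relativity is exp(coefficient) and the band is exp(coefficient ± 2 standard errors). The level names, coefficients, and standard errors are hypothetical values chosen for illustration; they do not come from Werner and Modlin.

```python
import math

# Hypothetical log-link GLM output for three levels of Variable X:
# (level, coefficient estimate, standard error). Illustrative values only.
estimates = [
    ("low", 0.00, 0.00),     # base level, relativity fixed at 1.0
    ("medium", 0.20, 0.03),
    ("high", 0.45, 0.05),
]

def relativity_band(beta, se, width=2.0):
    """Return (lower, relativity, upper) for a log-link coefficient."""
    return (
        math.exp(beta - width * se),
        math.exp(beta),
        math.exp(beta + width * se),
    )

bands = {level: relativity_band(beta, se) for level, beta, se in estimates}

def bands_separate(lower_level, higher_level):
    """True if the upper bound of one band lies below the lower bound of the next."""
    return lower_level[2] < higher_level[0]

for level, (lo, rel, hi) in bands.items():
    print(f"{level:>6}: relativity {rel:.3f}, band [{lo:.3f}, {hi:.3f}]")

print("low vs medium separated:", bands_separate(bands["low"], bands["medium"]))
print("medium vs high separated:", bands_separate(bands["medium"], bands["high"]))
```

When the bands are narrow enough that adjacent levels do not overlap, the upward pattern in the relativities is unlikely to be a product of random variation alone. Wide, overlapping bands would leave the direction of the relationship in doubt.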

Problem S5-82-2.

(a) For generalized linear models (GLMs), what does the deviance diagnostic measure?

(b) Name three examples of deviance diagnostics.

(c) Describe a practical diagnostic pertaining to GLMs that is neither an instance of standard error measurement nor of deviance.

Solution S5-82-2. This question is based on the discussion in Werner and Modlin, p. 177.

(a) The deviance diagnostic measures "how much the fitted values differ from the observations." It is frequently used to evaluate whether it would be useful to include additional variables in the model.

(b) Examples of deviance diagnostics include (1) the Chi-Square test, (2) the F-test, (3) the Akaike Information Criterion (AIC), and (4) the Bayesian Information Criterion (BIC). Any three of these suffice as an answer.
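As a rough illustration of how these diagnostics compare two nested models, the following Python sketch computes the Poisson deviance, AIC, and BIC for a tiny hypothetical data set of claim counts. It assumes a Poisson model with dispersion fixed at 1, so the drop in deviance can be compared to a chi-square critical value (3.841 at the 5% level with 1 degree of freedom). The data and function names are assumptions made for this example only.

```python
import math

def poisson_deviance(observed, fitted):
    """Deviance = 2 * sum[y*ln(y/mu) - (y - mu)], with y*ln(y/mu) = 0 when y = 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total

def poisson_loglik(observed, fitted):
    """Poisson log-likelihood; lgamma(y + 1) equals ln(y!)."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(observed, fitted))

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

# Hypothetical claim counts: first four policies in group "low", last four in "high".
counts = [0, 1, 0, 2, 1, 3, 2, 4]
n = len(counts)

# Model A (1 parameter): overall mean. Model B (2 parameters): one mean per group.
fitted_a = [sum(counts) / n] * n
fitted_b = [sum(counts[:4]) / 4] * 4 + [sum(counts[4:]) / 4] * 4

dev_a, dev_b = poisson_deviance(counts, fitted_a), poisson_deviance(counts, fitted_b)
ll_a, ll_b = poisson_loglik(counts, fitted_a), poisson_loglik(counts, fitted_b)
aic_a, aic_b = aic(ll_a, 1), aic(ll_b, 2)
bic_a, bic_b = bic(ll_a, 1, n), bic(ll_b, 2, n)

deviance_drop = dev_a - dev_b
CHI_SQUARE_5PCT_1DF = 3.841  # critical value for 1 degree of freedom

print(f"Deviance: A = {dev_a:.3f}, B = {dev_b:.3f}, drop = {deviance_drop:.3f}")
print("Extra variable significant at 5%:", deviance_drop > CHI_SQUARE_5PCT_1DF)
print(f"AIC: A = {aic_a:.3f}, B = {aic_b:.3f}")
print(f"BIC: A = {bic_a:.3f}, B = {bic_b:.3f}")
```

A lower AIC or BIC favors the model that achieves it. Both criteria penalize each additional parameter, so a new variable must reduce the deviance by more than the penalty to be judged worthwhile.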

(c) A practical diagnostic pertaining to GLMs is the comparison of results for individual years to see how well the model performs from one year to the next and whether the model is a reliable predictor of subsequent years' data. One can be more confident in a model's predictive abilities if it continues to largely reflect observed results for many years.
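A minimal sketch of such a year-by-year comparison follows. For simplicity it uses raw pure premiums by year in place of a GLM refitted on each year's data, and the years, losses, and exposures are hypothetical figures invented for illustration.

```python
# Hypothetical (losses, exposures) by year for two levels of a rating variable.
experience = {
    2006: {"base": (520_000, 4000), "high": (390_000, 2000)},
    2007: {"base": (540_000, 4000), "high": (420_000, 2100)},
    2008: {"base": (500_000, 4000), "high": (396_000, 2200)},
}

def relativity(year_data, level, base="base"):
    """Pure premium of `level` divided by pure premium of the base level."""
    loss, exposure = year_data[level]
    base_loss, base_exposure = year_data[base]
    return (loss / exposure) / (base_loss / base_exposure)

by_year = {year: relativity(data, "high") for year, data in experience.items()}

# The pattern is consistent if the "high" level is worse than base in every year.
consistent = all(rel > 1.0 for rel in by_year.values())
spread = max(by_year.values()) - min(by_year.values())

for year, rel in sorted(by_year.items()):
    print(f"{year}: relativity of 'high' to 'base' = {rel:.3f}")
print("Same direction every year:", consistent)
print(f"Spread between highest and lowest year: {spread:.3f}")
```

A relativity that keeps the same direction and a similar magnitude in every year supports the model's reliability. A relativity that changes direction from year to year would call for further investigation.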

Problem S5-82-3.

(a) What is a holdout sample of data, and how can it be used in GLM validation?

(b) If the model's treatment of certain variables is a result of over-fitting, how would the holdout sample reflect this? What flaw in the model's design does the existence of over-fitting suggest?

(c) If the model's treatment of certain variables is a result of under-fitting, how would the holdout sample reflect this? What flaw in the model's design does the existence of under-fitting suggest?

Solution S5-82-3. This question is based on the discussion in Werner and Modlin, p. 178.

(a) A holdout sample of data consists of historical data taken from the same time period as the data used in the development of the model; however, data in the holdout sample are not used in creating the model itself. Rather, they are used to test the model once it has been created, to see whether the model can accurately predict the distribution of data within the holdout sample. If the model's prediction corresponds closely to the actual composition of the holdout sample, then this constitutes evidence in support of the model's predictive power.

(b) If there is substantial over-fitting in the model, then the observed distribution of data in the holdout sample will differ substantially from the distribution of data predicted by the model. Over-fitting indicates that the modeler mistook "noise" within the data sample used to develop the model for "signal" - i.e., for systematic trends in the data. Since the same "noise" is unlikely to be reflected in other data samples, a substantial disparity will occur between model results and the actual holdout sample.

(c) The disparity between the model's predictions and the holdout sample results will not be as great if there is under-fitting; indeed, an under-fitted model might accurately predict the distribution of the variables in question within the holdout sample, as well as in future time periods. However, there will be less substance to the prediction than might be hoped for, as an under-fitted model has difficulty explaining what is responsible for differential experience within the data.
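The holdout idea can be sketched in Python as follows. This is a simplified illustration, not a GLM: averages by cell stand in for fitted model values, and mean squared error stands in for a formal goodness-of-fit measure. The simulated data, the "territory" and "agent" variables, and the function names are all assumptions made for the example. In the simulation, losses depend on territory only, while agent is pure noise.

```python
import random
from collections import defaultdict

random.seed(0)

def simulate(n):
    """Hypothetical policies: loss depends on territory; 'agent' is unrelated noise."""
    rows = []
    for _ in range(n):
        territory = random.choice(["A", "B"])
        agent = random.randrange(50)
        mean = 100.0 if territory == "A" else 150.0
        rows.append({"territory": territory, "agent": agent,
                     "loss": random.gauss(mean, 40.0)})
    return rows

data = simulate(2000)
train, holdout = data[:1400], data[1400:]  # the holdout is never used for fitting

def fit_cell_means(rows, keys):
    """Average loss per cell defined by `keys`, plus the overall average as a fallback."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        cell = tuple(r[k] for k in keys)
        sums[cell] += r["loss"]
        counts[cell] += 1
    overall = sum(r["loss"] for r in rows) / len(rows)
    return {cell: sums[cell] / counts[cell] for cell in sums}, overall

def mse(rows, keys, model):
    """Mean squared error of the cell-mean predictions on `rows`."""
    means, overall = model
    total = 0.0
    for r in rows:
        cell = tuple(r[k] for k in keys)
        total += (r["loss"] - means.get(cell, overall)) ** 2
    return total / len(rows)

simple_keys = ("territory",)            # reflects the true signal only
complex_keys = ("territory", "agent")   # also fits the noise variable

simple_model = fit_cell_means(train, simple_keys)
complex_model = fit_cell_means(train, complex_keys)

train_mse_simple = mse(train, simple_keys, simple_model)
train_mse_complex = mse(train, complex_keys, complex_model)
holdout_mse_simple = mse(holdout, simple_keys, simple_model)
holdout_mse_complex = mse(holdout, complex_keys, complex_model)

print(f"Simple model:  train MSE {train_mse_simple:.1f}, holdout MSE {holdout_mse_simple:.1f}")
print(f"Complex model: train MSE {train_mse_complex:.1f}, holdout MSE {holdout_mse_complex:.1f}")
```

The more detailed model always fits the training data at least as well as the simple one, because its cells subdivide the simple model's cells. On the holdout sample, one would typically expect the detailed model to do worse, since the agent-level differences it learned were noise. That gap between training and holdout performance is the signature of over-fitting.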

Problem S5-82-4. Name three aspects of GLM development that are still the responsibility of the individual actuary, no matter what sophisticated programs and other tools are at that actuary's disposal.

Solution S5-82-4. The following aspects of actuarial responsibility with regard to GLM development are discussed by Werner and Modlin, p. 180:

1. "Ensuring data is adequate for the level of detail of the classification ratemaking analysis (avoiding what is known as the GIGO principle: Garbage In, Garbage Out)";
2. "Identifying when anomalous results dictate additional exploratory analysis";
3. "Reviewing model results in consideration of both statistical theory and business application";
4. "Developing appropriate methods to communicate model results in light of a company's ratemaking objectives (e.g., policyholder dislocation, competitive position)";
5. Addressing IT constraints;
6. Addressing the insurer's marketing objectives;
7. Addressing regulatory requirements.

Any three of the above suffice as an answer. Other valid answers may be possible.

Problem S5-82-5. What is the purpose of factor analysis? How does it accomplish this purpose?

Solution S5-82-5. This question is based on the discussion of factor analysis by Werner and Modlin, pp. 180-181.

The purpose of factor analysis is to reduce the number of variables (parameter estimates) in a model such as a GLM. If two variables exhibit an exposure correlation or an interaction, factor analysis proceeds by determining the relationship between the variables (for instance, via a regression procedure) and then combining the two variables into a single variable that accounts for this relationship. The benefit of factor analysis is that the model utilizes fewer correlated variables and that each variable will convey information that does not overlap to as great an extent with the information conveyed by the other variables.
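One simple way to combine two correlated variables into one is sketched below in Python. It is an assumption of this example, not the procedure prescribed by Werner and Modlin: the two variables are standardized and replaced by their first principal component, which for two positively correlated standardized variables weights both equally. The vehicle data are hypothetical.

```python
import math

# Hypothetical vehicle characteristics that tend to move together.
weight = [1100, 1250, 1400, 1550, 1700, 1900]   # kilograms
engine = [1.0, 1.4, 1.6, 2.0, 2.2, 3.0]         # litres

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

z_weight, z_engine = standardize(weight), standardize(engine)
n = len(weight)

# Sample correlation between the two standardized variables.
r = sum(a * b for a, b in zip(z_weight, z_engine)) / (n - 1)

# First principal component of two standardized variables: loadings (1, 1) / sqrt(2).
combined = [(a + b) / math.sqrt(2) for a, b in zip(z_weight, z_engine)]

# Share of the total variance of the two variables captured by the combined variable.
share_explained = (1 + r) / 2

print(f"Correlation between weight and engine size: {r:.3f}")
print(f"Share of variance retained by the single combined variable: {share_explained:.3f}")
print("Combined 'vehicle size' scores:", [round(c, 2) for c in combined])
```

The combined score could then enter the model in place of the two original variables, so that one parameter set is estimated instead of two. When the correlation is high, little information is lost in the process.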

Published by G. Stolyarov II

G. Stolyarov II is a science fiction novelist, independent essayist, poet, amateur mathematician, composer, author, and actuary.
