When fitting an econometric model, it is well known that we pick up part of the idiosyncratic characteristics of the data along with the systematic relationship between dependent and explanatory variables. This phenomenon is known as overfitting and generally occurs when a model is excessively complex relative to the amount of data available. Overfitting is a major threat to regression analysis in terms of both inference and prediction.
We start by showing that the Copas measure becomes confounded by shrinkage or expansion arising from in-sample bias when applied to the untransformed scale of nonlinear models, which is typically the scale of interest when assessing behaviors or analyzing policies. We then propose a new measure of overfitting that is both expressed on the scale of interest and immune to this problem. We also show how to measure the respective contributions of in-sample bias and overfitting to the overall predictive bias when applying an estimated model to new data.
Finally, we illustrate the properties of our new measure through both a simulation study and a real-data application based on inpatient healthcare expenditure data; both show that the distinctions can be important. Copyright © 2013 John Wiley & Sons, Ltd.
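To make the idea of a shrinkage-based overfitting measure concrete, the following minimal sketch simulates a linear model with many irrelevant regressors and compares the slope from regressing the outcome on in-sample fitted values with the slope from regressing it on cross-validated predictions. This is an illustration in the spirit of the Copas measure, not the new measure proposed here, and it deliberately uses a linear model, where the confounding by in-sample bias discussed above does not arise. The sample size, number of regressors, coefficients, noise level, number of folds, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def ols_fit(X, y):
    """Return OLS coefficients (intercept first) via least squares."""
    Xc = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef


def ols_predict(coef, X):
    """Predict from coefficients returned by ols_fit."""
    return coef[0] + X @ coef[1:]


def calibration_slope(y, y_hat):
    """Slope from regressing y on y_hat, with an intercept."""
    return ols_fit(y_hat.reshape(-1, 1), y)[1]


def cross_validated_predictions(X, y, n_folds, rng):
    """Predict each observation from a model fitted without its fold."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_hat = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        coef = ols_fit(X[train], y[train])
        y_hat[held_out] = ols_predict(coef, X[held_out])
    return y_hat


# Hypothetical design: 200 observations, 40 regressors, only 2 relevant.
rng = np.random.default_rng(0)
n, p = 200, 40
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:2] = 1.0
y = X @ beta + 2.0 * rng.standard_normal(n)

# In sample, OLS fitted values are calibrated by construction (slope of 1).
in_sample_slope = calibration_slope(y, ols_predict(ols_fit(X, y), X))

# Out of sample, a slope below 1 signals that predictions are too extreme.
out_of_sample_slope = calibration_slope(
    y, cross_validated_predictions(X, y, n_folds=5, rng=rng)
)

print(f"in-sample slope:     {in_sample_slope:.3f}")
print(f"out-of-sample slope: {out_of_sample_slope:.3f}")
```

In this design the in-sample slope equals one up to numerical precision, while the cross-validated slope is expected to fall below one because the model fits noise in the 38 irrelevant regressors. In a nonlinear model, a slope computed this way on the untransformed scale would also reflect any in-sample bias, which is the confounding that motivates a different measure.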