Influence Analysis

PrepNuggets

LEVEL II

In regression analysis, influence analysis identifies the observations in a dataset that have a disproportionate effect on the estimated regression coefficients. It helps to flag outliers and other observations that may be driving the fitted model. Three common measures of influence are leverage, the studentized residual, and Cook’s distance.

Leverage measures how far an observation’s values of the independent variables are from their averages. High-leverage observations lie far from the average and therefore have the potential to exert a large effect on the regression line.
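As a minimal sketch of the idea, leverage can be read off the diagonal of the "hat" matrix of an ordinary least squares fit. The function name `leverage` and the sample data are illustrative assumptions, and the model is assumed to include an intercept.

```python
import numpy as np

def leverage(X):
    """Leverage of each observation for an OLS model with an intercept.

    X holds the independent variables (one column per variable, or a 1-D
    array for a single variable). This helper is illustrative only.
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(X.shape[0]), X])  # prepend intercept column
    # With A = QR (thin QR), the hat matrix is H = Q Q^T, so the
    # leverage h_ii is the squared length of row i of Q.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

# Hypothetical data: four clustered x-values and one far from the average.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
h = leverage(x)
```

With this made-up data the point at x = 20 receives a leverage of about 0.98, while the four clustered points are all at 0.30 or below. The leverages always sum to the number of estimated coefficients (here 2).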

A studentized residual measures how far an observation lies from the fitted regression line, scaled by the estimated standard error of that residual so that observations can be compared on a common scale. Large positive or negative studentized residuals indicate observations that are far from the fitted line and may be influential.
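The sketch below shows one way to compute studentized residuals with NumPy. It returns two common variants: the internally studentized residual, which uses the error variance from the full fit, and the externally studentized (deleted) residual, which uses the error variance estimated with the observation left out. Which variant a given text means is not stated here, so both are shown; the function name and data are hypothetical.

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally and externally studentized residuals for OLS with intercept."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # design matrix with intercept
    n, p = A.shape                                 # p counts the intercept too
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS coefficients
    e = y - A @ beta                               # raw residuals
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q**2, axis=1)                       # leverage of each observation
    s2 = e @ e / (n - p)                           # error variance, full sample
    internal = e / np.sqrt(s2 * (1 - h))
    # Error variance with observation i deleted, via the standard shortcut
    # that avoids refitting the model n times.
    s2_deleted = (e @ e - e**2 / (1 - h)) / (n - p - 1)
    external = e / np.sqrt(s2_deleted * (1 - h))
    return internal, external

# Hypothetical data: roughly linear, except for the last observation.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 9.0])
internal, external = studentized_residuals(x, y)
```

In this example the last observation has the largest studentized residual in absolute value under either definition.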

Cook’s distance measures how much the estimated coefficients change when an observation is deleted. It takes into account both the leverage and the residual of the observation. A high Cook’s distance indicates that deleting the observation would change the estimated coefficients substantially.
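A minimal sketch of Cook’s distance follows, using the closed form that combines each observation’s residual and leverage. It assumes the common textbook definition in which the divisor p counts all estimated coefficients including the intercept; some texts use a different divisor, which rescales every value by the same factor. The function name and data are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit with intercept."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    n, p = A.shape                             # assumption: p includes the intercept
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q**2, axis=1)                   # leverage
    e = y - Q @ (Q.T @ y)                      # residuals, e = (I - H) y
    s2 = e @ e / (n - p)                       # mean squared error
    # Large residual and high leverage both push the distance up.
    return e**2 / (p * s2) * h / (1 - h) ** 2

# Hypothetical data: roughly linear, except for the last observation.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 9.0])
d = cooks_distance(x, y)
```

The closed form gives the same values as refitting the model once per deleted observation and measuring how far the coefficients move, but without the repeated fits.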

The influence plot summarises all three metrics for every data point at a single glance. The X-axis plots the leverage, which always lies between 0 and 1; the Y-axis plots the studentized residuals, which can be positive or negative. The size of the circle for each data point reflects its Cook’s distance.
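A plot of this kind can be sketched with NumPy and matplotlib as follows. This is an illustrative reconstruction rather than any particular library’s implementation: the function names, the `max_area` parameter, the use of externally studentized residuals, and the scaling of circle area by Cook’s distance are all assumptions.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage, externally studentized residuals and Cook's distance (OLS)."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    n, p = A.shape
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q**2, axis=1)                   # leverage, between 0 and 1
    e = y - Q @ (Q.T @ y)                      # residuals
    s2 = e @ e / (n - p)
    s2_deleted = (e @ e - e**2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_deleted * (1 - h))      # studentized residuals
    d = e**2 / (p * s2) * h / (1 - h) ** 2     # Cook's distance
    return h, t, d

def influence_plot(X, y, max_area=600.0):
    """Scatter leverage against studentized residual, sized by Cook's distance."""
    import matplotlib.pyplot as plt            # imported lazily; needed only to plot
    h, t, d = influence_measures(X, y)
    fig, ax = plt.subplots()
    # Circle area is proportional to Cook's distance (assumed scaling).
    ax.scatter(h, t, s=max_area * d / d.max(), alpha=0.5)
    ax.axhline(0.0, linewidth=0.8)
    ax.set_xlabel("Leverage")
    ax.set_ylabel("Studentized residual")
    return ax
```

Calling `influence_plot(x, y)` on a dataset returns a matplotlib axes object. Points toward the right of the plot, far from the horizontal zero line, and drawn with large circles are the ones worth inspecting first.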