Multicollinearity occurs when two or more independent variables in a multiple regression model are highly correlated with each other. This can create problems when interpreting the regression coefficients, as the estimated coefficients of the correlated variables can change erratically in response to small changes in the data or the model.
There are several ways to detect multicollinearity in a regression model. One is to inspect the correlation matrix of the independent variables; an absolute correlation coefficient of 0.8 or higher between two predictors is often used as a rule of thumb to indicate multicollinearity.
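As a sketch, the correlation-matrix check might look like this in Python; the simulated predictors and the 0.8 cutoff are illustrative assumptions, not fixed conventions:

```python
import numpy as np
import pandas as pd

# Simulated predictors (assumption for illustration): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # highly correlated with x1
x3 = rng.normal(size=200)                  # independent of the others
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

corr = X.corr()
print(corr.round(2))

# Flag predictor pairs at or above the rule-of-thumb cutoff.
flagged = [(a, b, round(corr.loc[a, b], 2))
           for i, a in enumerate(corr.columns)
           for b in corr.columns[i + 1:]
           if abs(corr.loc[a, b]) >= 0.8]
print(flagged)  # expect something like [('x1', 'x2', 0.99)]
```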
Another way to detect multicollinearity is the Variance Inflation Factor (VIF), which measures how much the variance of an estimated regression coefficient is inflated by multicollinearity. The VIF for the j-th predictor is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that predictor on all the others. A VIF of 1 indicates no multicollinearity, a VIF greater than 1 indicates that the variable is correlated with one or more of the other independent variables, and a VIF of 5 or more is often taken to indicate high multicollinearity.
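A minimal sketch of the VIF computation, using statsmodels' variance_inflation_factor and reusing the DataFrame X from the example above; including an intercept column in the design matrix is the usual convention:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X_design = sm.add_constant(X)  # add an intercept column to the design matrix
vifs = pd.Series(
    [variance_inflation_factor(X_design.values, i)
     for i in range(1, X_design.shape[1])],  # skip the constant column
    index=X_design.columns[1:],
)
print(vifs.round(1))  # x1 and x2 should show very large VIFs; x3 should be near 1
```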
Multicollinearity can cause several problems when interpreting regression results. The coefficients of the correlated independent variables become unstable and unreliable, and their standard errors are inflated, so the confidence intervals are wider and it is harder to draw meaningful conclusions about the individual effects.
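This instability can be seen directly by fitting the model with and without the redundant predictor. The response below is simulated (an assumption for illustration), continuing from the earlier examples:

```python
import statsmodels.api as sm

# Simulated response: only x1 and x3 matter; x2 has no effect of its own.
y = 2 * X["x1"] + 1.5 * X["x3"] + rng.normal(size=len(X))

full = sm.OLS(y, sm.add_constant(X)).fit()                   # keeps collinear x1, x2
reduced = sm.OLS(y, sm.add_constant(X[["x1", "x3"]])).fit()  # drops x2

print(full.bse.round(2))     # standard errors on x1 and x2 are inflated
print(reduced.bse.round(2))  # dropping x2 shrinks the standard error on x1
```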
One way to deal with multicollinearity is to remove one or more of the correlated independent variables from the model, or to combine them into a single new variable. Another is to use a dimensionality reduction technique such as principal component analysis (PCA), which replaces the correlated predictors with a smaller set of uncorrelated components, as sketched below.
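A sketch of the PCA remedy with scikit-learn, again continuing from the examples above; standardizing before PCA and keeping two components are illustrative choices, not prescriptions:

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Z = StandardScaler().fit_transform(X)        # PCA is sensitive to scale
pcs = PCA(n_components=2).fit_transform(Z)   # components are mutually uncorrelated
pcs = pd.DataFrame(pcs, columns=["pc1", "pc2"], index=X.index)

# Regress on the components instead of the correlated raw predictors.
pcr = sm.OLS(y, sm.add_constant(pcs)).fit()
print(pcr.params.round(2))
```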
See also: Heteroskedasticity, Serial correlation