The ANOVA (Analysis of Variance) table is a statistical tool used to determine whether a simple linear regression model predicts the dependent variable significantly better than simply predicting its mean. It is built by organizing the results of a few calculations into a table with the following columns: Source of variation, Sum of Squares, Degrees of Freedom, Mean Square, and (usually) the F-statistic.
The Source column identifies where the variance in the data comes from. In a simple linear regression study, there are two sources of variance: the model (referred to as “Regression”) and the error (referred to as “Error” or “Residual”); a “Total” row combining the two is often included as well.
The Sum of Squares column lists the sum of squared differences for each source of variance. The regression sum of squares (SSR) is the sum of the squared differences between the predicted values and the mean of the dependent variable. The sum of squared errors (SSE) is the sum of the squared differences between the observed values and the predicted values. The total sum of squares (SST) is the sum of the squared differences between the observed values and the mean of the dependent variable, and it decomposes as SST = SSR + SSE.
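In symbols, with observed values $y_i$, fitted values $\hat{y}_i$, and mean $\bar{y}$ over $n$ observations:

$$
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = SSR + SSE.
$$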
The Degrees of Freedom column gives the number of independent pieces of information behind each sum of squares (loosely, the number of values that are free to vary). The degrees of freedom for the regression equal the number of independent variables, which is 1 in simple linear regression. The degrees of freedom for the error equal the number of observations minus the number of estimated parameters (intercept and slope), i.e. n − 2 in simple linear regression. The total degrees of freedom are n − 1.
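With $n$ observations and $k$ independent variables, these are:

$$
df_{\text{Regression}} = k, \qquad df_{\text{Error}} = n - k - 1, \qquad df_{\text{Total}} = n - 1,
$$

so for simple linear regression ($k = 1$) with, say, $n = 30$ observations, the regression, error, and total degrees of freedom are 1, 28, and 29.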
The Mean Square (MS) column represents the average squared difference for each source of variance. The regression mean square (MSR) is calculated by dividing SSR by the degrees of freedom for the regression. The mean squared error (MSE) is calculated by dividing SSE by the degrees of freedom for the error.
The F-statistic is the ratio of MSR to MSE. Under the null hypothesis that the slope is zero, this ratio follows an F distribution with (df_Regression, df_Error) degrees of freedom, so a large F-statistic indicates that the regression model explains substantially more variation than simply predicting the mean of the dependent variable.
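In symbols:

$$
MSR = \frac{SSR}{df_{\text{Regression}}}, \qquad
MSE = \frac{SSE}{df_{\text{Error}}}, \qquad
F = \frac{MSR}{MSE}.
$$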
The ANOVA table therefore answers a single question: is the regression model a significant improvement over predicting the mean of the dependent variable? If the F-statistic is large enough that its p-value falls below the chosen significance level (commonly 0.05), the improvement is statistically significant; otherwise, the data do not provide evidence that the regression model outperforms the mean.
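To make the calculations concrete, here is a minimal sketch in Python using NumPy and SciPy that assembles the ANOVA table for a simple linear regression. The data are synthetic and the variable names are illustrative, not drawn from any particular dataset.

```python
import numpy as np
from scipy import stats

# Synthetic example data: x is the independent variable, y the dependent variable.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=30)

n = len(y)
k = 1  # number of independent variables in simple linear regression

# Fit the least-squares line and compute the fitted values.
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

# Sums of squares
sst = np.sum((y - y.mean()) ** 2)       # total
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression
sse = np.sum((y - y_hat) ** 2)          # error

# Degrees of freedom
df_reg, df_err, df_tot = k, n - k - 1, n - 1

# Mean squares, F-statistic, and p-value
msr = ssr / df_reg
mse = sse / df_err
f_stat = msr / mse
p_value = stats.f.sf(f_stat, df_reg, df_err)

print(f"{'Source':<12}{'SS':>12}{'df':>5}{'MS':>12}{'F':>10}")
print(f"{'Regression':<12}{ssr:>12.2f}{df_reg:>5}{msr:>12.2f}{f_stat:>10.2f}")
print(f"{'Error':<12}{sse:>12.2f}{df_err:>5}{mse:>12.2f}")
print(f"{'Total':<12}{sst:>12.2f}{df_tot:>5}")
print(f"p-value: {p_value:.4g}")
```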