Overfitting

Overfitting is a common problem in machine learning in which a model fits the training data too closely and therefore performs poorly on unseen data. It typically occurs when a model has too many parameters and memorizes the training data, including its noise, instead of learning patterns that generalize to new data. As a result, the model has high accuracy on the training set but low accuracy on the out-of-sample data set.
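As a rough illustration (not part of the original material, and assuming NumPy is available), the sketch below fits both a parsimonious quadratic and an over-parameterized degree-9 polynomial to a small noisy sample. The flexible model achieves a much lower error on the training points but a higher error on fresh, out-of-sample points.

```python
# Minimal overfitting sketch, assuming NumPy is available.
# A degree-9 polynomial is fit to 12 noisy points drawn from a quadratic
# relationship; it tracks the training data closely but generalizes
# poorly to unseen (out-of-sample) data.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.2, n)  # true signal + noise
    return x, y

x_train, y_train = make_data(12)   # small training set
x_test, y_test = make_data(200)    # unseen, out-of-sample data

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

for degree in (2, 9):              # parsimonious vs. over-parameterized model
    coeffs = np.polyfit(x_train, y_train, degree)
    print(f"degree {degree}: train MSE = {mse(coeffs, x_train, y_train):.4f}, "
          f"test MSE = {mse(coeffs, x_test, y_test):.4f}")
```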

Overfitting can be mitigated through complexity reduction, where a penalty is imposed for every feature used by the model. This forces the model to include only features that meaningfully reduce out-of-sample error.
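One common way to apply such a penalty is L1-penalized regression (LASSO). The sketch below is an illustration under the assumption that scikit-learn is available; the penalty drives the coefficients of uninformative features to zero, so they are effectively excluded from the model.

```python
# Complexity-reduction sketch via LASSO (L1 penalty), assuming scikit-learn
# is available. The penalty charges the model for every nonzero coefficient,
# so features that do not help reduce out-of-sample error are driven to zero.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
# Only the first two features carry signal; the other eight are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, n)

ols = LinearRegression().fit(X, y)     # no penalty: keeps all ten features
lasso = Lasso(alpha=0.1).fit(X, y)     # L1 penalty on coefficient size

print("OLS coefficients:  ", np.round(ols.coef_, 2))    # all nonzero
print("LASSO coefficients:", np.round(lasso.coef_, 2))  # noise features ~ 0
```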