Overfitting and Underfitting are common problems in machine learning and can impact the performance of a model. Knowing about these concepts is essential for building effective machine-learning models.
Before learning about overfitting and underfitting, you should first understand training and test datasets.
Training and Test Datasets
Before building a machine-learning model, it is common to split the whole dataset into two subsets: training data and test data.
The training data, as the name suggests, is used to train the machine-learning model so that it can learn the patterns and relationships in the data. The trained model is then used on the test dataset to make predictions.
In short, the training data is used to train the model, while the test data is used to evaluate the performance of the trained model.
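This split can be sketched in a few lines of plain Python. The function below is a minimal, illustrative stand-in for what a library call such as scikit-learn's `train_test_split` does: shuffle the data with a fixed seed and hold out a fraction (here 20%) for testing.

```python
# A minimal sketch of a train/test split, mirroring what
# scikit-learn's train_test_split does under the hood.
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data and split it into training and test subsets."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    n_test = int(len(data) * test_ratio)
    test_idx = set(indices[:n_test])
    train = [data[i] for i in indices if i not in test_idx]
    test = [data[i] for i in indices if i in test_idx]
    return train, test

data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

In practice you would use a library implementation, which also handles features and labels together and supports stratified splits.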
Overfitting and Underfitting
Overfitting occurs when a model is too complex and fits the training data too closely. This results in poor generalization: the model performs well on the training data but cannot predict accurate outcomes for new, unseen data.
Underfitting occurs when a model is too simple and is unable to properly capture the patterns and relationships in the data. This means the model will perform poorly on both the training and the test data.
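Both failure modes can be seen in a small experiment (assuming NumPy is installed). Below, polynomials of increasing degree are fit to noisy cubic data: degree 1 underfits (high error everywhere), degree 3 fits well, and degree 9 has enough capacity to chase the noise in the training set.

```python
# Illustrative demo: model complexity vs. over/underfitting.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = x**3 - x + rng.normal(0, 0.05, size=x.shape)  # noisy cubic data

# Hold out every third point as the test set.
test_mask = np.arange(len(x)) % 3 == 0
x_tr, y_tr = x[~test_mask], y[~test_mask]
x_te, y_te = x[test_mask], y[test_mask]

def mse(deg):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, deg)
    pred_tr = np.polyval(coeffs, x_tr)
    pred_te = np.polyval(coeffs, x_te)
    return np.mean((pred_tr - y_tr) ** 2), np.mean((pred_te - y_te) ** 2)

for deg in (1, 3, 9):
    train_err, test_err = mse(deg)
    print(f"degree {deg}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

The degree-1 model shows high error on both sets (underfitting), while the high-degree model drives the training error down further than the degree-3 model without a matching improvement on the test set (overfitting).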
The aim should be a model that achieves good accuracy on both the training data and the test data.
What causes Overfitting and Underfitting?
Overfitting is often caused by a model with too many parameters, or one that is too powerful for the given dataset. Underfitting, on the other hand, is often caused by a model with too few parameters, or one that is not powerful enough for the given dataset.
Bias and Variance
Bias and variance are two sources of error that can severely impact the performance of a machine-learning model.
Bias reflects how well the model fits the training data: good training accuracy indicates low bias, while poor training accuracy indicates high bias. Variance reflects how well the model generalizes: if the model has good training accuracy but poor test accuracy, it has high variance, while good test accuracy alongside good training accuracy indicates low variance.
An overfit model therefore has low bias but high variance, while an underfit model has high bias.
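This rule of thumb can be written down as a toy diagnostic helper (illustrative only; the thresholds are assumptions, not standard values, and real diagnosis would use learning curves):

```python
# Toy rule-of-thumb diagnostic based on train/test accuracy.
# The threshold and gap values are illustrative assumptions.
def diagnose(train_acc, test_acc, threshold=0.85, gap=0.10):
    """Label a model from its training and test accuracy."""
    if train_acc < threshold:
        return "underfitting (high bias)"        # poor fit even on training data
    if train_acc - test_acc > gap:
        return "overfitting (high variance)"     # fits training data, fails to generalize
    return "good fit"

print(diagnose(0.70, 0.68))  # underfitting (high bias)
print(diagnose(0.99, 0.75))  # overfitting (high variance)
print(diagnose(0.92, 0.90))  # good fit
```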
How to avoid Overfitting and Underfitting?
- To avoid both overfitting and underfitting, choose a model of appropriate complexity for the given dataset.
- Hyperparameter tuning can also help.
- For overfitting, reduce the model's complexity; for underfitting, increase it.
- Since too many features can contribute to overfitting and too few to underfitting, the number of features can be decreased or increased during feature engineering to avoid overfitting and underfitting respectively.
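Another common remedy for overfitting is regularization, which penalizes large model weights instead of removing features outright. The sketch below (assuming NumPy is installed) fits ridge regression in closed form; increasing the penalty `lam` shrinks the coefficients and so reduces effective model complexity.

```python
# Illustrative sketch: ridge regression as a remedy for overfitting.
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, size=50)

w_light = ridge_fit(X, y, lam=0.1)    # light regularization
w_heavy = ridge_fit(X, y, lam=100.0)  # heavy regularization shrinks the weights
print(np.linalg.norm(w_light), np.linalg.norm(w_heavy))
```

The heavily regularized weights have a smaller norm: the penalty trades a little training-set fit for better generalization.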
Conclusion
Overfitting and underfitting are two very common issues in machine learning, and both can impact a model's performance. Overfitting occurs when the model is too complex and fits the training data too closely, while underfitting occurs when the model is too simple and unable to capture the patterns and relationships in the data. It is very important to recognise both issues while building a model and to deal with them to improve its performance.
Thanks for reading this article! Leave a comment below if you have any questions. You can follow me on Linkedin and GitHub.