Semi-supervised learning is a branch of machine learning that trains models on both labelled and unlabelled data, combining ideas from supervised and unsupervised learning.
We have already covered supervised and unsupervised learning in depth, so this will be a short chapter. Semi-supervised learning is useful in situations where unlabelled data are abundant but labelled data are scarce.
Since we already know about labelled and unlabelled data, let us learn how semi-supervised learning works in training a model.
Semi-Supervised Learning Training
Suppose you are asked to train a model to predict COVID-19 carriers, and you are given 2,000 labelled records and 100,000 unlabelled records of patients, some of whom have the disease and some of whom do not.
This is how you would go about it using semi-supervised learning.
1. Train the model with the labelled data: In our case, we'll train the model with the 2,000 labelled records to predict COVID-19 carriers.
2. Use the trained model to predict labels for the unlabelled data: Here we'll use the model we created in (1) to predict labels for the 100,000 unlabelled records.
3. Add the most confident predictions to the labelled data: We have no true labels to check these predictions against, so we keep only the ones the model is most confident about and move them from the unlabelled data to the labelled data. So if the model is confident about, say, 25,000 of the 100,000 unlabelled records, we'll add them to the 2,000 to make 27,000.
4. Repeat (1) to (3) until all the unlabelled data are labelled: Here we'll train the model again with the 27,000 labelled records, then make predictions on the remaining 75,000 unlabelled records. We'll keep doing this until all the unlabelled data are labelled, or until the model makes no more confident predictions.
The semi-supervised method we used here is called Self-Training.
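To make the self-training loop concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are synthetic one-dimensional numbers rather than real patient records, the "model" is a tiny nearest-centroid classifier, the helper names (fit_centroids, predict_with_confidence, self_train) are made up, and the 0.9 confidence threshold is arbitrary. The dataset is also much smaller than the 2,000 and 100,000 records in our example.

```python
import math
import random

def fit_centroids(X, y):
    """Tiny stand-in classifier: store the mean feature value of each class."""
    centroids = {}
    for label in set(y):
        values = [x for x, t in zip(X, y) if t == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def predict_with_confidence(centroids, x):
    """Return (label, confidence) from a softmax over negative squared distances."""
    dists = {label: (x - c) ** 2 for label, c in centroids.items()}
    nearest = min(dists, key=dists.get)
    # Subtract the smallest distance before exponentiating for numerical stability.
    weights = {label: math.exp(dists[nearest] - d) for label, d in dists.items()}
    return nearest, weights[nearest] / sum(weights.values())

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Self-training: grow the labelled set with confident pseudo-labels."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break  # every unlabelled point has been pseudo-labelled
        # Step 1: train on the current labelled data.
        model = fit_centroids(X_lab, y_lab)
        # Step 2: predict a label and a confidence for each unlabelled point.
        scored = [(x, *predict_with_confidence(model, x)) for x in pool]
        # Step 3: move only the confident predictions into the labelled set.
        confident = [(x, label) for x, label, conf in scored if conf >= threshold]
        if not confident:
            break  # nothing confident enough, so stop early
        X_lab += [x for x, _ in confident]
        y_lab += [label for _, label in confident]
        pool = [x for x, _, conf in scored if conf < threshold]
    # Final fit on everything that ended up labelled.
    return fit_centroids(X_lab, y_lab), X_lab, y_lab, pool

# Synthetic demo data: class 0 is centred at 0, class 1 at 4 (assumed values).
random.seed(0)
X_lab = [random.gauss(0, 1) for _ in range(10)] + [random.gauss(4, 1) for _ in range(10)]
y_lab = [0] * 10 + [1] * 10
X_unlab = [random.gauss(random.choice([0, 4]), 1) for _ in range(1000)]

model, X_all, y_all, leftover = self_train(X_lab, y_lab, X_unlab)
print(f"labelled: {len(X_all)}, still unlabelled: {len(leftover)}")
```

Notice that the loop can stop with some points still unlabelled: if the model is never confident about them, forcing a label on them would only add noise to the training data.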
By the end of these steps, we will have trained a model with semi-supervised learning. You can use any supervised learning model as the base, such as K Nearest Neighbours, Random Forests or Naive Bayes, as long as it can tell you how confident it is in each prediction.
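Rather than coding the steps by hand, you can also lean on a library. scikit-learn provides a SelfTrainingClassifier in sklearn.semi_supervised that wraps a supervised classifier and expects unlabelled samples to be marked with -1. The sketch below is only an illustration under assumptions: the data come from make_classification instead of real patient records, Naive Bayes is just one possible base model, and the 50 labelled samples and 0.9 threshold are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic two-class data standing in for the patient records.
X, y_true = make_classification(n_samples=1000, n_features=5, random_state=0)

# Hide most of the labels: scikit-learn treats -1 as "unlabelled".
rng = np.random.default_rng(0)
y = np.full_like(y_true, -1)
labelled_idx = rng.choice(len(y_true), size=50, replace=False)
y[labelled_idx] = y_true[labelled_idx]

# Wrap a supervised model; only predictions with probability >= 0.9
# are added to the labelled set in each round.
model = SelfTrainingClassifier(GaussianNB(), threshold=0.9)
model.fit(X, y)

# labeled_iter_ is 0 for originally labelled samples, -1 for samples
# that were never labelled, and the round number otherwise.
n_pseudo = int((model.labeled_iter_ > 0).sum())
print(f"pseudo-labelled samples: {n_pseudo}")
print(f"accuracy on all samples: {(model.predict(X) == y_true).mean():.3f}")
```

Swapping GaussianNB() for another classifier that implements predict_proba, for example a Random Forest, should work the same way.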
Besides Self-Training, there are other semi-supervised training methods, such as Co-Training and Graph-Based Training. You can read more here.
In this chapter we used our previous knowledge of supervised and unsupervised learning to understand semi-supervised learning. Isn't that great? I think it is! 🤗 It means we are growing.
In our next chapter, we'll learn about a pivotal topic in modern machine learning: Reinforcement Learning.
See ya! 👽