print("Linear Regression")
Aloha,
In our last chapter, Introduction to Supervised Learning: Linear Regression, we discussed the idea behind linear regression in supervised learning tasks.
In this chapter, we'll put that knowledge into practice by building our first linear regression models. We'll build both simple and multiple linear regression models so we can see how they compare.
It'll be mostly hands-on, so let's get right into it!
Playground
Playtime! Head over to retzam-ai.vercel.app. In this chapter, we trained a model to predict the price of a house, using both Simple and Multiple Linear Regression.
You can try it directly in the playground to predict house prices.
It is pretty straightforward: you'll need to enter some details about the house, as shown below:
The model will make predictions based on the house details you've entered and give you a price for the house.
Each model gives its own prediction, along with a performance score, as shown below:
You can play around with it, and revisit the classification projects as well.
Hands-On
We'll use Python for the hands-on section, so you'll need a little Python programming experience. If you are not too familiar with Python, follow along anyway; the comments are explicit and detailed.
We'll use Google Colaboratory as our code editor; it is easy to use and requires zero setup. Here is an article to get started.
Here is a link to our organization on GitHub, github.com/retzam-ai, where you can find the code for all the models and projects we work on. We are excited to see you contribute to our project repositories.
For this demo project, we used a dataset of about 50,000 house records from Kaggle; check it out here. We'll train a model to predict the price of a house.
For the complete code for this tutorial, check the PDF here.
Data Preprocessing
Create a new Colab notebook.
Go to Kaggle here to download the housing dataset.
Add the dataset to the project folder, as shown in the left sidebar.
Import the dataset using pandas
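As a minimal sketch, loading the CSV with pandas might look like the following. The filename and column names here are illustrative assumptions, not the actual Kaggle schema; the snippet simulates a tiny in-memory CSV so it runs standalone, but in the notebook you would point pd.read_csv at the downloaded file.

```python
import io

import pandas as pd

# In the real notebook you'd call pd.read_csv("house_data.csv") on the
# downloaded Kaggle file; here we simulate a tiny CSV so the sketch runs
# on its own. Column names are illustrative, not the dataset's.
csv_data = io.StringIO(
    "square_feet,bedrooms,bathrooms,price\n"
    "1500,3,2,250000\n"
    "2200,4,3,340000\n"
    "900,2,1,150000\n"
)
df = pd.read_csv(csv_data)

print(df.head())   # peek at the first rows
print(df.shape)    # (number of rows, number of columns)
```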
We plot a histogram to check which features affect the outcome the most and the least. This helps us decide which features to use in training our model and which to discard.
We then split our dataset into training, validation, and test sets in a 60%-20%-20% format.
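A common way to get a 60%-20%-20% split is to shuffle the rows once and slice at the 60% and 80% marks. Here is a sketch using a toy DataFrame with made-up columns (an assumption, not the actual Kaggle schema):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the house data (columns are illustrative).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "square_feet": rng.integers(600, 4000, size=100),
    "price": rng.integers(100_000, 900_000, size=100),
})

# Shuffle once, then slice at the 60% and 80% marks.
shuffled = df.sample(frac=1, random_state=42)
n = len(shuffled)
train = shuffled.iloc[: int(0.6 * n)]
valid = shuffled.iloc[int(0.6 * n) : int(0.8 * n)]
test = shuffled.iloc[int(0.8 * n) :]

print(len(train), len(valid), len(test))  # 60 20 20
```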
We then process the data to make it ready for our tasks, scaling and fitting it into the right shape for our linear regression model. We do this for the training, validation, and test sets. The PDF mentioned at the start has more detailed comments on what the get_xy function does.
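The real get_xy helper is in the linked PDF; as a rough stand-in, the core idea of pulling out features and targets, then scaling the features, looks something like this (the column names and the helper's exact behavior are assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def get_xy(dataframe, y_label, x_labels, scaler=None):
    """Simplified stand-in for the tutorial's get_xy helper:
    extract feature matrix X and target vector y, then scale X."""
    X = dataframe[x_labels].values.astype(float)
    y = dataframe[y_label].values.astype(float)
    if scaler is None:
        # Fit the scaler on this (training) data; reuse it for
        # the validation and test sets to avoid leakage.
        scaler = StandardScaler().fit(X)
    X = scaler.transform(X)
    return X, y, scaler

# Tiny illustrative frame standing in for the training split.
train = pd.DataFrame({"square_feet": [900.0, 1500.0, 2200.0],
                      "price": [150_000.0, 250_000.0, 340_000.0]})
X_train, y_train, scaler = get_xy(train, "price", ["square_feet"])

print(X_train.mean(axis=0))  # ~0 after standard scaling
```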
Train model
We don't need to write a Linear Regression model ourselves, so we'll use the widely used implementation from scikit-learn: LinearRegression.
So we'll import it and train our model with our training dataset.
First, for Simple Linear Regression, we use only the square-feet feature, like so:
Then, for Multiple Linear Regression, we use all the features:
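Putting both together, a minimal sketch of the training step might look like this, using synthetic data in place of the preprocessed house features (the feature names and numbers are assumptions, not the tutorial's data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for the preprocessed Kaggle features.
rng = np.random.default_rng(0)
square_feet = rng.uniform(600, 4000, size=200)
bedrooms = rng.integers(1, 6, size=200).astype(float)
# Assumed price relationship, plus noise, purely for illustration.
price = 150 * square_feet + 20_000 * bedrooms + rng.normal(0, 10_000, 200)

# Simple linear regression: one feature (square feet).
simple_model = LinearRegression()
simple_model.fit(square_feet.reshape(-1, 1), price)

# Multiple linear regression: all features stacked column-wise.
X_all = np.column_stack([square_feet, bedrooms])
multi_model = LinearRegression()
multi_model.fit(X_all, price)

print(simple_model.coef_, multi_model.coef_)
```

The fitted coefficients should land near the slopes used to generate the data, which is a quick sanity check that the fit worked.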
Performance Review
First, we'll need to make predictions with our newly trained model using our test dataset.
Then we'll compare those predictions with the actual outputs/targets in the test dataset to measure the performance of our model.
Simple Linear Regression:
Multiple Linear Regression:
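In scikit-learn, LinearRegression.predict produces the predictions and LinearRegression.score returns the R² coefficient of determination on held-out data. Here is a minimal sketch with synthetic stand-in data (the numbers are assumptions, not the tutorial's results):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic train/test sets standing in for the real preprocessed data.
rng = np.random.default_rng(1)
X_train = rng.uniform(600, 4000, size=(150, 1))
y_train = 150 * X_train[:, 0] + rng.normal(0, 20_000, 150)
X_test = rng.uniform(600, 4000, size=(50, 1))
y_test = 150 * X_test[:, 0] + rng.normal(0, 20_000, 50)

model = LinearRegression().fit(X_train, y_train)

# Predict on the unseen test set, then score against the true targets.
predictions = model.predict(X_test)
# .score returns R^2: 1.0 is a perfect fit, 0.0 is no better than
# always predicting the mean of y.
r2 = model.score(X_test, y_test)

print(round(r2, 3))
```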
We can see that simple regression using only one feature (square feet) gave us a score of 56%, while multiple regression using all the features gives a score of 100%, meaning the extra features capture almost all of the variation in price. Brilliant, right?
Check the playground and compare the models' scores and predictions, to get a feel for your good work.
End of hands-on
There we go!
We have just completed our machine learning lessons on supervised learning, covering both classification and regression. Ciao, well done!
Congratulations, we've learned, built, and deployed 20+ WORKING MACHINE LEARNING MODELS!
Next, we'll talk about something different: Unsupervised Learning!
Let's keep running.