Hello, welcome back!
Reinforcement learning (RL) is a branch of machine learning where an agent learns to make decisions by taking actions and receiving rewards or penalties as feedback. The goal of the agent is to learn a strategy (a policy) that maximizes the cumulative reward.
In RL, the agent is the entity that makes decisions, takes actions, and learns from the environment.
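To pin down what "cumulative reward" means, here's a tiny sketch of how a sequence of per-step rewards is combined into a single return that the agent tries to maximize. The reward values and discount factor below are made-up numbers, purely for illustration.

```python
# Minimal sketch: turning a sequence of per-step rewards into one
# discounted return. All numbers here are invented for illustration.
rewards = [1.0, 0.0, -1.0, 2.0]  # reward received at each time step
gamma = 0.9                      # discount factor: future rewards count less

# Return G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
G = sum(gamma**t * r for t, r in enumerate(rewards))
print(G)  # ~1.648
```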
Three characteristics distinguish reinforcement learning problems: the agent's actions influence its later inputs; the agent is not given direct instructions as to what actions to take; and the consequences of actions, including rewards and penalties, play out over extended time periods.
How Reinforcement Learning Works
An RL agent starts with no knowledge of its environment, in a default state. It interacts with the environment by taking actions, for which it is rewarded or penalized, and then updates its policy based on that reward or penalty. The agent repeats this cycle until it figures out how to interact with its environment to earn the maximum possible reward.
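To make this loop concrete, here's a minimal, runnable sketch using one classic approach (tabular Q-learning) on a toy environment I invented for this example: the agent stands on a line of five cells and must discover, purely from rewards and penalties, that walking right reaches the goal. All the states, rewards, and hyperparameters are made up for illustration.

```python
import random

N_STATES = 5          # cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]    # step left or step right
EPISODES = 200
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

# The agent starts with no knowledge: every action value is zero.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(EPISODES):
    state = 0  # default starting state
    while state != N_STATES - 1:
        # Choose an action: mostly the best known one, sometimes a random one.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])

        # The environment responds with a new state and a reward:
        # -1 per step (a small penalty), +10 for reaching the goal.
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 10 if next_state == N_STATES - 1 else -1

        # Update the policy (here, a Q-table) from the feedback.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After enough repetitions the agent has learned to walk right.
print(max(ACTIONS, key=lambda a: Q[(0, a)]))  # expected: 1
```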
Reinforcement learning is the form of machine learning closest to how humans learn. From birth, we learn and develop by interacting with our environment and being rewarded or penalized for our actions. This, in turn, shapes who we become.
A very good example of reinforcement learning is the training of dogs. When the dog does exactly what the trainer tells it to do, it is given a treat. If the dog fails, it is given nothing. If the dog does something bad, it is punished. This teaches the dog to do the things that earn it treats and to avoid the things that earn it nothing, or a punishment.
There are two major types of RL:
Model-based RL: The agent first builds an internal representation (model) of the environment.
This is typically used when the environment is well defined and unchanging, and when testing in the real environment is difficult.
Model-free RL: The agent learns a policy without explicitly modeling the environment.
This is best used when the environment is large, complex, or hard to describe, and therefore difficult to model.
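To make the distinction concrete, here's a rough sketch of the two flavors. The function names, states, and numbers are my own illustrative inventions, not anything standard.

```python
ACTIONS = ["left", "right"]

# Model-free (e.g., Q-learning): update action values directly from one
# sampled transition (s, a, r, s_next); no model of the environment.
def model_free_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Model-based: the agent first has (or learns) a model of the
# environment -- here model[s][a] returns (next_state, reward) --
# and then plans by looking ahead through that model.
def model_based_value(model, V, s, gamma=0.9):
    return max(
        r + gamma * V[s_next]
        for s_next, r in (model[s][a] for a in ACTIONS)
    )

# Tiny made-up usage of each:
Q = {(s, a): 0.0 for s in ["s0", "s1"] for a in ACTIONS}
model_free_update(Q, "s0", "right", 1.0, "s1")   # learn from one experience

model = {"s0": {"left": ("s0", 0.0), "right": ("s1", 1.0)}}
V = {"s0": 0.0, "s1": 5.0}
print(model_based_value(model, V, "s0"))  # 1.0 + 0.9 * 5.0 = 5.5
```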
Applications of RL can be seen in self-driving cars and in robotics, where it is used to teach skills such as walking, running, and grasping, as in the popular Boston Dynamics robot Spot (you can watch a review here).
We'll implement reinforcement learning agents later in this series; for this chapter, we just want to cover the basics.
Materials
To go deeper into reinforcement learning, you can read the awesome book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, here.
You can watch an interesting video illustration of an agent learning to walk using RL, here.
The end
Reinforcement learning forms the core of much of what modern AI does and intends to do in the future. I hope you enjoyed this one!
We have learned a lot and prepared ourselves well enough to take on the biggest deal in AI in our next chapter: Deep Learning!
The wait is over and we are ready!
See you in the next one! 👽