Reinforcement Learning is learning what to do — how to map situations to actions — so as to maximize a numerical reward signal. A learning agent can take actions that affect the state of the environment and have goals relating to the state of the environment.
There are some amazing answers.
Suppose you have a dog that is not so well trained. Every time the dog messes up the living room, you reduce the amount of tasty food you give it (punishment), and every time it behaves well, you double the tasty snacks (reward). What will the dog eventually learn? Well, that messing up the living room is bad.
This simple concept is powerful. The dog is the agent, the living room is the environment, and you are the source of the reward signal (tasty snacks). You are giving feedback to the dog, but this feedback is vague: it doesn’t mean anything without context. Eventually, though, the dog’s neural networks figure out the relationship between the tasty snacks and good behavior.
So in order to maximize its goal of eating more tasty snacks, the dog will simply behave well and never mess up the living room again. Note that you can apply RL to non-computer problems, such as this dog-and-living-room example. Every biological entity has reinforcement learning (RL) built in; humans, cats, and many other animals use it. That is why RL, if solved, could be a very powerful tool for artificial intelligence (AI) applications in fields like self-driving cars.
So in Reinforcement Learning we want to mimic the behavior of biological entities. A robot can be the agent, and its goal might be to find the best way to move from one place in the house to another without hitting obstacles. So it is important to define a score: hit an obstacle and get a negative score (punishment), avoid an obstacle and get a positive score (reward). And the more distance it covers, the more reward it gets. Feedback can come from multiple sources; the goal is to maximize the overall perceived score in every case.
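To make this concrete, here is a minimal sketch of such a scoring scheme for the robot. The function name and the specific numbers are made up for illustration; a real system would tune them:

```python
def score_step(hit_obstacle, distance_covered):
    """Score one step of the robot's movement.

    hit_obstacle: True if the robot bumped into something this step.
    distance_covered: how far the robot moved this step.
    """
    if hit_obstacle:
        return -10.0  # punishment: collisions are heavily penalized
    # reward: the more distance covered, the bigger the score
    return 1.0 * distance_covered
```

The agent never sees these rules directly; it only sees the numbers coming back, and has to work out on its own that collisions are bad and progress is good.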
The agent can always act on the environment, but it needs to find the best set of actions in order to maximize that reward. This is why RL is important for self-adapting systems. In AlphaGo, for example, after a supervised phase of learning, AlphaGo played against its earlier self using RL to improve further on its own.
Robotic control systems can learn, using RL, how to move a robot arm in order to pick up objects, for example. They can learn to move around an environment while avoiding objects, and they can learn a multitude of other control tasks this way, such as balancing.
RL can also be useful in game-playing agents. Given the controls, the game environment, and the score, the goal is to maximize the score, and RL can help the agent figure out which action patterns lead to the best score. The result may not be the optimal solution, but it can be good enough, and it can almost always get better with more iterations.
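Here is a toy sketch of that idea using tabular Q-learning, one of the simplest RL algorithms. The "game", its rewards, and the hyperparameters are all made up for illustration: a 1-D track of 5 cells where the agent starts at cell 0 and scores +1 for reaching cell 4, with a small step penalty:

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = [0, 1]  # 0 = move left, 1 = move right

def step(state, action):
    """One move of the toy game: returns (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.01), done

# Q[s][a] estimates the long-term score of taking action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration
random.seed(0)

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: nudge Q toward reward + discounted best future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The greedy policy the agent discovered for each non-goal state.
policy = [max(ACTIONS, key=lambda x: Q[s][x]) for s in range(N_STATES - 1)]
print(policy)
```

After enough iterations the agent settles on moving right from every state, the action pattern that maximizes the score, without ever being told the rules. This is exactly the trial-and-error loop described above, just on a game small enough to fit in a table.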
There are many applications of RL, and since deep learning (DL) is becoming more mainstream, there is now heavy research in deep RL, such as at DeepMind, where it is used to train a variety of game-playing agents as a step toward artificial general intelligence (AGI).
So RL makes it possible to define vague goals and let the agent learn on its own by observing and acting on the environment to get feedback. It may well be the route to AGI, but these systems are currently notoriously hard to train; we have much to learn from the dog, I guess :).
Hope this helps.