1.

Explain the Markov decision process.

Answer»

A Markov decision process (MDP) is a mathematical framework for modeling decision-making problems in which outcomes are partly random and partly under the control of the decision maker; it forms the foundation of reinforcement learning. To model a problem as an MDP, the following components are needed-

  • Agent- The agent is the entity being trained. For example, a robot that is trained to assist in cooking is an agent.
  • Environment- The surroundings the agent interacts with are called the environment. The kitchen is the environment for the cooking robot above.
  • State (S)- The current situation of the agent is called the state. For the robot, its position, temperature, posture, etc. collectively define its state.
  • Action (A)- Anything the agent can do, such as moving left or right, or passing an onion to the chef, is an action.
  • Policy (𝜋)- The policy is the strategy the agent follows to choose an action in each state.
  • Reward (R)- A reward is the feedback the agent receives for taking a desirable (or undesirable) action.
  • Value (V)- The value is the expected long-term reward the agent can accumulate from a state, as opposed to the immediate reward R.
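The components above can be sketched as a tiny toy MDP. The two states, two actions, transition probabilities, and rewards below are illustrative assumptions (not a standard example); value iteration then computes V and the greedy policy 𝜋 from them:

```python
# Toy MDP: all state/action names and numbers are illustrative assumptions.
states = ["idle", "cooking"]
actions = ["wait", "chop"]

# P[(s, a)] -> list of (probability, next_state, reward)
P = {
    ("idle", "wait"):    [(1.0, "idle",    0.0)],
    ("idle", "chop"):    [(0.8, "cooking", 1.0), (0.2, "idle", 0.0)],
    ("cooking", "wait"): [(1.0, "cooking", 0.5)],
    ("cooking", "chop"): [(1.0, "idle",    2.0)],
}

gamma = 0.9  # discount factor: how much future rewards count

# Value iteration: repeatedly back up the best expected one-step return.
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
            for a in actions
        )
        for s in states
    }

# The greedy policy picks, in each state, the action with the best return.
policy = {
    s: max(
        actions,
        key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)]),
    )
    for s in states
}
print(policy)  # → {'idle': 'chop', 'cooking': 'chop'}
```

Here V captures the *long-term* payoff of each state, while R is only the immediate payoff of one transition, which is exactly the distinction drawn in the list above.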

The working of the Markov decision process can be understood from the agent–environment interaction loop (commonly drawn as a diagram of agent, environment, state, action, and reward).

In simple words, the agent starts in an initial state and takes actions. While doing so, it receives rewards based on the actions it takes. The policy determines which action is taken, and the rewards collected over time determine the value (V).
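That interaction loop can be written directly as code. The state names, rewards, and the fixed policy below are illustrative assumptions for the cooking-robot example:

```python
def step(state, action):
    """Environment dynamics: return (next_state, reward)."""
    if state == "holding_onion" and action == "pass_onion":
        return "idle", 1.0          # the desirable action earns a reward
    if state == "idle" and action == "pick_onion":
        return "holding_onion", 0.0
    return state, -0.1              # anything else just costs a little time

# A deterministic policy: a mapping from state to action.
policy = {"idle": "pick_onion", "holding_onion": "pass_onion"}

state, total_reward = "idle", 0.0
for t in range(10):                 # one short episode
    action = policy[state]          # the policy chooses the action
    state, reward = step(state, action)
    total_reward += reward          # rewards accumulate into the return

print(total_reward)  # → 5.0 (five onions passed in ten steps)
```

Each pass through the loop is one agent–environment exchange: the agent acts, the environment returns the next state and a reward, and the running sum of rewards is what the value V estimates in advance.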
