This article was published as a part of the Data Science Blogathon.

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize a notion of cumulative reward. A key question is: how is RL different from supervised and unsupervised learning?

A Markov Decision Process (MDP) is a mathematical framework for describing the environment in reinforcement learning. An MDP is an extension of a Markov Reward Process with decisions (a policy); that is, in each time step, the agent will have several actions to … An MDP includes a finite set of states $S$ and a real-valued reward function $R(s, a)$. The function $p$ controls the dynamics of the process. We call it a "tabular MDP" if there is no structural knowledge at all. The goal is to find a best policy $\pi: S \to A$ that maximizes the expected discounted reward, where $\gamma \in (0, 1)$ is a discount factor:

$$\max_{\pi} v^{\pi} = \mathbb{E}^{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]$$

The following figure shows agent-environment interaction in MDP: More specifically, the agent and the environment interact at each discrete time step, t = 0, 1, 2, 3, … At each time step, the agent gets information about the environment state $S_t$. Based on the environment state at instant t, the agent chooses an action $A_t$. The state variable $S_t$ summarizes the information the agent needs to anticipate present as well as future rewards. The state is the input for policymaking.

The idea is to control the temperature of a room within the specified temperature limits. The temperature inside the room is influenced by external factors such as outside temperature, the internal heat generated, etc. The action for the agent is the dynamic load.
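The interaction loop and the discounted reward can be made concrete with a small sketch of the room-temperature example. Everything numeric here is an assumption made for illustration and does not come from the article: the comfort limits `T_MIN` and `T_MAX`, the discount factor `GAMMA`, the simple dynamics in `step`, and the hand-written `policy` are all hypothetical. The point is only to show the cycle of observing a state, choosing an action, and receiving a reward at each discrete time step.

```python
import random

GAMMA = 0.9                  # discount factor in (0, 1); illustrative value
T_MIN, T_MAX = 20.0, 24.0    # hypothetical temperature limits (degrees Celsius)


def step(temp, load, outside_temp, rng):
    """Hypothetical room dynamics: return (next_temp, reward)."""
    drift = 0.1 * (outside_temp - temp)   # influence of the outside temperature
    internal = rng.uniform(0.0, 0.3)      # internal heat generated
    next_temp = temp + drift + internal + load
    # Reward +1 inside the limits, -1 outside (an assumed reward function).
    reward = 1.0 if T_MIN <= next_temp <= T_MAX else -1.0
    return next_temp, reward


def policy(temp):
    """Hand-written policy: map the state (temperature) to an action (load)."""
    if temp > T_MAX:
        return -1.0   # cooling load
    if temp < T_MIN:
        return 1.0    # heating load
    return 0.0        # no load


def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over the episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def run_episode(n_steps=50, start_temp=27.0, outside_temp=30.0, seed=0):
    """Run the agent-environment loop for t = 0, 1, ..., n_steps - 1."""
    rng = random.Random(seed)
    temp, rewards = start_temp, []
    for _ in range(n_steps):
        load = policy(temp)                                  # agent picks A_t from S_t
        temp, reward = step(temp, load, outside_temp, rng)   # environment responds
        rewards.append(reward)
    return discounted_return(rewards, GAMMA), rewards


if __name__ == "__main__":
    total, _ = run_episode()
    print("discounted return:", total)
```

The policy here is fixed by hand; in reinforcement learning the agent would instead adjust its policy from the rewards it observes.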
Supervised learning tells the agent directly which action it has to perform to maximize the reward, using a training dataset of labeled examples.

Let S, A, and R be the sets of states, actions, and rewards, respectively.
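As a rough sketch of how the sets S, A, and R and the dynamics function p fit together, the snippet below stores a tiny MDP as an explicit table. The three state names, the three action names, the 0.8/0.2 transition probabilities, and the reward values are all hypothetical choices made for this illustration; the article does not specify them.

```python
import random

S = ["cold", "comfortable", "hot"]   # hypothetical set of states
A = ["heat", "off", "cool"]          # hypothetical set of actions
R = [-1.0, 1.0]                      # hypothetical set of rewards

SHIFT = {"heat": 1, "off": 0, "cool": -1}


def reward_for(state):
    """Assumed reward: +1 when the room is comfortable, -1 otherwise."""
    return 1.0 if state == "comfortable" else -1.0


def outcomes(s, a):
    """List the (probability, next_state, reward) outcomes for the pair (s, a)."""
    i = min(max(S.index(s) + SHIFT[a], 0), len(S) - 1)
    intended = S[i]
    # Assumed dynamics: the action works with probability 0.8,
    # otherwise the state stays where it was.
    return [(0.8, intended, reward_for(intended)), (0.2, s, reward_for(s))]


# The dynamics p as a plain table: one entry per (state, action) pair.
p = {(s, a): outcomes(s, a) for s in S for a in A}


def sample_step(s, a, rng):
    """Draw one (next_state, reward) pair according to p."""
    probs, next_states, rewards = zip(*p[(s, a)])
    k = rng.choices(range(len(probs)), weights=probs)[0]
    return next_states[k], rewards[k]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_step("hot", "cool", rng))
```

Storing p as a lookup table with one entry per state-action pair is practical only when S and A are small and finite.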