## What you'll learn

- Be able to start Deep reinforcement-learning research
- Be able to start a Deep reinforcement-learning engineering role
- Understand modern, state-of-the-art Deep reinforcement-learning
- Understand the fundamentals of Deep reinforcement-learning

## Requirements

- Interest in Deep reinforcement-learning

## Description

Hello, I am Nitsan Soffair, a Deep RL researcher at BGU.

In this Deep reinforcement-learning course, you will learn the **newest**, **state-of-the-art** Deep reinforcement-learning methods.

In this course, you will:

- Get **state-of-the-art knowledge** regarding:
  - Model types
  - Algorithms and approaches
  - Function approximation
  - Deep reinforcement-learning
  - Deep multi-agent reinforcement-learning
- **Validate** your knowledge by answering short and very short quizzes for each lecture.
- Be able to complete the course in about **2 hours**.

**Syllabus**

- Model types
  - Markov decision process (MDP): A discrete-time stochastic control process.
  - Partially observable Markov decision process (POMDP): A generalization of the MDP in which the agent cannot directly observe the underlying state.
  - Decentralized partially observable Markov decision process (Dec-POMDP): A generalization of the POMDP to multiple decentralized agents.

- Algorithms and approaches
  - Bellman equations: A necessary condition for optimality associated with dynamic programming.
  - Model-free: A model-free algorithm does not use (or estimate) the transition probability distribution and reward function of the MDP.
  - Off-policy: An off-policy algorithm learns about one policy (the target policy) while acting in the environment with another (the behavior policy).
  - Exploration-exploitation: The trade-off between exploring new actions to discover better policies and exploiting the best policy found so far.
  - Value-iteration: An iterative algorithm that repeatedly applies the Bellman optimality backup.
  - SARSA: An on-policy temporal-difference algorithm for learning a Markov decision process policy.
  - Q-learning: A model-free reinforcement-learning algorithm that learns the value of an action in a particular state.

- Function approximation
  - Function approximators: Selecting, from a well-defined class of functions, one that closely matches (“approximates”) a target function in a task-specific way.
  - Policy-gradient: Value-based, policy-based, and actor-critic methods; policy gradients; and the softmax policy.
  - REINFORCE: A Monte Carlo policy-gradient algorithm.

- Deep reinforcement-learning
  - Deep Q-Network (DQN): A Deep reinforcement-learning algorithm using experience replay and fixed Q-targets.
  - Deep Recurrent Q-Learning (DRQN): A Deep reinforcement-learning algorithm for POMDPs that extends DQN with an LSTM.
  - Optimistic Exploration with Pessimistic Initialization (OPIQ): A Deep reinforcement-learning algorithm for MDPs based on DQN.
  - Value Decomposition Networks (VDN): A multi-agent Deep reinforcement-learning algorithm for Dec-POMDPs.
  - QMIX: A multi-agent Deep reinforcement-learning algorithm for Dec-POMDPs.
  - QTRAN: A multi-agent Deep reinforcement-learning algorithm for Dec-POMDPs.
  - Weighted QMIX: A multi-agent Deep reinforcement-learning algorithm for Dec-POMDPs.
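To make the tabular entries of the syllabus concrete, here is a minimal Python sketch (not taken from the course material) contrasting value iteration, which applies the Bellman optimality backup using a known model, with Q-learning, which is model-free and off-policy and explores ε-greedily. The 5-state chain environment, the `step`, `value_iteration`, and `q_learning` helpers, and all hyperparameters (γ = 0.9, α = 0.5, ε = 0.1) are hypothetical choices made for illustration only.

```python
import random

# Hypothetical deterministic chain MDP (illustration only):
# states 0..4, action 0 = move left, action 1 = move right.
# Entering state 4 (terminal) gives reward 1; every other step gives 0.
N_STATES = 5
ACTIONS = (0, 1)
GAMMA = 0.9
TERMINAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the hypothetical chain."""
    next_state = min(state + 1, TERMINAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward, next_state == TERMINAL


def value_iteration(tol=1e-10):
    """Model-based: sweep all states, applying the Bellman optimality backup."""
    v = [0.0] * N_STATES  # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(TERMINAL):
            best = max(
                r + (0.0 if done else GAMMA * v[s2])
                for s2, r, done in (step(s, a) for a in ACTIONS)
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


def q_learning(episodes=500, alpha=0.5, epsilon=0.1, seed=0):
    """Model-free and off-policy: act epsilon-greedily, learn from the greedy (max) target."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[s])  # exploit, breaking ties at random
                a = rng.choice([act for act in ACTIONS if q[s][act] == best])
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


if __name__ == "__main__":
    v = value_iteration()
    q = q_learning()
    print([round(x, 3) for x in v])  # [0.729, 0.81, 0.9, 1.0, 0.0]
    print([max(ACTIONS, key=lambda a: q[s][a]) for s in range(TERMINAL)])  # [1, 1, 1, 1]
```

Both methods should arrive at the same greedy policy (always move right). The difference is in what they assume: value iteration queries the model for every state-action pair on each sweep, whereas Q-learning only learns from the transitions it actually experiences while following its exploratory behavior policy.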

**Resources**

- Wikipedia
- David Silver’s Reinforcement-learning course

## Who this course is for:

- Anyone interested in Deep reinforcement-learning