Deep RL

In a simplified sense, deep RL can be thought of as snapping a deep neural architecture onto whatever we use to learn, similar to the leap from logistic regression to modern-day deep nets like CNNs for the same classification task. But beneath the simple “toolbox change” lie all sorts of new challenges, solutions, and insights. After all, deep learning is itself a standalone subject (and we do have a class on that at EECS; shout out to Philip!).

Some considerations when choosing a deep RL algo

  • Sample complexity
  • To what extent we make use of a model. Note that “model” here differs from the hypothesis model of ordinary ML: in the RL and control context, “model” typically refers to the dynamics model, whereas the usual ML hypothesis (or model) is typically a policy or a value function.
  • Whether the dynamics model is highly discrete rather than continuous.

Deep value-based methods

Deep Q-learning, e.g. DQN (deep Q-network)
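
To make the value-based idea concrete, here is a minimal sketch of the one-step TD target and squared-error loss that deep Q-learning is usually built around. It assumes the Q-values have already been produced by some network and are passed in as plain arrays; the function names `dqn_td_targets` and `dqn_loss`, the use of a separate target network for `next_q_values`, and the default `gamma` are illustrative assumptions, not something these notes specify.

```python
import numpy as np

def dqn_td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step TD targets: r + gamma * max_a' Q(s', a').

    next_q_values has shape (batch, num_actions) and is assumed to come
    from a (hypothetical) target network. Terminal transitions
    (dones == 1) do not bootstrap from the next state.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q

def dqn_loss(q_values, actions, targets):
    """Mean squared TD error on the Q-values of the actions actually taken."""
    q_values = np.asarray(q_values, dtype=np.float64)
    actions = np.asarray(actions, dtype=np.int64)
    chosen_q = q_values[np.arange(len(actions)), actions]
    return float(np.mean((chosen_q - targets) ** 2))
```

In a real training loop the loss would be differentiated with respect to the network parameters only through `q_values`, with the targets treated as constants; this NumPy sketch only computes the number.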

Deep policy-based methods

TRPO (Trust Region Policy Optimization), PPO (Proximal Policy Optimization)
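
As a rough illustration of the policy-based side, the sketch below computes the clipped surrogate objective commonly associated with PPO: the probability ratio between the new and old policy is clipped to [1 - eps, 1 + eps], and the pessimistic (minimum) of the clipped and unclipped terms is averaged. The function name `ppo_clipped_objective`, the array inputs, and the default `clip_eps=0.2` are assumptions made for the example; advantage estimation and the actual policy network are left out.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage
    estimates for those actions (assumed to be computed elsewhere).
    """
    logp_new = np.asarray(logp_new, dtype=np.float64)
    logp_old = np.asarray(logp_old, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Take the elementwise minimum so that large policy changes are never rewarded.
    surrogate = np.minimum(ratio * advantages, clipped_ratio * advantages)
    return float(np.mean(surrogate))
```

When the new and old policies coincide the ratio is 1 everywhere and the objective reduces to the mean advantage; clipping only matters once the policy has moved away from the one that collected the data.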