ICML 2008

Stochastic Optimal Control   -   ICML 2008 tutorial

to be held on Saturday July 5 2008 in Helsinki, Finland,
as part of the 25th International Conference on Machine Learning (ICML 2008)

Bert Kappen, Radboud University, Nijmegen, the Netherlands
Marc Toussaint, Technical University, Berlin, Germany

Stochastic optimal control theory concerns the problem of how to act optimally when reward is only obtained at a later time. The stochastic optimal control problem is central to modeling intelligent behaviour in animals or machines. Examples are the control of multi-joint robot arms, the navigation of vehicles, and the coordination of multi-agent systems. In addition, control theory plays an important role in financial applications.

Currently, the dominant approaches to the above problems within the machine learning community are Reinforcement Learning and (Partially Observable) Markov Decision Processes, which often use discounted reward. One can view these approaches as special cases of stochastic control theory.

The tutorial is introductory and aimed at the 'average' machine learning researcher. No background in control theory and/or reinforcement learning is assumed. A basic understanding of Bayesian networks and statistical inference is assumed.


  • Deterministic optimal control (Kappen, 30 min.)
    • Introduction of optimal control problems, types of control problems
    • Dynamic programming solution and deterministic Bellman equation
    • Discrete and continuous time formulation
    • Pontryagin minimum principle
    • Examples
  • Stochastic optimal control, discrete case (Toussaint, 40 min.)
    • Why stochasticity?
    • Markov Decision Processes
    • Stochastic Bellman optimality equation
    • Dynamic Programming, Value Iteration
    • Learning from experience: Temporal Difference, Q-learning, eligibilities, Exploration-Exploitation, Bayesian RL
  • Coffee break
  • Stochastic optimal control, continuous case (Kappen, 40 min.)
    • Stochastic differential equations
    • Stochastic optimal control, Hamilton-Jacobi-Bellman equation
    • Linear Quadratic control, Riccati equation
    • Learning, inference and control, certainty equivalence
    • Path integral control
    • Coordination of agents, mapping to graphical model inference
  • Research issues (Toussaint, 30 min.)
    • Challenges in stochastic optimal control
    • Probabilistic inference approach to optimal control
    • Examples: POMDPs, robotic motion control and planning
    • Model learning in robotics
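
To give a flavour of the discrete-case topics in the outline above (Markov Decision Processes, the stochastic Bellman optimality equation, Value Iteration), here is a minimal Python sketch of value iteration with discounted reward. It is not taken from the tutorial material: the two-state, two-action problem, its transition probabilities, its rewards and the discount factor are all made-up illustrative assumptions, and the function name value_iteration is hypothetical.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a finite MDP with discounted reward.

    P[a, s, t] is the probability of moving from state s to state t
    under action a; R[s, a] is the expected immediate reward.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Hypothetical toy problem (not from the tutorial):
# action 0 = "stay" (deterministic), action 1 = "move" (succeeds w.p. 0.8).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # stay
    [[0.2, 0.8], [0.8, 0.2]],   # move
])
# Reward 1 per step while in state 1, reward 0 while in state 0.
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])

V, policy = value_iteration(P, R, gamma=0.9)
print("values:", V)
print("policy:", policy)
```

In this toy problem the greedy policy is to move out of state 0 and to stay in state 1. The model-based backup used here requires the transition probabilities to be known; the learning-from-experience methods listed in the outline (Temporal Difference, Q-learning) address the case where they are not.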

Tutorial manuscript

The tutorial slides can be accessed here:
And here are the accompanying manuscripts:
Please see also the additional web material referred to below.

Tutorial demo code

Kappen: Matlab code for n joint problem
Here is a directory of Matlab files that lets you run and inspect the variational approximation for the n joint stochastic control problem discussed in section 6.7 of the tutorial text. Type tar xvf njoints.tar to unpack the directory and simply run file1.m. In file1.m you can select demo1 (3 joint arm) or demo2 (10 joint arm). You can also try larger n, but be sure to adjust eta for the smoothing of the variational fixed point equations. You can compare the results with exact computation (only recommended for 2 joints) by setting METHOD='exact'. There is also an implementation of importance sampling (does not work very well) and Metropolis Hastings sampling (works nicely, but is not as stable as the variational approximation).

Useful web material

Organizers & presenters:

Bert Kappen, b.kappen@science.ru.nl
Marc Toussaint, mtoussai@cs.tu-berlin.de