EWRL10 (2012)
The 10th European Workshop on Reinforcement Learning (EWRL 2012)
Dates: June 30 – July 1, 2012 (2-day workshop @ ICML 2012)
Location: Edinburgh, Scotland (2-day ICML workshop)
Post-Workshop Proceedings: JMLR W&C Proceedings, Vol. 24
Important Dates
Conference: June 30 – July 1 2012 (@ICML)
Keynote Speakers
Shie Mannor (Technion)
Rich Sutton (University of Alberta)
Martin Riedmiller (University of Freiburg)
Drew Bagnell (Carnegie Mellon University)
Organizing Committee
Marc Deisenroth (TU Darmstadt)
Csaba Szepesvari (University of Alberta)
Jan Peters (TU Darmstadt)
Proceedings of the Tenth European Workshop on Reinforcement Learning
June, 2012, Edinburgh, Scotland
Editors: Marc Peter Deisenroth, Csaba Szepesvári, Jan Peters

Paper Submission
We are calling for papers (and posters) from the entire reinforcement learning spectrum, with the option of either 2-page position papers (on which open discussion will be held) or longer 8-page research papers in JMLR format. We welcome a range of submissions to encourage broad discussion. We will publish a selection of accepted papers in the prestigious JMLR W&C Proceedings.
Double submissions (e.g., with ICML) are OK. However, if an EWRL paper is accepted to another conference proceedings or journal, it will not be reprinted in the official EWRL proceedings (JMLR W&C). The paper will still be considered for acceptance and presentation at EWRL regardless of whether it can be printed in the official proceedings. Double submissions must be clearly labelled as such (e.g., by adding a footnote on the first page). If your ICML submission exceeds EWRL’s page limit, don’t worry too much about it: submit the ICML paper.
We will publish a selection of papers from EWRL 2012 in the JMLR Workshop & Conference Proceedings.
 Page limit: 2 pages for short papers and 8 pages for regular papers (plus references).
 Paper format: JMLR W&C style
 Papers for the JMLR W&C Proceedings must be resubmitted after EWRL.
Details will follow after EWRL.
Registration
Since EWRL 2012 is an ICML workshop, the ICML workshop fees have to be paid. There won’t be any additional EWRL-specific fees.
Registration via ICML Workshops
Workshop Venue
Appleton Tower, LT 1
The poster sessions will be in the atrium of the Appleton Tower.
Scholarships
Students can apply for financial support: send an email to marc@ias.tu-darmstadt.de explaining why and how much financial support is required.
Keynote Speakers’ Abstracts
Shie Mannor: Known Unknowns: Planning with Parameter Uncertainty
Planning when the model parameters are not fully known is a common problem encountered in operations research, control, and artificial intelligence. I will start by demonstrating why planning with parameter uncertainty is an important issue. I will then describe several approaches: a Bayesian uncertainty model over the unknown parameters, a robust approach that takes a worst-case view, and a frequentist approach. I will outline the advantages and disadvantages of each approach and discuss its potential to scale up to large problems. Finally, I will discuss the challenges posed by a higher level of uncertainty, where the model itself rather than its parameters may not be fully known.
Martin Riedmiller: Neural Architectures for Real World Reinforcement Learning
The research focus of the Machine Learning Lab at the University of Freiburg lies in building intelligent control architectures that can learn their behaviour entirely from scratch. Our aim is to build learning machines that perceive their environment, autonomously learn to generate internal representations and autonomously learn to make appropriate decisions to finally reach a predefined goal.
In my talk I will provide examples of how neural network based learning methods can be effectively applied to realize such control architectures. As one example, I will present some recent results on deep learning architectures for visual input based reinforcement learning.
Richard Sutton: Verification in Artificial Intelligence
Drew Bagnell: Machine Learning with Multiple Guesses: Contextual Control Libraries
High-dimensional action spaces are increasingly important in problems of reinforcement and imitation learning, robotics, and control.
A popular approach to managing such difficulties in robotics uses a library of candidate “maneuvers” or “trajectories”. The library is either evaluated on a fixed number of candidate choices at runtime (e.g. path set selection for planning) or by iterating through a sequence of feasible choices until success is achieved (e.g. grasp selection). The performance of the library relies heavily on the content and order of the sequence of candidates. We propose a provably efficient method to optimize such libraries leveraging recent advances in optimizing sequence submodular functions.
An alternate approach to such problems is to directly attempt to predict the correct control action in a learning-based approach, attempting to bypass the evaluation of a tremendous number of choices. Such methods, however, have no way to recover if the prediction is not a good one.
In the second part of the talk, I will show an extension that yields a general approach to predict a sequence of potential actions based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each “slot” in the sequence. This approach can be thought of as capturing the notion of “predict then simulate”: checking multiple educated guesses in simulation and executing the most promising one. Finally, we demonstrate the efficacy of the approaches on local trajectory optimization techniques, grasp library selection, and ground vehicle path set selection.
Joint work with Debadeepta Dey, Tommy Liu, and Martial Hebert.
Workshop Schedule
Saturday (June 30)
08:30 – 09:00  COFFEE for arrival 
09:00 – 09:15  Welcome 
09:15 – 10:10  Invited Talk: Shie Mannor (“Known Unknowns”) 
10:10 – 10:30  Shiau Hong Lim and Peter Auer: Autonomous Exploration For Navigating In MDPs 
10:30 – 11:00  COFFEE 
11:00 – 11:15  Cosmin Paduraru, Doina Precup, Joelle Pineau and Gheorghe Comanici: A Study of Off-policy Learning in Computational Sustainability 
11:15 – 11:30  Sergiu Goschin, Ari Weinstein, Michael Littman and Erick Chastain: Planning in Reward-Rich Domains via PAC Bandits 
11:30 – 11:45  Michael Castronovo, Francis Maes, Raphael Fonteneau and Damien Ernst: Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning 
11:45 – 12:00  Pedro Ortega and Daniel Alexander Braun: Free Energy and the Generalized Optimality Equations for Sequential Decision Making 
12:00 – 12:15  Amir-Massoud Farahmand, Doina Precup and Mohammad Ghavamzadeh: Generalized Classification-based Approximate Policy Iteration 
12:15 – 12:30  Nicolas Heess, David Silver and Yee Whye Teh: Actor-Critic Reinforcement Learning with Energy-Based Policies 
12:30 – 14:00  LUNCH 
14:00 – 14:50  Invited Talk: Martin Riedmiller (“Neural Architectures for Real World Reinforcement Learning”) 
14:50 – 15:10  David Silver: Gradient Temporal Difference Networks 
15:10 – 15:30  Marc Deisenroth and Jan Peters: Solving Nonlinear Continuous State-Action-Observation POMDPs for Mechanical Systems with Gaussian Noise 
15:30 – 16:00  COFFEE 
16:00 – 17:30  Poster Session I 
18:30 –  Banquet 
Sunday (July 1)
08:30 – 09:00  COFFEE for arrival 
09:00 – 09:50  Invited Talk: Drew Bagnell (“Machine Learning with Multiple Guesses: Contextual Control Libraries”) 
09:50 – 10:05  Nikos Vlassis, Michael Littman and David Barber: Stochastic POMDP controllers: How easy to optimize? 
10:05 – 10:20  Hado van Hasselt: Prelearning in Generalized MDPs to Speed up Learning 
10:30 – 11:00  COFFEE 
11:00 – 12:30  Poster Session II 
12:30 – 14:00  LUNCH 
14:00 – 14:50  Invited Talk: Richard Sutton (“Verification in Artificial Intelligence”) 
14:50 – 15:10  Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux and Patrick Gallinari: Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization 
15:10 – 15:25  Michal Valko, Mohammad Ghavamzadeh and Alessandro Lazaric: Semi-Supervised Inverse Reinforcement Learning 
15:30 – 16:00  COFFEE 
16:00 – 16:15  Abdeslam Boularias, Oliver Kroemer and Jan Peters: Structured Apprenticeship Learning 
16:15 – 16:30  Mahdi Milani Fard, Yuri Grinberg, Joelle Pineau and Doina Precup: Bellman Error Based Feature Generation Using Random Projections 
16:30 – 16:45  Edouard Klein, Bilal Piot, Matthieu Geist and Olivier Pietquin: Structured Classification for Inverse Reinforcement Learning 
16:45 – 17:00  Jan Hendrik Metzen: Online Skill Discovery using Graph-based Clustering 
17:00 – 17:15  Alborz Geramifard, Stefanie Tellex, David Wingate, Nicholas Roy and Jonathan How: A Bayesian Approach to Finding Compact Representations for Reinforcement Learning 
17:15 – 17:30  Arthur Guez, David Silver and Peter Dayan: Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search 
17:30 – 17:45  Closing Remarks 
Accepted Papers for Presentation at EWRL 2012
Nikos Vlassis, Michael Littman and David Barber:
Stochastic POMDP controllers: How easy to optimize?
Scaling lifelong off-policy learning
A Dantzig Selector Approach to Temporal Difference Learning
Path Integral Policy Improvement with Covariance Matrix Adaptation
Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning
Solving Nonlinear Continuous State-Action-Observation POMDPs for Mechanical Systems with Gaussian Noise
Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty
Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization
Online Skill Discovery using Graph-based Clustering
Semi-Supervised Inverse Reinforcement Learning
Direct Policy Search Reinforcement Learning based on Particle Filtering
L1 Regularized Gradient Temporal-Difference Learning
Bellman Error Based Feature Generation Using Random Projections
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Model-based Direct Policy Search for Skill Learning in Continuous Domains
Policy Gradients with Variance Related Risk Criteria
Active Preference-based Reinforcement Learning
Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments
Abstraction in Reinforcement Learning in Terms of Metastability
Feature Reinforcement Learning using Looping Suffix Trees
Structured Classification for Inverse Reinforcement Learning
Decoupling Exploration and Exploitation in Multi-Armed Bandits
A Study of Off-policy Learning in Computational Sustainability
Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds
Structured Apprenticeship Learning
Hierarchical, Heterogeneous Control using Reinforcement Learning
Rollout-based Game-tree Search Outprunes Traditional Alpha-beta
Planning in Reward-Rich Domains via PAC Bandits
A Bayesian Approach to Finding Compact Representations for Reinforcement Learning
Apprenticeship Learning for Model Parameters of Partially Observable Environments
On the Sample Complexity of Reinforcement Learning with a Generative Model
Actor-Critic Reinforcement Learning with Energy-Based Policies
Prelearning in Generalized MDPs to Speed up Learning
Autonomous Exploration For Navigating In MDPs
Low Complexity Proto-Value Function Updating with Incremental Slow Feature Analysis
Directed Exploration in Reinforcement Learning with Transferred Knowledge
Free Energy and the Generalized Optimality Equations for Sequential Decision Making
Generalized Classification-based Approximate Policy Iteration
Two-Manifold Problems with Applications to Nonlinear System Identification
Compositional Planning Using Optimal Option Models
Gradient Temporal Difference Networks
Discovering Continuous Homomorphisms for Transfer
Program Committee
Abdeslam Boularias
Adam White
Alborz Geramifard
Alessandro Lazaric
Amirmassoud Farahmand
Andre Damotta Salles Barreto
Andrew McHutchon
Bert Kappen
Bradley Knox
Byron Boots
Carlos Diuk Wasser
Christian Daniel
Christian Igel
Csaba Szepesvari
Damien Ernst
David Silver
Doina Precup
Dvijotham Krishnamurthy
Emma Brunskill
Evangelos Theodorou
Fernand Fernandez
Francisco Melo
Gerhard Neumann
Hado van Hasselt
Jan Peters
Jens Kober
Jose Antonio Martin H.
Jun Morimoto
Katharina Mülling
Kristian Kersting
Manuel Lopes
Marc Deisenroth
Marco Wiering
Martijn van Otterlo
Martin Riedmiller
Masashi Sugiyama
Matthew Hoffman
Matthew Robards
Matthieu Geist
Michal Valko
Mohammad Ghavamzadeh
Nikos Vlassis
Odalric-Ambrym Maillard
Oliver Kroemer
Olivier Pietquin
Pedro Ortega
Peter Auer
Peter Dayan
Peter Sunehag
Philipp Hennig
Philippe Preux
Remi Munos
Ronald Ortner
Shivaram Kalyanakrishnan
Stephane Ross
Teodor Moldovan
Thomas Furmston
Thomas J. Walsh
Thomas Rückstieß
Tobias Jung
Tobias Lang
Todd Hester
Tom Erez
Tom Schaul
Verena Heidrich-Meisner
Yuri Grinberg
Zhikun Wang
Zico Kolter
Additional Reviewers
Christoph Dann
Javier Garcia Polo