EWRL10 (2012)

The 10th European Workshop on Reinforcement Learning (EWRL 2012)


Dates: June 30 – July 1, 2012 (2-day workshop at ICML 2012)
Location: Edinburgh, Scotland (2-day ICML workshop)
Post-Workshop Proceedings: JMLR W&C Proceedings, Vol. 24



Important Dates

Workshop: June 30 – July 1, 2012 (at ICML)

Keynote Speakers

Shie Mannor (Technion)
Rich Sutton (University of Alberta)
Martin Riedmiller (University of Freiburg)
Drew Bagnell (Carnegie Mellon University)

Organizing Committee

Marc Deisenroth (TU Darmstadt)
Csaba Szepesvari (University of Alberta)
Jan Peters (TU Darmstadt)

Proceedings of the Tenth European Workshop on Reinforcement Learning

June, 2012, Edinburgh, Scotland

Editors: Marc Peter Deisenroth, Csaba Szepesvári, Jan Peters

Preface
Marc Peter Deisenroth, Csaba Szepesvári, Jan Peters
Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning
Michael Castronovo, Francis Maes, Raphael Fonteneau, Damien Ernst ; JMLR W&C 24:1-10, 2012.
Feature Reinforcement Learning using Looping Suffix Trees
Mayank Daswani, Peter Sunehag, Marcus Hutter ; JMLR W&C 24:11-24, 2012.
Planning in Reward-Rich Domains via PAC Bandits
Sergiu Goschin, Ari Weinstein, Michael L. Littman, Erick Chastain ; JMLR W&C 24:25-42, 2012.
Actor-Critic Reinforcement Learning with Energy-Based Policies
Nicolas Heess, David Silver, Yee Whye Teh ; JMLR W&C 24:43-58, 2012.
Directed Exploration in Reinforcement Learning with Transferred Knowledge
Timothy A. Mann, Yoonsuck Choe ; JMLR W&C 24:59-76, 2012.
Online Skill Discovery using Graph-based Clustering
Jan Hendrik Metzen ; JMLR W&C 24:77-88, 2012.
An Empirical Analysis of Off-policy Learning in Discrete MDPs
Cosmin Păduraru, Doina Precup, Joelle Pineau, Gheorghe Comănici ; JMLR W&C 24:89-102, 2012.
Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments
Yevgeny Seldin, Csaba Szepesvári, Peter Auer, Yasin Abbasi-Yadkori ; JMLR W&C 24:103-116, 2012.
Gradient Temporal Difference Networks
David Silver ; JMLR W&C 24:117-130, 2012.
Semi-Supervised Apprenticeship Learning
Michal Valko, Mohammad Ghavamzadeh, Alessandro Lazaric ; JMLR W&C 24:131-142, 2012.
An investigation of imitation learning algorithms for structured prediction
Andreas Vlachos ; JMLR W&C 24:143-154, 2012.
Rollout-based Game-tree Search Outprunes Traditional Alpha-beta
Ari Weinstein, Michael L. Littman, Sergiu Goschin ; JMLR W&C 24:155-167, 2012.


Paper Submission

We are calling for papers (and posters) from the entire reinforcement learning spectrum, with the option of either 2-page position papers (on which open discussion will be held) or longer 8-page research papers in JMLR format. We welcome a wide range of submissions to stimulate broad discussion. We will publish a selection of accepted papers in the prestigious JMLR W&C Proceedings.

Double submissions (e.g., with ICML) are OK. However, if an EWRL paper is accepted for publication in another conference's proceedings or in a journal, it will not be reprinted in the official EWRL proceedings (JMLR W&C). The paper will still be considered for acceptance and presentation at EWRL, regardless of whether it can be printed in the official proceedings. Double submissions must be clearly labelled as such (e.g., add a footnote on the first page). If your ICML submission exceeds EWRL’s page limit, don’t worry too much about it: submit the ICML paper.

We will publish a selection of papers from EWRL 2012 in the JMLR Workshop & Conference Proceedings:

  • Page limit: 2 pages for short papers and 8 pages for regular papers (plus references).
  • Paper format: JMLR W&C style
  • Papers for the JMLR W&C Proceedings must be resubmitted after EWRL.
    Details will follow after EWRL.


Registration

Since EWRL 2012 is an ICML workshop, the ICML workshop fees have to be paid. There won’t be any additional EWRL-specific fees.
Registration via ICML Workshops


Workshop Venue

Appleton Tower, LT 1

The poster sessions will be in the atrium of the Appleton Tower.


Scholarships

Students can apply for financial support: Send an email to marc@ias.tu-darmstadt.de explaining why and how much financial support is required.


Keynote Speakers’ Abstracts

Shie Mannor: Known Unknowns: Planning with Parameter Uncertainty
Planning when the model parameters are not fully known is a common problem encountered in operations research, control, and artificial intelligence. I will start by demonstrating why planning with parameter uncertainty is an important issue. I will then describe several approaches: a Bayesian uncertainty model over the unknown parameters, a robust approach that takes a worst-case view, and a frequentist approach. I will outline the advantages and disadvantages of each approach and discuss its potential to scale up to large problems. I will finally discuss the challenges that are posed by a higher level of uncertainty, where the model itself rather than its parameters may not be fully known.

Martin Riedmiller: Neural Architectures for Real World Reinforcement Learning
The research focus of the Machine Learning Lab at the University of Freiburg lies in building intelligent control architectures that can learn their behaviour entirely from scratch. Our aim is to build learning machines that perceive their environment, autonomously learn to generate internal representations and autonomously learn to make appropriate decisions to finally reach a predefined goal.

In my talk I will provide examples of how neural network based learning methods can be effectively applied to realize such control architectures. As one example, I will present some recent results on deep learning architectures for visual input based reinforcement learning.

Richard Sutton: Verification in Artificial Intelligence

Drew Bagnell: Machine Learning with Multiple Guesses: Contextual Control Libraries
High-dimensional action spaces are increasingly important in problems of reinforcement and imitation learning, robotics, and control.
A popular approach to managing such difficulties in robotics uses a library of candidate “maneuvers” or “trajectories”. The library is either evaluated on a fixed number of candidate choices at runtime (e.g. path set selection for planning) or by iterating through a sequence of feasible choices until success is achieved (e.g. grasp selection). The performance of the library relies heavily on the content and order of the sequence of candidates. We propose a provably efficient method to optimize such libraries leveraging recent advances in optimizing sequence sub-modular functions.

An alternative approach to such problems is to directly attempt to predict the correct control action in a learning-based approach, attempting to bypass the evaluation of a tremendous number of choices. Such methods, however, have no way to recover if the prediction is not a good one.
In the second part of the talk, I will show an extension that yields a general approach to predict a sequence of potential actions based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each “slot” in the sequence. This approach can be thought of as capturing the notion of “predict then simulate”: checking multiple educated guesses in simulation and executing the most promising one. Finally we demonstrate the efficacy of the approaches on local trajectory optimization techniques, grasp library selection, and ground vehicle path set selection.

Joint work with Debadeepta Dey, Tommy Liu, and Martial Hebert.


Workshop Schedule

Saturday (June 30)

08:30 – 09:00 COFFEE for arrival
09:00 – 09:15 Welcome
09:15 – 10:10 Invited Talk: Shie Mannor (“Known Unknowns”)
10:10 – 10:30 Shiau Hong Lim and Peter Auer: Autonomous Exploration For Navigating In MDPs
10:30 – 11:00 COFFEE
11:00 – 11:15 Cosmin Paduraru, Doina Precup, Joelle Pineau and Gheorghe Comanici: A Study of Off-policy Learning in Computational Sustainability
11:15 – 11:30 Sergiu Goschin, Ari Weinstein, Michael Littman and Erick Chastain: Planning in Reward-Rich Domains via PAC Bandits
11:30 – 11:45 Michael Castronovo, Francis Maes, Raphael Fonteneau and Damien Ernst: Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning
11:45 – 12:00 Pedro Ortega and Daniel Alexander Braun: Free Energy and the Generalized Optimality Equations for Sequential Decision Making
12:00 – 12:15 Amir-Massoud Farahmand, Doina Precup and Mohammad Ghavamzadeh: Generalized Classification-based Approximate Policy Iteration
12:15 – 12:30 Nicolas Heess, David Silver and Yee Whye Teh: Actor-Critic Reinforcement Learning with Energy-Based Policies
12:30 – 14:00 LUNCH
14:00 – 14:50 Invited Talk: Martin Riedmiller (“Neural Architectures for Real World Reinforcement Learning”)
14:50 – 15:10 David Silver: Gradient Temporal Difference Networks
15:10 – 15:30 Marc Deisenroth and Jan Peters: Solving Nonlinear Continuous State-Action-Observation POMDPs for Mechanical Systems with Gaussian Noise
15:30 – 16:00 COFFEE
16:00 – 17:30 Poster Session I
18:30 – Banquet

Sunday (July 1)

08:30 – 09:00 COFFEE for arrival
09:00 – 09:50 Invited Talk: Drew Bagnell (“Machine Learning with Multiple Guesses: Contextual Control Libraries”)
09:50 – 10:05 Nikos Vlassis, Michael Littman and David Barber: Stochastic POMDP controllers: How easy to optimize?
10:05 – 10:20 Hado van Hasselt: Pre-learning in Generalized MDPs to Speed up Learning
10:30 – 11:00 COFFEE
11:00 – 12:30 Poster Session II
12:30 – 14:00 LUNCH
14:00 – 14:50 Invited Talk: Richard Sutton (“Verification in Artificial Intelligence”)
14:50 – 15:10 Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux and Patrick Gallinari: Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization
15:10 – 15:25 Michal Valko, Mohammad Ghavamzadeh and Alessandro Lazaric: Semi-Supervised Inverse Reinforcement Learning
15:30 – 16:00 COFFEE
16:00 – 16:15 Abdeslam Boularias, Oliver Kroemer and Jan Peters: Structured Apprenticeship Learning
16:15 – 16:30 Mahdi Milani Fard, Yuri Grinberg, Joelle Pineau and Doina Precup: Bellman Error Based Feature Generation Using Random Projections
16:30 – 16:45 Edouard Klein, Bilal Piot, Matthieu Geist and Olivier Pietquin: Structured Classification for Inverse Reinforcement Learning
16:45 – 17:00 Jan Hendrik Metzen: Online Skill Discovery using Graph-based Clustering
17:00 – 17:15 Alborz Geramifard, Stefanie Tellex, David Wingate, Nicholas Roy and Jonathan How: A Bayesian Approach to Finding Compact Representations for Reinforcement Learning
17:15 – 17:30 Arthur Guez, David Silver and Peter Dayan: Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
17:30 – 17:45 Closing Remarks


Accepted Papers for Presentation at EWRL 2012

Nikos Vlassis, Michael Littman and David Barber:
Stochastic POMDP controllers: How easy to optimize?

Adam White, Joseph Modayil and Richard Sutton:
Scaling life-long off-policy learning

Matthieu Geist, Bruno Scherrer, Alessandro Lazaric and Mohammad Ghavamzadeh:
A Dantzig Selector Approach to Temporal Difference Learning

Freek Stulp and Olivier Sigaud:
Path Integral Policy Improvement with Covariance Matrix Adaptation

Michael Castronovo, Francis Maes, Raphael Fonteneau and Damien Ernst:
Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning

Marc Deisenroth and Jan Peters:
Solving Nonlinear Continuous State-Action-Observation POMDPs for Mechanical Systems with Gaussian Noise

Shie Mannor, Ofir Mebel and Huan Xu:
Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty

Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux and Patrick Gallinari:
Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization

Jan Hendrik Metzen:
Online Skill Discovery using Graph-based Clustering

Michal Valko, Mohammad Ghavamzadeh and Alessandro Lazaric:
Semi-Supervised Inverse Reinforcement Learning

Petar Kormushev and Darwin Caldwell:
Direct Policy Search Reinforcement Learning based on Particle Filtering

Dominik Meyer, Hao Shen and Klaus Diepold:
L1 Regularized Gradient Temporal-Difference Learning

Andreas Vlachos:
An investigation of imitation learning algorithms for structured prediction

Mahdi Milani Fard, Yuri Grinberg, Joelle Pineau and Doina Precup:
Bellman Error Based Feature Generation Using Random Projections

Arthur Guez, David Silver and Peter Dayan:
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

Jan Hendrik Metzen:
Model-based Direct Policy Search for Skill Learning in Continuous Domains

Dotan Di Castro, Aviv Tamar and Shie Mannor:
Policy Gradients with Variance Related Risk Criteria

Riad Akrour and Marc Schoenauer:
Active Preference-based Reinforcement Learning

Yevgeny Seldin, Peter Auer, Yasin Abbasi-Yadkori and Csaba Szepesvári:
Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments

Vimal Mathew, Peeyush Kumar and Balaraman Ravindran:
Abstraction in Reinforcement Learning in Terms of Metastability

Mayank Daswani, Peter Sunehag and Marcus Hutter:
Feature Reinforcement Learning using Looping Suffix Trees

Edouard Klein, Bilal Piot, Matthieu Geist and Olivier Pietquin:
Structured Classification for Inverse Reinforcement Learning

Orly Avner, Shie Mannor and Ohad Shamir:
Decoupling Exploration and Exploitation in Multi-Armed Bandits

Cosmin Paduraru, Doina Precup, Joelle Pineau and Gheorghe Comanici:
A Study of Off-policy Learning in Computational Sustainability

Marek Petrik:
Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds

Abdeslam Boularias, Oliver Kroemer and Jan Peters:
Structured Apprenticeship Learning

Ekaterina Abramova, Luke Dickens, Daniel Kuhn and A. Aldo Faisal:
Hierarchical, Heterogeneous Control using Reinforcement Learning

Ari Weinstein, Michael Littman and Sergiu Goschin:
Rollout-based Game-tree Search Outprunes Traditional Alpha-beta

Sergiu Goschin, Ari Weinstein, Michael Littman and Erick Chastain:
Planning in Reward-Rich Domains via PAC Bandits

Alborz Geramifard, Stefanie Tellex, David Wingate, Nicholas Roy and Jonathan How:
A Bayesian Approach to Finding Compact Representations for Reinforcement Learning

Takaki Makino and Johane Takeuchi:
Apprenticeship Learning for Model Parameters of Partially Observable Environments

Mohammad Gheshlaghi Azar, Remi Munos and Bert Kappen:
On the Sample Complexity of Reinforcement Learning with a Generative Model

Nicolas Heess, David Silver and Yee Whye Teh:
Actor-Critic Reinforcement Learning with Energy-Based Policies

Hado van Hasselt:
Pre-learning in Generalized MDPs to Speed up Learning

Shiau Hong Lim and Peter Auer:
Autonomous Exploration For Navigating In MDPs

Matthew Luciw and Juergen Schmidhuber:
Low Complexity Proto-Value Function Updating with Incremental Slow Feature Analysis

Timothy Mann and Yoonsuck Choe:
Directed Exploration in Reinforcement Learning with Transferred Knowledge

Pedro Ortega and Daniel Alexander Braun:
Free Energy and the Generalized Optimality Equations for Sequential Decision Making

Amir-Massoud Farahmand, Doina Precup and Mohammad Ghavamzadeh:
Generalized Classification-based Approximate Policy Iteration

Byron Boots and Geoffrey Gordon:
Two-Manifold Problems with Applications to Nonlinear System Identification

David Silver and Kamil Ciosek:
Compositional Planning Using Optimal Option Models

David Silver:
Gradient Temporal Difference Networks

Arun Tejasvi Chaganty and Balaraman Ravindran:
Discovering Continuous Homomorphisms for Transfer


Program Committee

Abdeslam Boularias
Adam White
Alborz Geramifard
Alessandro Lazaric
Amir-Massoud Farahmand
Andre Damotta Salles Barreto
Andrew McHutchon
Bert Kappen
Bradley Knox
Byron Boots
Carlos Diuk Wasser
Christian Daniel
Christian Igel
Csaba Szepesvari
Damien Ernst
David Silver
Doina Precup
Dvijotham Krishnamurthy
Emma Brunskill
Evangelos Theodorou
Fernand Fernandez
Francisco Melo
Gerhard Neumann
Hado van Hasselt
Jan Peters
Jens Kober
Jose Antonio Martin H.
Jun Morimoto
Katharina Mülling
Kristian Kersting
Manuel Lopes
Marc Deisenroth
Marco Wiering
Martijn van Otterlo
Martin Riedmiller
Masashi Sugiyama
Matthew Hoffman
Matthew Robards
Matthieu Geist
Michal Valko
Mohammad Ghavamzadeh
Nikos Vlassis
Odalric-Ambrym Maillard
Oliver Kroemer
Olivier Pietquin
Pedro Ortega
Peter Auer
Peter Dayan
Peter Sunehag
Philipp Hennig
Philippe Preux
Remi Munos
Ronald Ortner
Shivaram Kalyanakrishnan
Stephane Ross
Teodor Moldovan
Thomas Furmston
Thomas J. Walsh
Thomas Rückstieß
Tobias Jung
Tobias Lang
Todd Hester
Tom Erez
Tom Schaul
Verena Heidrich-Meisner
Yuri Grinberg
Zhikun Wang
Zico Kolter

Additional Reviewers

Christoph Dann
Javier Garcia Polo


Sponsors

