The 10th European Workshop on Reinforcement Learning (EWRL 2012)
Dates: June 30 – July 1, 2012 (2-day workshop at ICML 2012)
Location: Edinburgh, Scotland
Post-Workshop Proceedings: JMLR W&C Proceedings, Vol. 24
Shie Mannor (Technion)
Rich Sutton (University of Alberta)
Martin Riedmiller (University of Freiburg)
Drew Bagnell (Carnegie Mellon University)
Marc Deisenroth (TU Darmstadt)
Csaba Szepesvari (University of Alberta)
Jan Peters (TU Darmstadt)
Proceedings of the Tenth European Workshop on Reinforcement Learning
June, 2012, Edinburgh, Scotland
Editors: Marc Peter Deisenroth, Csaba Szepesvári, Jan Peters
We are calling for papers (and posters) from the entire reinforcement learning spectrum, with the option of either 2-page position papers (on which open discussion will be held) or longer 8-page research papers in JMLR format. We welcome a wide range of submissions to foster broad discussion. We will publish a selection of accepted papers in the prestigious JMLR W&C Proceedings.
Double submissions (e.g., with ICML) are OK. However, if an EWRL paper is accepted to another conference proceedings or journal, it will not be reprinted in the official EWRL proceedings (JMLR W&C). The paper will still be considered for acceptance and presentation at EWRL, regardless of whether it can be printed in the official proceedings. Double submissions must be clearly labelled as such (e.g., add a footnote on the first page). If your ICML submission exceeds EWRL’s page limit, don’t worry too much about it: submit the ICML paper.
We will publish a selection of papers from EWRL 2012 in the JMLR Workshop & Conference Proceedings.
- Page limit: 2 pages for short papers and 8 pages for regular papers (plus references).
- Paper format: JMLR W&C style
- Papers for the JMLR W&C Proceedings must be resubmitted after EWRL; details will follow after the workshop.
Since EWRL 2012 is an ICML workshop, the ICML workshop fees apply. There are no additional EWRL-specific fees.
Registration via ICML Workshops
Appleton Tower, LT 1
The poster sessions will be in the atrium of the Appleton Tower.
Students can apply for financial support: Send an email to email@example.com explaining why and how much financial support is required.
Keynote Speakers’ Abstracts
Shie Mannor: Known Unknowns: Planning with Parameter Uncertainty
Planning when the model parameters are not fully known is a common problem encountered in operations research, control, and artificial intelligence. I will start by demonstrating why planning with parameter uncertainty is an important issue. I will then describe several approaches: a Bayesian uncertainty model over the unknown parameters, a robust approach that takes a worst-case view, and a frequentist approach. I will outline the advantages and disadvantages of each approach and discuss its potential to scale up to large problems. Finally, I will discuss the challenges posed by a higher level of uncertainty, where the model itself, rather than its parameters, may not be fully known.
Martin Riedmiller: Neural Architectures for Real World Reinforcement Learning
The research focus of the Machine Learning Lab at the University of Freiburg lies in building intelligent control architectures that can learn their behaviour entirely from scratch. Our aim is to build learning machines that perceive their environment, autonomously learn to generate internal representations, and autonomously learn to make appropriate decisions to finally reach a predefined goal.
In my talk I will provide examples of how neural-network-based learning methods can be effectively applied to realize such control architectures. As one example, I will present some recent results on deep learning architectures for visual-input-based reinforcement learning.
Richard Sutton: Verification in Artificial Intelligence
Drew Bagnell: Machine Learning with Multiple Guesses: Contextual Control Libraries
High-dimensional action spaces are increasingly important in problems of reinforcement and imitation learning, robotics, and control.
A popular approach to managing such difficulties in robotics uses a library of candidate “maneuvers” or “trajectories”. The library is either evaluated on a fixed number of candidate choices at runtime (e.g. path set selection for planning) or by iterating through a sequence of feasible choices until success is achieved (e.g. grasp selection). The performance of the library relies heavily on the content and order of the sequence of candidates. We propose a provably efficient method to optimize such libraries leveraging recent advances in optimizing sequence sub-modular functions.
An alternative approach to such problems is to directly predict the correct control action with a learning-based method, bypassing the evaluation of a tremendous number of choices. Such methods, however, have no way to recover if the prediction is not a good one.
In the second part of the talk, I will show an extension that yields a general approach to predict a sequence of potential actions based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each “slot” in the sequence. This approach can be thought of as capturing the notion of “predict then simulate”: checking multiple educated guesses in simulation and executing the most promising one. Finally we demonstrate the efficacy of the approaches on local trajectory optimization techniques, grasp library selection, and ground vehicle path set selection.
Joint work with Debadeepta Dey, Tommy Liu, and Martial Hebert.
Saturday (June 30)
|08:30 – 09:00||COFFEE for arrival|
|09:00 – 09:15||Welcome|
|09:15 – 10:10||Invited Talk: Shie Mannor (“Known Unknowns”)|
|10:10 – 10:30||Shiau Hong Lim and Peter Auer: Autonomous Exploration For Navigating In MDPs|
|10:30 – 11:00||COFFEE|
|11:00 – 11:15||Cosmin Paduraru, Doina Precup, Joelle Pineau and Gheorghe Comanici: A Study of Off-policy Learning in Computational Sustainability|
|11:15 – 11:30||Sergiu Goschin, Ari Weinstein, Michael Littman and Erick Chastain: Planning in Reward-Rich Domains via PAC Bandits|
|11:30 – 11:45||Michael Castronovo, Francis Maes, Raphael Fonteneau and Damien Ernst: Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning|
|11:45 – 12:00||Pedro Ortega and Daniel Alexander Braun: Free Energy and the Generalized Optimality Equations for Sequential Decision Making|
|12:00 – 12:15||Amir-Massoud Farahmand, Doina Precup and Mohammad Ghavamzadeh: Generalized Classification-based Approximate Policy Iteration|
|12:15 – 12:30||Nicolas Heess, David Silver and Yee Whye Teh: Actor-Critic Reinforcement Learning with Energy-Based Policies|
|12:30 – 14:00||LUNCH|
|14:00 – 14:50||Invited Talk: Martin Riedmiller (“Neural Architectures for Real World Reinforcement Learning”)|
|14:50 – 15:10||David Silver: Gradient Temporal Difference Networks|
|15:10 – 15:30||Marc Deisenroth and Jan Peters: Solving Nonlinear Continuous State-Action-Observation POMDPs for Mechanical Systems with Gaussian Noise|
|15:30 – 16:00||COFFEE|
|16:00 – 17:30||Poster Session I|
Sunday (July 1)
|08:30 – 09:00||COFFEE for arrival|
|09:00 – 09:50||Invited Talk: Drew Bagnell (“Machine Learning with Multiple Guesses: Contextual Control Libraries”)|
|09:50 – 10:05||Nikos Vlassis, Michael Littman and David Barber: Stochastic POMDP controllers: How easy to optimize?|
|10:05 – 10:20||Hado van Hasselt: Pre-learning in Generalized MDPs to Speed up Learning|
|10:30 – 11:00||COFFEE|
|11:00 – 12:30||Poster Session II|
|12:30 – 14:00||LUNCH|
|14:00 – 14:50||Invited Talk: Richard Sutton (“Verification in Artificial Intelligence”)|
|14:50 – 15:10||Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux and Patrick Gallinari: Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization|
|15:10 – 15:25||Michal Valko, Mohammad Ghavamzadeh and Alessandro Lazaric: Semi-Supervised Inverse Reinforcement Learning|
|15:30 – 16:00||COFFEE|
|16:00 – 16:15||Abdeslam Boularias, Oliver Kroemer and Jan Peters: Structured Apprenticeship Learning|
|16:15 – 16:30||Mahdi Milani Fard, Yuri Grinberg, Joelle Pineau and Doina Precup: Bellman Error Based Feature Generation Using Random Projections|
|16:30 – 16:45||Edouard Klein, Bilal Piot, Matthieu Geist and Olivier Pietquin: Structured Classification for Inverse Reinforcement Learning|
|16:45 – 17:00||Jan Hendrik Metzen: Online Skill Discovery using Graph-based Clustering|
|17:00 – 17:15||Alborz Geramifard, Stefanie Tellex, David Wingate, Nicholas Roy and Jonathan How: A Bayesian Approach to Finding Compact Representations for Reinforcement Learning|
|17:15 – 17:30||Arthur Guez, David Silver and Peter Dayan: Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search|
|17:30 – 17:45||Closing Remarks|
Accepted Papers for Presentation at EWRL 2012
Nikos Vlassis, Michael Littman and David Barber:
Stochastic POMDP controllers: How easy to optimize?
Scaling life-long off-policy learning
A Dantzig Selector Approach to Temporal Difference Learning
Path Integral Policy Improvement with Covariance Matrix Adaptation
Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning
Solving Nonlinear Continuous State-Action-Observation POMDPs for Mechanical Systems with Gaussian Noise
Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty
Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization
Online Skill Discovery using Graph-based Clustering
Semi-Supervised Inverse Reinforcement Learning
Direct Policy Search Reinforcement Learning based on Particle Filtering
L1 Regularized Gradient Temporal-Difference Learning
Bellman Error Based Feature Generation Using Random Projections
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Model-based Direct Policy Search for Skill Learning in Continuous Domains
Policy Gradients with Variance Related Risk Criteria
Active Preference-based Reinforcement Learning
Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments
Abstraction in Reinforcement Learning in Terms of Metastability
Feature Reinforcement Learning using Looping Suffix Trees
Structured Classification for Inverse Reinforcement Learning
Decoupling Exploration and Exploitation in Multi-Armed Bandits
A Study of Off-policy Learning in Computational Sustainability
Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds
Structured Apprenticeship Learning
Hierarchical, heterogeneous Control using Reinforcement Learning
Rollout-based Game-tree Search Outprunes Traditional Alpha-beta
Planning in Reward-Rich Domains via PAC Bandits
A Bayesian Approach to Finding Compact Representations for Reinforcement Learning
Apprenticeship Learning for Model Parameters of Partially Observable Environments
On the Sample Complexity of Reinforcement Learning with a Generative Model
Actor-Critic Reinforcement Learning with Energy-Based Policies
Pre-learning in Generalized MDPs to Speed up Learning
Autonomous Exploration For Navigating In MDPs
Low Complexity Proto-Value Function Updating with Incremental Slow Feature Analysis
Directed Exploration in Reinforcement Learning with Transferred Knowledge
Free Energy and the Generalized Optimality Equations for Sequential Decision Making
Generalized Classification-based Approximate Policy Iteration
Two-Manifold Problems with Applications to Nonlinear System Identification
Compositional Planning Using Optimal Option Models
Gradient Temporal Difference Networks
Discovering Continuous Homomorphisms for Transfer
Andre Damotta Salles Barreto
Carlos Diuk Wasser
Hado van Hasselt
Jose Antonio Martin H.
Martijn van Otterlo
Thomas J. Walsh
Javier Garcia Polo