EWRL12 (2015)

The 12th European Workshop on Reinforcement Learning (EWRL 2015)

Dates:    10 – 11 July 2015

Location:  Lille, France (2-day ICML Workshop)

[description] [keynotes] [schedule] [papers] [submission] [dates] [sponsors] [committee] [registration] [venue] [photos]

Description

The 12th European Workshop on Reinforcement Learning (EWRL 2015) invites reinforcement learning researchers to participate in the revival of this world-class event. We plan to make this an exciting event for researchers worldwide, not only for the presentation of top-quality papers, but also as a forum for ample discussion of open problems and future research directions. EWRL 2015 will consist of six keynote talks, contributed paper presentations, and discussion sessions spread over a two-day period.

Reinforcement learning is an active field of research that deals with the problem of sequential decision making in unknown (and often stochastic and/or partially observable) environments. Recently there has been a wealth of impressive empirical results as well as significant theoretical advances. Both kinds of advances are of great importance, and we would like to create a forum to discuss such results.

The workshop will cover a range of sub-topics including (but not limited to):

  • Exploration/exploitation and multi-armed bandits
  • Function approximation and large scale RL
  • Theoretical aspects of RL
  • Policy search methods
  • Actor-critic methods
  • Online learning methods
  • Adversarial RL
  • Risk-sensitive RL
  • Transfer and multi-task RL
  • Empirical evaluations in RL
  • Partially observable RL
  • Imitation learning and inverse RL
  • Bayesian RL
  • Multi-agent RL
  • Knowledge Representation in RL
  • Applications of RL
  • Open problems


Keynote Speakers

  • Marcus Hutter – Australian National University – Canberra, Australia
    • Title: Universal Reinforcement Learning
    • Abstract: There is great interest in understanding and constructing generally intelligent systems approaching and ultimately exceeding human intelligence. Universal AI is such a mathematical theory of machine super-intelligence. More precisely, AIXI is an elegant parameter-free theory of an optimal reinforcement learning agent embedded in an arbitrary unknown environment that possesses essentially all aspects of rational intelligence. The theory reduces all conceptual AI problems to pure computational questions. After a brief discussion of its philosophical, mathematical, and computational ingredients, I will give a formal definition and measure of intelligence, which is maximized by AIXI.
      AIXI can be viewed as the most powerful Bayes-optimal sequential decision maker, for which I will present general optimality results. This also motivates some variations such as knowledge-seeking and optimistic agents, and feature reinforcement learning. Finally I present some recent approximations, implementations, and applications of this modern top-down approach to AI.
  • Thomas G. Dietterich – Oregon State University – Corvallis, Oregon, USA
    • Title: Efficient Sampling for Simulator-Defined MDPs
    • Abstract: Extended value iteration can compute confidence intervals on the action values of an MDP based on samples from that MDP. Different confidence interval methods (e.g., Hoeffding bound, Empirical Bernstein Bound, Weissman et al. L1 multinomial interval, etc.) at each state lead to different confidence intervals throughout the MDP. This talk will address two questions. First, given a strong simulator for an MDP, a fixed policy, and a sampling budget, what is the best way to draw samples in order to obtain the tightest bound on the value of the policy in the start state? That is, what combination of confidence interval method and sampling strategy will give the tightest bounds? Second, how should we draw samples in order to simultaneously optimize the policy and obtain the tightest bounds on the resulting policy? Again, what confidence interval method and what sampling strategy should we use? I will present experiments that suggest partial answers to both questions. (A toy sketch of a Hoeffding-style interval on a policy's value is given after the speaker list below.)
  • David Silver – Google DeepMind – London, UK
  • Lihong Li – Microsoft Research
    • Title: The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning
    • Abstract: Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning. Despite much encouraging empirical evidence that shows benefits of transfer, there has been very little theoretical analysis. In this paper, we study a class of lifelong reinforcement-learning problems: the agent solves a sequence of tasks modeled as finite Markov decision processes (MDPs), each of which is from a finite set of MDPs with the same state/action spaces and different transition/reward functions. Inspired by the need for cross-task exploration in lifelong learning, we formulate a novel online discovery problem and give an optimal learning algorithm to solve it. Such results allow us to develop a new lifelong reinforcement-learning algorithm, whose overall sample complexity in a sequence of tasks is much smaller than that of single-task learning, with high probability, even if the sequence of tasks is generated by an adversary. Benefits of the algorithm are demonstrated in a simulated problem.
  • Shie Mannor – Technion
    • Title: Risk in RL: Nothing ventured, nothing gained
    • Abstract: We consider the role risk plays in dynamic decision problems. Different risk-conscious criteria, such as mean-variance tradeoffs, conditional value at risk, semi-deviation, exponential utility and others, have been studied in the RL/ADP setting by us and others. We explain the complexity and simulation issues involved in evaluating and optimizing these risk measures. Our main theme is that considering risk is essential to obtain resilience to model uncertainty and even model mismatch. We propose a scheme we call “risk shaping”: an approach to modify the risk criterion to be optimized in a way that best matches the overall task at hand.
  • Csaba Szepesvari – University of Alberta
    • Title: Lazy Posterior Sampling for Parametric Nonlinear Control
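
The confidence-interval theme in the Dietterich abstract above can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not material from the talk: assuming returns are bounded in [v_min, v_max] and estimated from independent Monte Carlo rollouts of a simulator, a two-sided Hoeffding bound yields an interval on a fixed policy's value. The simulate_step and policy arguments below are hypothetical placeholders.

    # Illustrative sketch only; the simulator and policy are assumed placeholders.
    import math

    def rollout_return(simulate_step, policy, start_state, horizon, gamma=0.95):
        # One Monte Carlo rollout: discounted return of `policy` from `start_state`.
        s, g, discount = start_state, 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)
            s, r = simulate_step(s, a)  # simulator maps (state, action) -> (next_state, reward)
            g += discount * r
            discount *= gamma
        return g

    def hoeffding_value_interval(returns, v_min, v_max, delta=0.05):
        # With probability >= 1 - delta, the true expected return lies within
        # +/- eps of the empirical mean, for i.i.d. returns bounded in [v_min, v_max].
        n = len(returns)
        mean = sum(returns) / n
        eps = (v_max - v_min) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        return mean - eps, mean + eps

Tighter interval methods, such as the empirical Bernstein bound, replace the (v_max - v_min) range term with a variance-dependent term, which is one reason the choice of interval method matters for getting the tightest bounds.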


Tentative Workshop Schedule

Download the EWRL schedule in PDF format.

DAY 1

8:30 Coffee and Welcome
8:50 Invited Talk. Thomas G. Dietterich, “Efficient Sampling for Simulator-Defined MDPs”
10:00 Coffee Break
10:30 Matteus Tanha, Tse-Han Huang, Geoffrey J. Gordon and David J. Yaron, “Imitation Learning for Accelerating Iterative Computation of Fixed Points”
10:50 Matteo Pirotta and Marcello Restelli, “On the Minimization of the Policy Gradient in Inverse Reinforcement Learning”
11:10 Philip Bachman and Doina Precup, “Learning Policies for Data Imputation with Guided Policy Search”
11:30 Herke Van Hoof, Jan Peters and Gerhard Neumann, “Non-Parametric Policy Learning for High-Dimensional State Representations” 
12:00 Lunch
14:00 Invited Talk. Marcus Hutter
15:00 Jan Leike and Marcus Hutter, “On the Optimality of General Reinforcement Learners”
15:20 Ashique Rupam Mahmood, Huizhen Yu, Martha White and Richard Sutton, “Emphatic Temporal-Difference Learning”
15:40 Ludovic Denoyer and Patrick Gallinari, “Deep Sequential Neural Networks”
16:00 Poster session. “An Empirical Evaluation of True-Online TD(lambda)”, “On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence”, “Reinforced Decision Trees”, “A Reinforcement Learning Approach to Online Learning of Decision Trees”, “Contextual Markov Decision Processes”, “Off-policy Model-based Learning under Unknown Factored Dynamics”, “Using PCA to Efficiently Represent State Spaces”, “A multiplicative UCB strategy for Gamma rewards”
17:00 Invited Talk. Shie Mannor

DAY 2

8:30 Tutorial. Marc Bellemare, “The Arcade Learning Environment”
9:00 Invited Talk. David Silver
10:00 Coffee Break
10:30 Orly Avner and Shie Mannor, “Learning to coordinate without communication in multi-user multi-armed bandit problems”
10:50 Aristide Tossou and Christos Dimitrakakis, “Differentially private multi-agent multi-armed bandits”
11:10 Yahel David and Nahum Shimkin, “PAC Algorithms for the Infinitely-Many Armed Problem with Multiple Pools”
11:30 Gergely Neu, “Explore no more: Simple and tight high-probability bounds for non-stochastic bandits”
12:00 Lunch
14:00 Invited Talk. Lihong Li
15:00 Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh and Shie Mannor, “Policy Gradient for Coherent Risk Measures”
15:20 Diederik Roijers, Shimon Whiteson, Peter Vamplew and Richard Dazeley, “Why Multi-objective Reinforcement Learning?”
15:40 Christian Wirth and Gerhard Neumann, “Model-Free Preference-based Reinforcement Learning”
16:00 Poster Session. “Dueling Bandits as a Partial Monitoring Game”, “Multi-Armed Bandit for Pricing”, “Parallel Reinforcement Learning with State Action Space Partitioning”, “Generalized Advantage Estimation for Policy Gradients”, “Sample-based abstraction for hybrid relational MDPs”
17:00 Invited Talk. Csaba Szepesvari

List of Accepted Contributions

  • Diederik Roijers, Shimon Whiteson, Peter Vamplew and Richard Dazeley. Why Multi-objective Reinforcement Learning?
  • Matthieu Geist. A multiplicative UCB strategy for Gamma rewards
  • Assaf Hallak, Dotan Di Castro and Shie Mannor. Contextual Markov Decision Processes
  • Assaf Hallak, Francois Schnitzler, Timothy Mann and Shie Mannor. Off-policy Model-based Learning under Unknown Factored Dynamics
  • Herke Van Hoof, Jan Peters and Gerhard Neumann. Non-Parametric Policy Learning for High-Dimensional State Representations
  • Ludovic Denoyer and Patrick Gallinari. Deep Sequential Neural Networks
  • Christian Wirth and Gerhard Neumann. Model-Free Preference-based Reinforcement Learning
  • Pratik Gajane and Tanguy Urvoy. Dueling Bandits as a Partial Monitoring Game
  • Harm van Seijen, Rich Sutton, Rupam Mahmood and Patrick Pilarski. An Empirical Evaluation of True-Online TD(lambda)
  • Davide Nitti, Vaishak Belle, Tinne De Laet and Luc De Raedt. Sample-based abstraction for hybrid relational MDPs
  • Orly Avner and Shie Mannor. Learning to coordinate without communication in multi-user multi-armed bandit problems
  • Patrick Mannion, Jim Duggan and Enda Howley. Parallel Reinforcement Learning with State Action Space Partitioning
  • Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh and Shie Mannor. Policy Gradient for Coherent Risk Measures
  • John Schulman, Philipp Moritz, Sergey Levine and Pieter Abbeel. Generalized Advantage Estimation for Policy Gradients
  • Aurélia Léon and Ludovic Denoyer. Reinforced Decision Trees
  • Aristide Tossou and Christos Dimitrakakis. Differentially private multi-agent multi-armed bandits
  • Matteo Pirotta and Marcello Restelli. On the Minimization of the Policy Gradient in Inverse Reinforcement Learning
  • Francesco Trovo, Marcello Restelli, Stefano Paladino and Nicola Gatti. Multi-Armed Bandit for Pricing
  • William Curran, Tim Brys, Matthew Taylor and William Smart. Using PCA to Efficiently Represent State Spaces
  • Abhinav Garlapati, Aditi Raghunathan, Vaishnavh Nagarajan and Balaraman Ravindran. A Reinforcement Learning Approach to Online Learning of Decision Trees
  • Yahel David and Nahum Shimkin. PAC Algorithms for the Infinitely-Many Armed Problem with Multiple Pools
  • Ashique Rupam Mahmood, Huizhen Yu, Martha White and Richard Sutton. Emphatic Temporal-Difference Learning
  • Matteus Tanha, Tse-Han Huang, Geoffrey J. Gordon and David J. Yaron. Imitation Learning for Accelerating Iterative Computation of Fixed Points
  • Philip Bachman and Doina Precup. Learning Policies for Data Imputation with Guided Policy Search
  • Gergely Neu. Explore no more: Simple and tight high-probability bounds for non-stochastic bandits
  • Nathaniel Korda and L.A. Prashanth. On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

Paper Submission

We are calling for papers (and posters) from the entire reinforcement learning spectrum. Submissions may be either 2-page position papers (on which open discussion will be held) or longer research papers of up to 8 pages (plus one page with references) in JMLR format [link]. We encourage a broad range of submissions to foster wide-ranging discussion.

A selection of accepted papers will appear in the prestigious JMLR Workshop and Conference Proceedings (JMLR W&CP).

Double submissions are allowed (e.g., with ICML). However, in the event that an EWRL paper is accepted to another conference proceedings or a journal, copyright restrictions prevent it from being reprinted in the JMLR W&CP. Such a paper would still be considered for acceptance and presentation at EWRL.

  • Submission deadline: 01-May-2015 — EXTENSION: 03-MAY, 23:59 Universal Time
  • Page limit: 2 pages for position papers and 8 pages plus one page with references for regular papers.
  • Paper format: JMLR format
  • Submission website: https://easychair.org/conferences/?conf=ewrl122015
  • The review process is double-blind

Important Dates

  • Paper submissions due: 01-May-2015 (extended to 03-May-2015)
  • Notification of acceptance: 10-May-2015
  • Workshop dates: 10/11-July-2015

Organizing Committee


Sponsors

Inria

CRIStAL


Program Committee

  • Yasin Abbasi-Yadkori
  • Peter Auer
  • Andre Barreto
  • Marc Bellemare
  • Emma Brunskill
  • Christian Daniel
  • Marc Deisenroth
  • Christos Dimitrakakis
  • Amir-Massoud Farahmand
  • Victor Gabillon
  • Matthieu Geist
  • Alborz Geramifard
  • Mohammad Ghavamzadeh
  • Mohammad Gheshlaghi-Azar
  • Matthew Hoffman
  • Alessandro Lazaric
  • Rupam Mahmood
  • Odalric-Ambrym Maillard
  • Timothy Mann
  • Jeremie Mary
  • Remi Munos
  • Gergely Neu
  • Gerhard Neumann
  • Ann Nowe
  • Laurent Orseau
  • Ronald Ortner
  • Simone Parisi
  • Olivier Pietquin
  • Bilal Piot
  • Doina Precup
  • Marcello Restelli
  • Scott Sanner
  • Peter Sunehag
  • Csaba Szepesvari
  • Georgios Theocharous
  • Michal Valko
  • Martijn van Otterlo
  • Nikos Vlassis

Registration

Since EWRL 2015 is organized as an ICML workshop, the standard ICML workshop fees apply; EWRL will not charge any additional fees.


Workshop Venue


Photos
