Before joining DeepMind I visited the machine learning group at the University of Amsterdam, where I worked with Max Welling and Jakub Tomczak.

I am interested in Bayesian statistics, machine learning, and deep learning. In particular, I am working on large-scale Bayesian machine learning. Most of my research so far has been on distributed Bayesian learning using stochastic natural gradient expectation propagation. I am also interested in stochastic gradient MCMC methods and variational inference. Occasionally I blog about my work.

Alexandre Galashov,
Siddhant Jayakumar,
Leonard Hasenclever,
Dhruva Tirumala,
Jonathan Schwarz,
Guillaume Desjardins,
Wojtek M. Czarnecki,
Yee Whye Teh,
Razvan Pascanu,
Nicolas Heess,
Information asymmetry in KL-regularized RL, in International Conference on Learning Representations, 2019.

@inproceedings{galashov2018information,
title = {Information asymmetry in {KL}-regularized {RL}},
author = {Galashov, Alexandre and Jayakumar, Siddhant and Hasenclever, Leonard and Tirumala, Dhruva and Schwarz, Jonathan and Desjardins, Guillaume and Czarnecki, Wojtek M. and Teh, Yee Whye and Pascanu, Razvan and Heess, Nicolas},
booktitle = {International Conference on Learning Representations},
year = {2019}
}

Josh Merel*,
Leonard Hasenclever*,
Alexandre Galashov,
Arun Ahuja,
Vu Pham,
Greg Wayne,
Yee Whye Teh,
Nicolas Heess,
Neural Probabilistic Motor Primitives for Humanoid Control, in International Conference on Learning Representations, 2019.

@inproceedings{merel2018neural,
title = {Neural Probabilistic Motor Primitives for Humanoid Control},
author = {Merel*, Josh and Hasenclever*, Leonard and Galashov, Alexandre and Ahuja, Arun and Pham, Vu and Wayne, Greg and Teh, Yee Whye and Heess, Nicolas},
booktitle = {International Conference on Learning Representations},
year = {2019}
}

2018

Rianne van den Berg*,
Leonard Hasenclever*,
Jakub M. Tomczak,
Max Welling,
Sylvester Normalizing Flows for Variational Inference, in UAI, 2018.

Variational inference relies on flexible approximate posterior distributions. Normalizing flows provide a general recipe to construct flexible variational posteriors. We introduce Sylvester normalizing flows, which can be seen as a generalization of planar flows. Sylvester normalizing flows remove the well-known single-unit bottleneck from planar flows, making a single transformation much more flexible. We compare the performance of Sylvester normalizing flows against planar flows and inverse autoregressive flows and demonstrate that they compare favorably on several datasets.
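The abstract above contrasts Sylvester flows with planar flows. As a rough illustration of the baseline only, here is a minimal sketch of a single planar-flow step, z' = z + u tanh(wᵀz + b), with its log-determinant. The function name `planar_flow` and the choice of tanh are assumptions made for this example, not code from the paper. The scalar pre-activation wᵀz + b is the single-unit bottleneck that the abstract refers to.

```python
import math

def planar_flow(z, u, w, b):
    """Apply one planar-flow step to the vector z (lists of floats).

    Returns the transformed vector and log|det Jacobian|.
    Illustrative sketch; not the authors' implementation.
    """
    # All information about z passes through this one scalar.
    activation = sum(wi * zi for wi, zi in zip(w, z)) + b
    h = math.tanh(activation)
    z_new = [zi + ui * h for zi, ui in zip(z, u)]
    # Matrix determinant lemma: det(I + u h'(a) w^T) = 1 + h'(a) * u^T w
    dh = 1.0 - h * h
    u_dot_w = sum(ui * wi for ui, wi in zip(u, w))
    log_det = math.log(abs(1.0 + dh * u_dot_w))
    return z_new, log_det
```

Replacing the vectors u and w with matrices is, loosely, the generalization the abstract describes; that step is not shown here.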

@inproceedings{BergHasenclever2018,
archiveprefix = {arXiv},
arxivid = {1803.05649},
author = {{van den Berg}*, Rianne and Hasenclever*, Leonard and Tomczak, Jakub M. and Welling, Max},
eprint = {1803.05649},
month = aug,
booktitle = {UAI},
title = {{Sylvester Normalizing Flows for Variational Inference}},
year = {2018}
}

Wojciech Marian Czarnecki,
Siddhant M. Jayakumar,
Max Jaderberg,
Leonard Hasenclever,
Yee Whye Teh,
Simon Osindero,
Nicolas Heess,
Razvan Pascanu,
Mix&Match - Agent Curricula for Reinforcement Learning, in ICML, 2018.

We introduce Mix&Match (M&M) - a training framework designed to facilitate rapid and effective learning in RL agents, especially those that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods. (2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state. (3) Finally, we illustrate how a variant of our method can be used to improve agent performance in a multitask setting.

@inproceedings{Czarnecki2018,
archiveprefix = {arXiv},
arxivid = {1806.01780},
author = {Czarnecki, Wojciech Marian and Jayakumar, Siddhant M. and Jaderberg, Max and Hasenclever, Leonard and Teh, Yee Whye and Osindero, Simon and Heess, Nicolas and Pascanu, Razvan},
eprint = {1806.01780},
month = jul,
booktitle = {ICML},
title = {{Mix{\&}Match - Agent Curricula for Reinforcement Learning}},
year = {2018}
}

2017

T. Nagapetyan,
A. B. Duncan,
L. Hasenclever,
S. J. Vollmer,
L. Szpruch,
K. Zygalakis,
The True Cost of Stochastic Gradient Langevin Dynamics, Jun-2017.

The problem of posterior inference is central to Bayesian statistics and a wealth of Markov Chain Monte Carlo (MCMC) methods have been proposed to obtain asymptotically correct samples from the posterior. As datasets in applications grow larger and larger, scalability has emerged as a central problem for MCMC methods. Stochastic Gradient Langevin Dynamics (SGLD) and related stochastic gradient Markov Chain Monte Carlo methods offer scalability by using stochastic gradients in each step of the simulated dynamics. While these methods are asymptotically unbiased if the stepsizes are reduced in an appropriate fashion, in practice constant stepsizes are used. This introduces a bias that is often ignored. In this paper we study the mean squared error of Lipschitz functionals in strongly log-concave models with i.i.d. data of growing data set size and show that, given a batchsize, to control the bias of SGLD the stepsize has to be chosen so small that the computational cost of reaching a target accuracy is roughly the same for all batchsizes. Using a control variate approach, the cost can be reduced dramatically. The analysis is performed by considering the algorithms as noisy discretisations of the Langevin SDE which correspond to the Euler method if the full data set is used. An important observation is that the scale of the step size is determined by the stability criterion if the accuracy is required for consistent credible intervals. Experimental results confirm our theoretical findings.
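The abstract above describes SGLD as a noisy discretisation of the Langevin SDE. A minimal sketch of a constant-stepsize SGLD update follows, assuming a toy model picked only for illustration (unit-variance Gaussian likelihood with a zero-mean Gaussian prior on the mean). The names `sgld_step` and `run_sgld` and the default `prior_var` are hypothetical and not taken from the paper. With `batch_size == len(data)` the update reduces to the Euler discretisation mentioned in the abstract.

```python
import math
import random

def sgld_step(theta, data, batch_size, step_size, rng, prior_var=10.0):
    """One SGLD update for the mean of a unit-variance Gaussian model."""
    n_total = len(data)
    batch = rng.sample(data, batch_size)  # minibatch without replacement
    grad_log_prior = -theta / prior_var
    # Rescaled minibatch sum: unbiased estimate of the full-data gradient.
    grad_log_lik = (n_total / batch_size) * sum(x - theta for x in batch)
    drift = 0.5 * step_size * (grad_log_prior + grad_log_lik)
    noise = math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return theta + drift + noise

def run_sgld(data, batch_size, step_size, n_steps, seed=0):
    """Run SGLD from theta = 0 with a constant stepsize; return all iterates."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        theta = sgld_step(theta, data, batch_size, step_size, rng)
        samples.append(theta)
    return samples
```

The control variate approach mentioned in the abstract is not shown; this sketch covers plain SGLD only.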

@unpublished{Nagapetyan2017,
month = jun,
author = {Nagapetyan, T. and Duncan, A. B. and Hasenclever, L. and Vollmer, S. J. and Szpruch, L. and Zygalakis, K.},
archiveprefix = {arXiv},
eprint = {1706.02692},
title = {{The True Cost of Stochastic Gradient Langevin Dynamics}},
year = {2017}
}

X Lu,
V Perrone,
L Hasenclever,
Y W Teh,
S J Vollmer,
Relativistic Monte Carlo, in AISTATS, 2017.

Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular the mass matrix of HMC is hard to tune well. In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduce a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.
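The maximum velocity mentioned in the abstract above can be illustrated in one dimension. The sketch below assumes the standard relativistic kinetic energy K(p) = m c² sqrt(p²/(m² c²) + 1), whose derivative gives the velocity. The function names and default parameters are chosen for this example and are not the paper's code.

```python
import math

def relativistic_velocity(p, mass=1.0, c=1.0):
    """Velocity dK/dp under relativistic kinetic energy; |v| never exceeds c."""
    return p / (mass * math.sqrt(p * p / (mass * mass * c * c) + 1.0))

def newtonian_velocity(p, mass=1.0):
    """Velocity under the usual quadratic kinetic energy; unbounded in p."""
    return p / mass
```

For small momenta the two velocities nearly coincide, while for large momenta the relativistic one saturates near `c`. This saturation is loosely what connects the method to gradient clipping.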

@inproceedings{LuPerHas2016a,
annote = {ArXiv e-prints: 1609.04388},
author = {Lu, X and Perrone, V and Hasenclever, L and Teh, Y W and Vollmer, S J},
booktitle = {AISTATS},
title = {{Relativistic Monte Carlo}},
year = {2017}
}

Leonard Hasenclever,
Stefan Webb,
Thibaut Lienart,
Sebastian Vollmer,
Balaji Lakshminarayanan,
Charles Blundell,
Yee Whye Teh,
Distributed Bayesian Learning with Stochastic Natural Gradient Expectation Propagation and the Posterior Server, Journal of Machine Learning Research, vol. 18, no. 106, pp. 1–37, 2017.

This paper makes two contributions to Bayesian machine learning algorithms. Firstly, we propose stochastic natural gradient expectation propagation (SNEP), a novel alternative to expectation propagation (EP), a popular variational inference algorithm. SNEP is a black box variational algorithm, in that it does not require any simplifying assumptions on the distribution of interest, beyond the existence of some Monte Carlo sampler for estimating the moments of the EP tilted distributions. Further, as opposed to EP which has no guarantee of convergence, SNEP can be shown to be convergent, even when using Monte Carlo moment estimates. Secondly, we propose a novel architecture for distributed Bayesian learning which we call the posterior server. The posterior server allows scalable and robust Bayesian learning in cases where a dataset is stored in a distributed manner across a cluster, with each compute node containing a disjoint subset of data. An independent Monte Carlo sampler is run on each compute node, with direct access only to the local data subset, but which targets an approximation to the global posterior distribution given all data across the whole cluster. This is achieved by using a distributed asynchronous implementation of SNEP to pass messages across the cluster. We demonstrate SNEP and the posterior server on distributed Bayesian learning of logistic regression and neural networks.

@article{HasWebLie2015a,
author = {Hasenclever, Leonard and Webb, Stefan and Lienart, Thibaut and Vollmer, Sebastian and Lakshminarayanan, Balaji and Blundell, Charles and Teh, Yee Whye},
title = {{Distributed Bayesian Learning with Stochastic Natural Gradient Expectation Propagation and the Posterior Server}},
journal = {Journal of Machine Learning Research},
year = {2017},
volume = {18},
number = {106},
pages = {1--37}
}

2014

W. Hordijk,
L. Hasenclever,
J. Gao,
D. Mincheva,
J. Hein,
An investigation into irreducible autocatalytic sets and power law distributed catalysis, Natural Computing, vol. 13, no. 3, pp. 287–296, 2014.

RAF theory has been established as a useful and formal framework for studying the emergence and evolution of autocatalytic sets. Here, we present several new and additional results on RAF sets. In particular, we investigate in more detail the existence, expected sizes, and composition of the smallest possible, or irreducible, RAF sets. Furthermore, we study a more realistic variant of the well-known binary polymer model in which the catalysis events are assigned according to a power law distribution. Together, these results provide further insights into the existence and structure of autocatalytic sets in simple models of chemical reaction systems, with possible implications for theories on the origin of life.

@article{Hordijk2014,
author = {Hordijk, W. and Hasenclever, L. and Gao, J. and Mincheva, D. and Hein, J.},
doi = {10.1007/s11047-014-9429-6},
issn = {1572-9796},
journal = {Natural Computing},
number = {3},
pages = {287--296},
title = {{An investigation into irreducible autocatalytic sets and power law distributed catalysis}},
volume = {13},
year = {2014}
}

2013

S. S. Pegler,
K. N. Kowal,
L. Hasenclever,
M. G. Worster,
Lateral controls on grounding-line dynamics, Journal of Fluid Mechanics, vol. 722, no. 5929, p. R1, May 2013.

We present a theoretical and experimental study of viscous gravity currents introduced at the surface of a denser inviscid fluid layer of finite depth inside a vertical Hele-Shaw cell. Initially, the viscous fluid floats on the inviscid fluid, forming a self-similar, buoyancy-driven current resisted predominantly by the viscous stresses due to shear across the width of the cell. Once the viscous current contacts the base of the cell, the flow can be considered in two regions: a grounded region in which the current lies in full contact with the base; and a floating region. The subsequent advance of the grounding line separating these regions is shown to be controlled by the thickening of the current associated with balancing the local shear stresses. An understanding of the flow transitions is developed using asymptotic and numerical analysis of a model based on lubrication theory.

@article{Pegler2013,
author = {Pegler, S. S. and Kowal, K. N. and Hasenclever, L. and Worster, M. G.},
doi = {10.1017/jfm.2013.140},
issn = {0022-1120},
journal = {Journal of Fluid Mechanics},
keywords = {Hele-Shaw flows,geophysical and geological flows,ice sheets},
month = may,
number = {5929},
pages = {R1},
publisher = {Cambridge University Press},
title = {{Lateral controls on grounding-line dynamics}},
volume = {722},
year = {2013}
}