Satellite events 12-14 March Bayes Comp 15-17 March Levi, Finland

Keynotes

Tamara Broderick

Massachusetts Institute of Technology, USA

An Automatic Finite-Sample Robustness Check for Bayes and Beyond: Can Dropping a Little Data Change Conclusions?

Commonly researchers will run a statistical analysis on a data sample, with the goal of applying any conclusions to a new population. For instance, if economists conclude microcredit is effective at alleviating poverty based on observed data, policymakers might decide to distribute microcredit in other locations or future years. Typically, the original data is not a perfect random sample of the population where policy is applied -- but researchers might feel comfortable generalizing anyway so long as deviations from random sampling are small, and the corresponding impact on conclusions is small as well. Conversely, researchers might worry if a very small proportion of the data sample was instrumental to the original conclusion. So we propose a method to assess the sensitivity of statistical conclusions to the removal of a very small fraction of the data set. Manually checking all small data subsets is computationally infeasible, so we propose an approximation based on the classical influence function. Our method is automatically computable for MAP, variational Bayes, MLE, and other common estimators -- and we discuss extensions to MCMC. We provide finite-sample error bounds on approximation performance and a low-cost exact lower bound on sensitivity. We find that sensitivity is driven by a signal-to-noise ratio in the inference problem, does not disappear asymptotically, is not decided by misspecification, and is not eliminated by taking a Bayesian approach. While some empirical applications are robust, conclusions of several influential economics papers can be changed by removing (much) less than 1% of the data.
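To make the idea concrete, here is a minimal sketch of an influence-function approximation for the simplest possible estimator, the sample mean. It is a toy illustration, not the speaker's implementation: the function name `drop_sensitivity`, the choice of estimator and the synthetic data are all assumptions. The sketch predicts the effect of dropping each point to first order, picks the k points predicted to lower the mean the most, and then refits without them as an exact check.

```python
def drop_sensitivity(xs, k):
    """Approximate and exact effect on the sample mean of dropping k points.

    Illustrative toy only: the estimator is the sample mean and the goal is
    to push it downwards as far as possible by removing k observations.
    """
    n = len(xs)
    mean = sum(xs) / n
    # First-order (influence-function) effect of dropping point i on the mean.
    influence = [-(x - mean) / n for x in xs]
    # Indices of the k points whose removal is predicted to lower the mean most.
    worst = sorted(range(n), key=lambda i: influence[i])[:k]
    approx_mean = mean + sum(influence[i] for i in worst)
    # Exact refit without those points, as a check on the approximation.
    dropped = set(worst)
    kept = [x for i, x in enumerate(xs) if i not in dropped]
    exact_mean = sum(kept) / len(kept)
    return mean, approx_mean, exact_mean


# Synthetic data (an assumption): one large observation drives a positive mean.
xs = [10.0] + [-0.05] * 99
mean, approx_mean, exact_mean = drop_sensitivity(xs, k=1)
print(mean, approx_mean, exact_mean)
```

In this toy data set the sign of the mean flips after removing a single observation out of 100, and the first-order prediction agrees in sign with the exact refit.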

Anthony Lee

University of Bristol, United Kingdom

https://sites.google.com/view/anthonylee

How many steps are needed for random walk Metropolis? Explicit convergence bounds for Metropolis Markov chains

One of the simplest and enduringly popular general-purpose Monte Carlo Markov chains evolving on R^d is the random walk Metropolis (RWM) Markov chain. Despite its relative simplicity, explicit convergence bounds that scale suitably with dimension have proved elusive for decades. In recent years, progress has been made to show that for distributions with strongly convex and gradient-Lipschitz potentials there exists a specific proposal variance giving an explicit bound on the L^2-mixing time. We refine these results and obtain explicit spectral gap and L^2-mixing time bounds for RWM with arbitrary proposal variances in any dimension, demonstrating the robustness of the algorithm. We obtain the correct scaling with dimension of the spectral gap for sufficiently regular target distributions, and the mixing time bounds are of reasonable (not astronomical) order. Our positive results are quite generally applicable in principle. Essentially the same analysis can be performed for the preconditioned Crank--Nicolson Markov chain, obtaining dimension-independent bounds under suitable assumptions. This is joint work with C. Andrieu, S. Power and A. Wang.
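For readers unfamiliar with the algorithm, a minimal sketch of a random walk Metropolis chain on R^d follows. The standard Gaussian target (potential U(x) = |x|^2 / 2, which is strongly convex and gradient-Lipschitz), the function names, and the proposal scale 2.38 / sqrt(d) are illustrative assumptions; the scale is a commonly used heuristic and is not taken from the talk.

```python
import math
import random


def potential(x):
    """U(x) = |x|^2 / 2, i.e. a standard Gaussian target (illustrative choice)."""
    return 0.5 * sum(xi * xi for xi in x)


def rwm(potential, x0, sigma, n_steps, rng):
    """Random walk Metropolis with Gaussian proposals of standard deviation sigma."""
    x = list(x0)
    u = potential(x)
    chain, accepted = [], 0
    for _ in range(n_steps):
        # Propose a Gaussian perturbation of the current state.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        u_y = potential(y)
        # Accept with probability min(1, exp(U(x) - U(y))).
        if math.log(1.0 - rng.random()) < u - u_y:
            x, u = y, u_y
            accepted += 1
        chain.append(list(x))
    return chain, accepted / n_steps


d = 5
rng = random.Random(0)
chain, acc_rate = rwm(potential, [0.0] * d, 2.38 / math.sqrt(d), 20000, rng)
mean_first = sum(state[0] for state in chain) / len(chain)
print(acc_rate, mean_first)
```

The only tuning parameter is the proposal standard deviation `sigma`, which is the quantity whose effect on the spectral gap and mixing time the abstract describes.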

Veronika Rockova

University of Chicago Booth School of Business, USA

http://veronikarock.com

Adversarial Bayesian Simulation

In the absence of explicit or tractable likelihoods, Bayesians often resort to approximate Bayesian computation (ABC) for inference. Going beyond ABC, my talk surveys recent optimization approaches to simulation-based Bayesian inference in likelihood-free situations. In particular, I will focus on deep neural samplers based on generative adversarial networks (GANs) and on adversarial variational Bayes. Both ABC and GANs compare aspects of observed and fake data to simulate from posteriors and likelihoods, respectively. I will discuss the Bayesian GAN (B-GAN) sampler that directly targets the posterior by solving an adversarial optimization problem. B-GAN is driven by a deterministic mapping learned on the ABC reference by conditional GANs. Once the mapping has been trained, iid posterior samples are obtained by filtering noise at a negligible additional cost. My talk also mentions more traditional posterior sampling approaches (ABC and Metropolis-Hastings) based on classification.

Invited sessions

Parallel I invited sessions

State-space modelling and particle filtering (Chair: Nicolas Chopin): Adrien Corenflos, Hai-Dang Dau, Axel Finke

This session will present an overview of recent advances in state-space models (also known as partially observed Markov processes) and the computational tools that may be used to perform inference with respect to such models, namely particle filters (also known as Sequential Monte Carlo algorithms).

Stein Discrepancies (Chair: Chris Oates): Matthew Fisher, Heishiro Kanagawa, Marina Riabiz

This session focusses on the construction and application of Stein discrepancies, which provide a non-parametric and optimisation-centric perspective on Bayesian computation, enabling powerful optimisation techniques to be employed. In particular, the session will cover the recent development of Stein discrepancies that are gradient-free and applicable to latent variable models, as well as the scalable and efficient use of Stein discrepancies as a post-processing tool for MCMC.

Measuring Quality of MCMC Samples (Chair: Dootika Vats): Hyebin Song, Medha Agarwal, James Flegal

The pursuit to achieve more precise and robust statistical analyses has led researchers to consider more complex models to the point that doubly intractable posteriors have become a common statistical problem. The term doubly intractable refers to the cases in which the proportional part of the posterior density cannot be computed analytically so that standard methods, such as MCMC, cannot be used in their traditional forms. This problem typically arises when the posterior distribution is infinite-dimensional, for example, as in continuous space/time models. Recent advances in the field of stochastic simulation and Monte Carlo theory have allowed the development of exact methodologies, in the sense of involving only Monte Carlo error and not relying on finite-dimensional or numerical approximations. This session discusses possible solutions for some doubly intractable posterior problems.

What lies beneath - Some recent advances in Bayesian nonparametrics (Chair: Jim Griffin): Jeff Miller, Botond Szabo, Raffaele Argiento

Bayesian nonparametric methods can be challenging to fit to real-life data sets. This session will cover advances in fitting hierarchical mixture models which involve large numbers of latent variables, the statistical properties of commonly used sparse variational approximations for Gaussian process regression models, and the calibration of generalized Bayes posteriors (where the usual likelihood is replaced by a loss function) under model misspecification.

Parallel II invited sessions

Advances in twisted models for sequential Monte Carlo (Chair: Anthony Lee): Adam Johansen, Nicolas Chopin, Joshua Bon

This session will bring together recent advances in sequential Monte Carlo (SMC) where the model underlying the algorithm undergoes a change of measure (i.e. the model is twisted). Exploring such methods is a promising direction for SMC as the optimal twisting functions define an exact sampling algorithm and perfect estimates of the normalising constant. Despite the appeal, research in this area is quite new and limited to learning the twisting functions offline and for a limited class of models. The session will feature talks from Adam Johansen (Warwick), Nicolas Chopin and Joshua Bon (QUT), and will be chaired by Anthony Lee (Bristol). The talks will feature methods for learning twisting functions online, twisting methods that are applicable to new classes of models, and new applications for twisted-model SMC. This burgeoning area will benefit greatly from bringing interested researchers together to discuss current work and future directions.

Robust innovations in gradient-based MCMC (Chair: Samuel Livingstone): Chris Sherlock, Mauro Camara Escudero, Lionel Riou-Durand

Metropolis-Hastings algorithms with proposals informed by the local gradient have long been known to perform substantially better in high dimensions than simpler alternatives such as the random-walk Metropolis and independence sampler. In particular, Hamiltonian Monte Carlo (HMC) and the Metropolis-Adjusted Langevin Algorithm are very popular tools for Bayesian inference on, respectively, high- and moderately high-dimensional targets. The efficiency improvements, however, come at a price. The performance of both algorithms can degrade rapidly, sometimes terminally, in the presence of large gradients; furthermore, even when the gradients are controlled, the efficiency of HMC is notoriously sensitive to the choice of tuning parameters. This session showcases new gradient-based Markov chain Monte Carlo algorithms, as well as analyses and further developments of recent innovations, that tackle these sensitivities and enable straightforward, robust, gradient-based inference.

Bayesian computation to track the pandemic (Chair: Theodore Kypraios): Christopher Jewell, Alfonso Diz-Lois Palomares, Daniela De Angelis

People of all backgrounds have now become familiar with evidence-based decisions as data on Covid-19 cases were exploited to guide policy in many countries in the last 30 months. Statisticians behind these decisions have been fighting against time, not only to update models that would account for the ever-changing situation around the pandemic, but also to push the boundaries of traditional inferential methods to meet computational-budget constraints that were imposed by real-time decisions. Bayesian evidence synthesis offers the perfect framework to integrate information from multiple sources and prior beliefs, but traditional computation methods such as MCMC, SMC and ABC are often too expensive in their standard form to allow real time inference. This session presents three innovative ways in which bespoke methods were created to make inference more rapid and preserve model realism. The motivating example is always the inference of some aspects of the Covid-19 pandemic. The three speakers are world leaders in terms of research and their methods are routinely used by governing bodies in the fight against Covid-19.

Bayesian statistics for environmental data (Chair: Mari Myllymäki): Jarno Vanhatalo, Jeffrey W. Doser, Janine Illian

The session brings together three talks on the application of Bayesian statistics to environmental data. Such data are often spatial, temporal or spatio-temporal, and heterogeneous across their domain. The data can further be multivariate. Thus, the analysis requires careful consideration, taking care of possible complex interactions and non-stationarities as well. The session discusses analysis of such data, including e.g. integration of information from various sources, probabilistic predictions, or needs in uncertainty quantification and model evaluation.

Parallel III invited sessions

Optimisation meets sampling (Chair: Chris Nemeth): Louis Sharrock, Nikolas Nusken, Adil Salim

A core theme within Bayesian computation is the use of sampling techniques, such as Markov chain Monte Carlo algorithms, to approximate intractable posterior distributions. Such methods are generally considered the gold standard for approximating posterior distributions due to their supporting theoretical results. However, a drawback of these approaches is that they can be slow to converge, particularly when the posterior parameter space is large, or the posterior is fit against a large dataset. In machine learning, significant progress has been made to develop fast optimisation methods, such as stochastic gradient descent. However, applying these to posterior distributions gives only the maximum a posteriori value and not an approximation to the full posterior distribution. Recent work from optimal transport provides a new perspective on sampling, where sampling can be viewed as an optimisation problem over the space of probability distributions by considering Wasserstein gradient flow.

Robustness to model misspecification (Chair: Jeff Miller): Jonathan Huggins, Catherine Xue, Ryan Giordano

When models are incorrect, it is well established that the resulting inferences can be unreliable. Thus, developing computationally tractable methods for robust Bayesian inference under misspecification is important in many applications. This session focuses on new and improved computational methods for analyzing and mitigating the effects of misspecification.

Bayesian regression on networks (Chair: Sameer Deshpande): Anna Menacher, Alexander Nikitin, Sameer Deshpande

Networks feature prominently in many application areas from the social sciences to spatial epidemiology to neuroimaging. Typically, the vertices of a network correspond to units (e.g. people, geographic areas, or voxels) and edges encode interactions, similarities, or other relationships between units (e.g. friendship, spatial adjacency) in a complex system. Increasingly, data are collected at each vertex of a network and incorporating network structure while modeling such data introduces several computational and methodological challenges. This session will focus on new Bayesian regression models of data observed on large networks with particular emphasis on the computational challenges involved. Specifically it will highlight recent work that (i) incorporates network structure in sparse regression; (ii) constructs new classes of network-based kernels that can be used in Gaussian process modeling; and (iii) develops new regression tree priors that incorporate network structure in Bayesian tree ensembles.

Parallel IV invited sessions

Scalable Monte Carlo (Chair: Paul Fearnhead): Giacomo Zanella, Francesca Crucinio, Nikola Surjanovic

Monte Carlo methods underpin most Bayesian methods. Scaling these to be able to fit increasingly complex models, or to deal with high-dimensional parameter spaces or larger numbers of data points, remains an open challenge. This session will consist of three talks from excellent early-career researchers who are at the forefront of recent advances in developing Bayesian Monte Carlo approaches that are scalable, including divide-and-conquer approaches for sequential Monte Carlo, improving parallel tempering by merging MCMC and variational methods, robust gradient-based MCMC and scalable computation for hierarchical models.

Computational challenges in modeling complex data (Chair: Raffaele Argiento): Amy Herring, Sirio Legramanti, Gregor Kastner

Complex data arise in many modern applied contexts, including genetics, epidemiology, environmental and social sciences. The analysis of these data poses new challenges and requires the development of novel statistical models and computational methods, fueling many fascinating research areas of Bayesian statistics. The session discusses computational advances on Bayesian inference for flexible modeling and probabilistic clustering for complex data structures. In particular, the session covers recent developments in network analysis, time series analysis and functional data analysis.

Evidence synthesis: conflicts, splits and cuts (Chair: Anne Presanis): Noa Kallioinen, Yu Xuejun, Robert Goudie

Bayesian evidence synthesis models are complex probabilistic models that combine multiple, disparate data sources, often in a hierarchical model, to estimate quantities that are challenging to directly observe. Examples include estimating the severe burden of infectious disease; population models in ecology; and pharmacokinetic/pharmacodynamic models. Often such complex models are easier to fit, both from a computation and model assessment point of view, if they are split into multiple “modules”, each informed by different subsets of the data sources, with parameters in common between the modules. Any such synthesis requires sensitivity analyses; assessment of the consistency of the evidence between modules and data sources; resolution of any detected conflicts, either by bias modelling, exclusion of sources judged too poor to include, or downweighting or cutting of the influence of data sources providing potentially biased information; and once these have been resolved, computationally efficient methods for combining the modules. This session will explore computational methods for each of these stages of model development and assessment, and the links between them, including: methods for sensitivity analysis and conflict diagnostics; algorithms for implementing cut posteriors; and Markov melding for splitting and joining modules.

Trust and adding new algorithms to probabilistic programming frameworks (Chair: Aki Vehtari): Måns Magnusson, Lu Zhang, Mitzi Morris

New computational algorithms for making inference faster are proposed all the time (also at BayesComp). Many of the proposed and published methods do not end up in widely used inference frameworks. This session presents three talks from the point of view of the Stan development team, discussing what is required to gain sufficient trust in a new algorithm for it to be considered for addition to a probabilistic programming framework that is generally trusted for production use. When adding new algorithms, the development teams need to consider the additional maintenance burden, the focus of the framework, and the perceived trust in the whole framework given the trust in its individual pieces. Not only does the new algorithm need to have a sufficient speed advantage, it also needs to be robust and easy to use. The talks in the session present tools for assessing new algorithms, a case example, and an overall view of what needs to be taken into account to get a new algorithm into probabilistic frameworks that are commonly used in production systems.

Parallel V invited sessions

Piecewise deterministic Monte Carlo: Recent Advances (Chair: Gareth Roberts): Gareth Roberts, Sebastiano Grazzi, Paul Fearnhead

This is part of a coordinated double session which covers recent advances in the theory, methodology and applications of Monte Carlo methods based on piecewise deterministic Markov processes, hereinafter referred to simply as PDMP samplers. PDMP samplers are rejection-free, continuous-time processes that are non-reversible by construction. It is well known that non-reversibility can significantly improve the performance of sampling methods, both in terms of convergence to stationarity and asymptotic variance. In this session, we will demonstrate how PDMPs proved to be pivotal for making progress in some important and technically challenging areas of statistical inference. With a focus on medical statistics, we tell success stories where new methods based on PDMPs are used, for example,
- to sample the high-dimensional latent constrained space of infection times of individuals in the SIR model with notifications and to estimate relevant parameters in such a model;
- to sample efficiently high-dimensional targets arising in medical imaging problems, with a novel approach which involves neural generative priors;
- to sample target measures for statistical problems with intractable likelihoods.
As we give particular attention to applications in branches of medical sciences, we would like this session to be considered also for the satellite event.

MCMC for Multi-Modal Distributions (Chair: Raiha Browning): Saifuddin Syed, Matt Moores, Krzysztof Latuszynski

When the target distribution has multiple local maxima, standard MCMC algorithms can exhibit very poor performance, such as torpid mixing. These problems worsen as either the dimension of the parameter space or the separation between the modes increases. This session will feature three talks on recent advancements in sampling algorithms for multi-modal distributions. Saifuddin Syed will discuss a non-reversible variant of parallel tempering (PT), which eliminates the diffusivity endemic to traditional PT schemes. This non-reversibility can be leveraged to develop a black-box algorithm to optimally tune PT that can scale to GPUs. Emilia Pompe will introduce the Jumping Adaptive Multimodal Sampler (JAMS), an auxiliary variable adaptive MCMC algorithm. JAMS combines two proposal kernels: local moves and jumps to regions associated with different modes. A burn-in routine is used to find the mode locations, as well as to estimate the covariance matrices associated with each. Matt Moores will describe the Annealed Leap-Point Sampler (ALPS), an algorithm that runs multiple Markov chains at different temperatures, similar to PT. The exploration component searches for modes at a “hot” temperature (beta < 1), while the other chains operate at a sequence of “cold” temperatures (beta >= 1), jumping between modes at the coldest temperature.

New tools for high-dimensional Bayesian inference from physics and ML (Chair: Antonietta Mira): Aldo Glielmo, Simone Ulzega, Dhiman Ray

Physics has always been a great source for powerful inference algorithms. This is particularly true when it comes to inference with a very large number of variables, where Hamiltonian Monte Carlo algorithms have become indispensable. We believe that Bayesian inference could continue to profit from developments in physics. In particular, recent developments based on ML-generated collective variables could be brought to fruition beyond the realm of chemical physics. On the other hand, physicists simulating systems with many degrees of freedom might also profit from recent developments in Bayesian inference in the statistics and ML communities. With this session, we want to facilitate the cross-fertilization between these different communities.

Likelihood-free inference with kernel distances (Chair: François-Xavier Briol): Charita Dellaporta, Lorenzo Pacchiardi, Ayush Bharti

In recent years, a panoply of generalisations or approximations of Bayesian inference have been proposed based on distances between probability distributions. Amongst these, kernel-based distances such as the maximum mean discrepancy have been shown to have strong computational advantages, as well as inducing robustness into the corresponding posteriors. This session will cover a range of Bayesian approaches for likelihood-free inference, including approximate Bayesian computation, generalised Bayesian computation and the posterior bootstrap, which use the maximum mean discrepancy to obtain the aforementioned advantages.

Parallel VI invited sessions

Advances in theory and methodology of MCMC (Chair: Matti Vihola): Christophe Andrieu, Błażej Miasojedow, Jimmy Olsson

This session is about recent foundational theoretical developments in MCMC theory and methodology. All the speakers of the session are top researchers in the field.

Piecewise deterministic Monte Carlo (Chair: Sebastiano Grazzi): Ardjen Pengel, Marc Corstanje, Joris Bierkens

PDMC are continuous-time processes that are non-reversible by construction. It has been shown that non-reversibility improves, in some cases, the performance of sampling methods, both in terms of convergence to stationarity and asymptotic variance. In this session, we present new theoretical results such as
- strong approximation results for PDMCs which attain the optimal convergence rate for Berry-Esseen type inequalities for the Functional Central Limit Theorem. These results provide essential building blocks for the analysis of Monte Carlo variance estimation;
- extensions of PDMPs on Riemannian manifolds and scaling limits which explore the behaviour of PDMC for high-dimensional target distributions that exhibit strong correlations and multi-modalities. These results provide insight into many target distributions arising in modern applications.

Approximate Bayesian Computation (Chair: Christian Robert): Umberto Picchini, Giorgos Vasdekis, Jeremias Knoblauch

Normalising Flows to Enhance Bayesian Sampling (Chair: Joshua Bon): Marylou Gabrie, Michael Arbel, Laurence Davies

Sampling algorithm performance is problem dependent, and is often improved with knowledge of the target distribution. Normalising Flows (NFs) are a recently popularised mechanism that permits sampling from an approximation to a continuous target density, as well as providing an approximate transport for samples from a target distribution to a known reference distribution. In recent years there has been a great deal of interest in exploiting NFs to enhance Bayesian sampling for complex posteriors.

Parallel VII invited sessions

Machine Learning meets Adaptive MCMC (Chair: Maxim Panov): Eric Vanden Eijden, Eric Moulines, Achille Thin

Markov Chain Monte Carlo (MCMC) methods are a powerful tool for computation with complex probability distributions. However, the performance of such methods is critically dependent on properly tuned parameters, most of which are difficult if not impossible to know a priori for a given target distribution. Classical approaches to adaptive MCMC are quite limited and do not go beyond tuning the scale or covariance of proposal distributions. Recent advances in machine learning allow for efficient parameterization of much more complex diffeomorphisms called "normalising flows". The use of such flows in adaptive MCMC approaches yields very encouraging results and allows difficult problems to be considered. The purpose of this session is to review this progress by inviting key speakers (Eric Moulines, Marylou Gabrié, Eric Vanden Eijden, James Brofos, Serguei Samsonov, etc.) who have recently contributed in this direction.

J-ISBA advances in scalable Bayesian methods (Chair: Ale Avalos): François-Xavier Briol, Karla Monterrubio-Gómez, Andrea Bertazzi

Markov Chain Monte Carlo (MCMC) methods are the gold standard in Bayesian inference. MCMC algorithms provide a general way to sample from the posterior distribution and have been widely used due to their asymptotic properties. However, the computational complexity of expensive scientific models and/or the high-dimensionality and heterogeneity of modern datasets limit the applicability of MCMC methods. This session aims at providing various approaches that are alternative to classical MCMC algorithms, with a specific focus on novel methods developed by bright early career researchers.

Statistical Computing for Phylogenetics (Chair: Luiz Max Carvalho): Jason Xu, Luke Kelly, Marc Suchard

Phylogenetics is at the heart of disciplines as diverse as Marine Biology and Linguistics. Mathematically, phylogenies are planar graphs with an associated orthant structure on the edge lengths, and are extremely challenging objects to estimate from data. Moreover, molecular sequence data sets have grown in size and complexity by orders of magnitude in recent years, particularly so during the COVID-19 pandemic (Cappello et al, 2022). Modern phylogenetic applications involve the use of linear programming techniques for the calculation of the likelihood in linear time (Ji et al., 2020) and the clever use of GPU programming to exploit highly parallel computer architectures for efficient computation (Ayres et al., 2019). Pressing issues in Statistical Phylogenetics range from the construction of efficient Markov chain Monte Carlo schemes to sample from posterior distributions on the space of trees to the development of powerful diagnostic tools (Kelly, Ryder & Clarté, 2021) for these algorithms to devising efficient bootstrap and optimisation techniques for frequentist and Bayesian analyses.

Recent Advances in Variational Inference (Chair: Anna Menacher): Debdeep Pati, Yingzhen Li, Trevor Campbell

Variational inference (VI) for Bayesian computation has grown tremendously in the past decades. Compared to Markov chain Monte Carlo (MCMC) methods, VI has the advantages of computational efficiency and scalability for the analysis of large-scale data, while being relatively accurate for approximating the true posterior distribution. VI has many successful applications in machine learning and biomedical sciences, and it is also of great interest to study the theoretical properties of VI. This invited session will focus on recent developments of VI in terms of both methodology and theory.

Contributed talks

Abstracts in alphabetical order

Ale Avalos, Bayesian Inference of Multiple Ising Models for Heterogeneous Data

Multiple Ising models can be used to model the heterogeneity induced in a set of binary variables by external factors. These factors may influence the joint dependence relationships represented by a set of graphs across different groups. This talk presents the inference for this class of models and proposes a Bayesian methodology based on a Markov Random Field prior for the multiple graph setting. Such a prior enables the borrowing of strength across the different groups to encourage common edges when supported by the data. Sparsity-inducing spike-and-slab priors are employed on the parameters that measure graph similarities to learn which subgroups have a shared graph structure. Two Bayesian approaches are developed for the inference of multiple Ising models with special focus on model selection: (i) a Fully Bayesian method for low-dimensional graphs based on conjugate priors specified with respect to the exact likelihood, and (ii) an Approximate Bayesian method based on a quasi-likelihood approach for high-dimensional graphs where the normalization constant required in the exact method is computationally intractable. The performance of the proposed methods is studied and compared with competing approaches through an extensive simulation study. Both inferential strategies are employed for the analysis of data resulting from two public opinion studies in the US. The first one analyzes the confidence in political institutions in different groups divided by the time users spent on web pages. The second one studies the opinion on public spending in diverse inter-generational groups.

Filippo Ascolani, Complexity of Gibbs Samplers through Bayesian asymptotics

Gibbs samplers are popular algorithms to approximate posterior distributions arising from Bayesian models. Despite their popularity and good empirical performance, however, there are still relatively few quantitative theoretical results on their scalability or lack thereof, e.g. far fewer than for gradient-based sampling methods. We introduce a novel technique to analyse the asymptotic behaviour of mixing times of Gibbs Samplers, based on tools of Bayesian asymptotics. Our methodology applies to high-dimensional regimes where both the number of datapoints and the number of parameters increase, under random data-generating assumptions. This allows us to provide a fairly general framework to study the complexity of Gibbs samplers fitting complex hierarchical Bayesian models. The methodology is applied to two-level hierarchical models with likelihoods belonging to a general class (e.g. Binomial or Normal with unknown variances) and exponential family priors. In this framework we are able to provide dimension-free convergence results for Gibbs Samplers under mild conditions. Moreover, we provide appropriate bounds on the rate of convergence using spectral theory.

Alexandros Beskos, Manifold Markov chain Monte Carlo methods for Bayesian inference in diffusion models

Bayesian inference for nonlinear diffusions, observed at discrete times, is a challenging task that has prompted the development of a number of algorithms, mainly within the computational statistics community. We propose a new direction, and accompanying methodology, borrowing ideas from statistical physics and computational chemistry, for inferring the posterior distribution of latent diffusion paths and model parameters, given observations of the process. Joint configurations of the underlying process noise and of parameters, mapping onto diffusion paths consistent with observations, form an implicitly defined manifold. Then, by making use of a constrained Hamiltonian Monte Carlo algorithm on the embedded manifold, we are able to perform computationally efficient inference for a class of discretely observed diffusion models. Critically, in contrast with other approaches proposed in the literature, our methodology is highly automated, requiring minimal user intervention and applying alike in a range of settings, including: elliptic or hypo-elliptic systems; observations with or without noise; linear or non-linear observation operators. Exploiting Markovianity, we propose a variant of the method with complexity that scales linearly in the resolution of path discretisation and the number of observation times.

Raiha Browning, Flexible estimation of the temporal excitation pattern of discrete-time self-exciting processes

Hawkes processes are self-exciting stochastic processes, whereby past events increase the probability of future events occurring. A key feature of these processes is the conditional intensity function, λ(t|H(t−1)), where H(t−1) is the history of the process up to time t − 1. λ(t|H(t−1)) comprises two components: a baseline rate, representing independent events, and a self-exciting term that describes the influence of past events. Most standard models of Hawkes processes rely on a parametric form for this self-exciting term, referred to as the triggering kernel. A parametric form is likely to be insufficient to capture the true excitation pattern, particularly for complex data. In this work we present a trans-dimensional Bayesian approach to modelling the triggering kernel for discrete-time Hawkes processes, in which the kernel can take the form of any step function, with the locations and heights of the steps treated as unknown. This allows for significantly more flexibility than a parametric form. Our method is applied to a study characterising the spread of COVID-19 between France and Italy at the beginning of the pandemic.
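
The conditional intensity described above can be sketched in a few lines. The additive form below (baseline plus kernel-weighted past counts), the function names and the toy numbers are assumptions for illustration; the exact parameterisation used in the talk may differ.

```python
def step_kernel(lag, edges, heights):
    """Piecewise-constant triggering kernel.

    edges:   increasing step boundaries, e.g. [1, 3, 6] gives steps on lags 1-2 and 3-5
    heights: one non-negative height per step (len(edges) - 1 values)
    """
    for left, right, height in zip(edges[:-1], edges[1:], heights):
        if left <= lag < right:
            return height
    return 0.0  # no excitation outside the kernel's support


def intensity(t, counts, baseline, edges, heights):
    """lambda(t | H(t-1)) = baseline + sum over s < t of counts[s] * kernel(t - s)."""
    excitation = sum(counts[s] * step_kernel(t - s, edges, heights) for s in range(t))
    return baseline + excitation


counts = [2, 0, 1, 0]                     # hypothetical event counts at times 0..3
edges, heights = [1, 3, 6], [0.5, 0.1]    # hypothetical two-step kernel
lam = intensity(4, counts, baseline=0.2, edges=edges, heights=heights)
```

A trans-dimensional sampler would treat `edges` and `heights` as unknown quantities to be inferred; here they are fixed only to show how the intensity is evaluated.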

Elena Bortolato, Convergence of MCMC algorithms on manifolds through coupling techniques

Some problems in statistics and machine learning require sampling from distributions on submanifolds embedded in R^D. To target such distributions, constrained Markov chain Monte Carlo methods have been developed over the last twenty years (Brubaker et al. 2012, Zappa et al. 2018, Lelievre et al. 2019). Assessing the convergence of such algorithms remains an open problem. We propose to apply coupling techniques (Heng and Jacob, 2019, Jacob et al. 2020) that help monitor the practical convergence of the chains. In particular, we derive couplings of Metropolis-Rosenbluth-Teller-Hastings-type and Hamiltonian Monte Carlo-type algorithms on smooth manifolds and present some applications in the domain of likelihood-free inference. Joint work with Pierre E. Jacob and Robin J. Ryder.

Alberto Cabezas Gonzalez, Transport Elliptical Slice Sampling

We introduce a new framework for efficient sampling from complex probability distributions, using a combination of normalizing flows and elliptical slice sampling (Murray et al., 2010). The core idea is to learn a diffeomorphism, via normalizing flows, that maps the non-Gaussian structure of our target distribution to an approximately Gaussian distribution. We can then sample from our transformed distribution using the elliptical slice sampler, which is an efficient and tuning-free Markov chain Monte Carlo (MCMC) algorithm. The samples are then pulled back using an inverse normalizing flow to yield samples which approximate the stationary target distribution of interest. Our transformed elliptical slice sampler (TESS) is efficiently designed for modern computer architectures, where its adaptation mechanism utilizes parallel cores to rapidly run multiple Markov chains for only a few iterations. Numerical demonstrations show that TESS produces Monte Carlo samples from the target distribution with lower autocorrelation compared to non-transformed samplers. Additionally, assuming a sufficiently flexible diffeomorphism, TESS demonstrates significant improvements in efficiency when compared to gradient-based proposals designed to run on parallel computer architectures.
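
For readers unfamiliar with the building block, here is a minimal sketch of one elliptical slice sampling update (Murray et al., 2010) for a target proportional to a standard Gaussian times a likelihood term. It deliberately omits the normalizing flow and the adaptation that define TESS; the function name, the toy likelihood and the chain length are assumptions for illustration.

```python
import math
import random


def elliptical_slice_step(f, log_lik, rng):
    """One update for a target proportional to N(f; 0, I) * exp(log_lik(f))."""
    nu = [rng.gauss(0.0, 1.0) for _ in f]                 # auxiliary draw defining the ellipse
    log_y = log_lik(f) + math.log(1.0 - rng.random())      # slice level below the current value
    theta = rng.uniform(0.0, 2.0 * math.pi)
    lo, hi = theta - 2.0 * math.pi, theta                  # initial bracket around theta = 0
    while True:
        prop = [fi * math.cos(theta) + ni * math.sin(theta) for fi, ni in zip(f, nu)]
        if log_lik(prop) > log_y:
            return prop                                    # point on the ellipse inside the slice
        # Shrink the bracket towards theta = 0, which corresponds to the current state.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)


rng = random.Random(0)
log_lik = lambda f: -0.5 * (1.0 - f[0]) ** 2               # toy Gaussian likelihood, observation 1
state, draws = [0.0], []
for _ in range(5000):
    state = elliptical_slice_step(state, log_lik, rng)
    draws.append(state[0])
posterior_mean = sum(draws) / len(draws)
```

With this toy likelihood the exact posterior is Normal with mean 0.5 and variance 0.5, so the chain average can be compared against a known value. There is no step size or proposal scale to tune, which is the property the abstract refers to.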

Victor Elvira, Massively Recycled Importance Sampling

In the context of Bayesian inference, importance sampling (IS) methods are broadly used to approximate posterior distributions and related moments. In its standard approach, samples are simulated from a single proposal distribution and weighted properly. However, since the IS performance depends on the mismatch between the targeted and the proposal densities, two strategies are often used. First, in multiple importance sampling (MIS), several proposals are employed. Second, in adaptive IS (AIS), the proposals are iteratively adapted in order to improve their performance. In both MIS and AIS, many different weighting schemes are possible and, as a consequence, for the same set of samples, several valid estimators can be built. In this work, we propose to build many different IS estimators and then combine them. This is done by massively reusing the same set of samples and applying different sets of weights. Note that no extra simulations are needed. Moreover, since all the weighting schemes use the same target evaluations, limited extra computations are required. More specifically, only extra proposal evaluations are needed, which are usually cheaper than the target evaluations. We provide algorithms for the optimal linear combination in terms of variance for both MIS and AIS.
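
The idea that one set of samples supports several valid estimators can be illustrated with a small sketch. It uses two common MIS weighting schemes (weighting by the generating proposal, and weighting by the mixture of all proposals) on a toy normalised target. The target, the proposals and the equal-weight combination at the end are assumptions for illustration; in particular, the equal weights are a placeholder and not the optimal linear combination discussed in the talk.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
proposals = [(-1.0, 2.0), (2.0, 2.0)]          # hypothetical (mean, sd) of each proposal
n_per_proposal = 5000

# Simulate once; every estimator below reuses these samples.
samples = [(j, rng.gauss(m, s)) for j, (m, s) in enumerate(proposals)
           for _ in range(n_per_proposal)]
n_total = len(samples)

# Target evaluations are computed once and shared by all weighting schemes.
target_vals = [normal_pdf(x, 1.0, 1.0) for _, x in samples]   # toy target with mean 1


def mixture_pdf(x):
    return sum(normal_pdf(x, m, s) for m, s in proposals) / len(proposals)


# Scheme 1: each sample is weighted by the proposal that generated it.
est_standard = sum(x * p / normal_pdf(x, *proposals[j])
                   for (j, x), p in zip(samples, target_vals)) / n_total
# Scheme 2: each sample is weighted by the mixture of all proposals.
est_mixture = sum(x * p / mixture_pdf(x)
                  for (_, x), p in zip(samples, target_vals)) / n_total
# Placeholder combination with equal weights (not the variance-optimal one).
est_combined = 0.5 * (est_standard + est_mixture)
```

Both schemes estimate the target mean (1 in this toy case) from the same draws; the second scheme costs only extra proposal evaluations, in line with the remark above.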

Max Hird, Preconditioning for MCMC

Linear transformation of the state variable (linear preconditioning) is a common technique that often drastically improves the practical performance of a Markov chain Monte Carlo algorithm. Despite this, however, quantifying the benefits of linear preconditioning is not well-studied theoretically, and rigorous guidelines for choosing preconditioners are not always readily available. Mixing time bounds for various samplers (HMC, MALA, Unadjusted HMC, Unadjusted Langevin) have been produced in recent works for the class of strongly log-concave and Lipschitz target distributions and depend strongly on a quantity known as the condition number. We study linear preconditioning for this class of distributions, and under appropriate assumptions we provide bounds on the condition number after using a given linear preconditioner. The bounds are easy to interpret and can be used to quantify the mixing properties before and after preconditioning, as well as helping the practitioner choose a good preconditioner to use. We also present counterintuitive examples in which common preconditioning strategies that are used in popular software packages can actually increase the condition number, and therefore lead to a worse-performing algorithm. This is joint work with Samuel Livingstone.
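
To make the role of the condition number concrete, here is a minimal sketch for a two-dimensional Gaussian target, where the condition number is the ratio of the largest to the smallest eigenvalue of the Hessian of the potential. The Hessian entries and the diagonal preconditioner are hypothetical choices for illustration and are not examples from the talk.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigenvalues (smallest, largest) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


def condition_number(a, b, c):
    smallest, largest = eig_sym_2x2(a, b, c)
    return largest / smallest


# Hessian H of the potential of a hypothetical, badly scaled Gaussian target.
a, b, c = 100.0, 3.0, 1.0
kappa_before = condition_number(a, b, c)

# Diagonal preconditioner L = diag(l1, l2): with x = L y the Hessian becomes L^T H L.
l1, l2 = 1.0 / math.sqrt(a), 1.0 / math.sqrt(c)
kappa_after = condition_number(l1 * l1 * a, l1 * l2 * b, l2 * l2 * c)
```

Here the rescaled Hessian is close to [[1, 0.3], [0.3, 1]], so the condition number drops from roughly 110 to below 2. This favourable outcome is specific to the toy example; the abstract's point is that such an improvement is not guaranteed in general.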

Jack Jewson, Graphical model inference with external network data

A frequent challenge when using graphical models in applications is that the sample size is limited relative to the number of parameters to be learned. Our motivation stems from applications where one has external data, in the form of networks between variables, that provides valuable information to help improve inference. Specifically, we depict the relation between COVID19 and social and geographical network data, and between stock market and economic and policy networks extracted from text data. We propose a graphical LASSO framework where likelihood penalties are guided by the external network data. We also propose a spike-and-slab prior framework that depicts how partial correlations depend on the networks, which helps interpret the fitted graphical model. We develop computational schemes and software implementations in R and probabilistic programming languages. Our applications show how one may significantly improve interpretation, statistical accuracy, and out-of-sample prediction, in some instances using significantly sparser graphical models than would otherwise be necessary.

Miika Kailas, Online mass matrix adaptation for Hamiltonian Monte Carlo

We consider adaptive Markov chain Monte Carlo methods within the Hamiltonian Monte Carlo (HMC) sampler and its dynamic variant, the No-U-Turn Sampler (NUTS). In particular we study strategies for full-rank mass matrix adaptation and make two primary contributions. First, we study regularization strategies for online estimates relating to full-rank mass matrix adaptation in HMC and variants. Second and more importantly, we propose a novel adaptation target for the mass matrix. In contrast with the usual choice of the mass matrix as the inverse of (an estimate of) the covariance matrix of the target distribution, a global quantity, our alternative proposal is instead an average over local geometric quantities relating to the stability of discretized Hamiltonian dynamics. The proposed target and its estimators are computationally cheap and simple to implement, and our empirical studies show that the proposed adaptation strategies are applicable to challenging problems in hundreds of dimensions. (joint work with M. Vihola)

Junpeng Lao, A Functional Programming Approach to Composable Bayesian Workflow

Bayesian modeling in practice is an iterative process, in which a practitioner implicitly or explicitly follows the Bayesian workflow<https://arxiv.org/abs/2011.01808> (Gelman et al., 2020) to build models and inferences that are closest to the “reality” within the computational constraints. A composable model building capability is often desired as it makes developing bigger and more complex Bayesian models easier: for example, changing the priors of a collection of random variables. Moreover, a composable approach could enable more flexibility in constructing inferences that optimize for local model structure, and thus has the opportunity to improve inference quality overall, as compared to the general inference methods a statistical package usually offers (e.g., NUTS with different schemes of adaptation). In this talk, I will explain how adopting a functional programming perspective benefits the development of composable Bayesian modeling and programmable inference, with examples using TensorFlow Probability on JAX<https://www.tensorflow.org/probability/examples/TensorFlow_Probability_on_JAX> (for the modeling part) and Blackjax<https://blackjax-devs.github.io/blackjax/> (for the inference part).

Federica Milinanni, Large Deviation Principle for the Metropolis-Hastings algorithm

For MCMC methods, good performance measures for the convergence of the underlying Markov chains are essential. For instance, such performance measures can be used to compare different MCMC methods, or to tune parameters within a given method. Examples of common tools for investigating convergence properties include the spectral gap, mixing times and functional inequalities (Poincaré, log-Sobolev). In recent years there has been an interest in studying the performance of MCMC methods using tools from large deviation theory, specifically the rate function associated with the empirical measure of an MCMC method. In this talk we analyze the standard Metropolis-Hastings (MH) algorithm from this perspective. We consider the MH algorithm for a target measure defined on a Polish space. We state a large deviation principle for the corresponding empirical measure, generalising previous results for the MH algorithm on a finite state space, and we illustrate in some examples how the rate function depends on parameters of the method (in particular, parameters in the proposal distribution).

Umberto Picchini, Guided sequential ABC schemes for intractable Bayesian models

Sequential algorithms such as sequential importance sampling (SIS) and sequential Monte Carlo (SMC) have proven fundamental in Bayesian inference. However, probabilistic models often do not admit a readily available likelihood function or one that is computationally cheap to approximate. In the last 20 years, simulation-based approaches have flourished to bypass the likelihood intractability by implicitly making use of it via model simulations. The most studied class of simulation-based inference methods is arguably approximate Bayesian computation (ABC). For ABC, sequential Monte Carlo (SMC-ABC) is the state-of-the-art sampler. However, since the ABC paradigm is intrinsically wasteful, sequential ABC schemes can benefit from well-targeted proposal samplers that efficiently avoid improbable parameter regions. We construct novel proposal samplers that are conditional on summary statistics of the data. In a sense, the proposed parameters are "guided" to rapidly reach regions of the posterior surface that are compatible with the observed data. This speeds up the convergence of these sequential samplers, thus reducing the computational effort, while preserving the accuracy in the inference. We provide a variety of guided samplers easing inference for challenging case-studies, including multimodal posteriors, highly correlated posteriors, and hierarchical models with high-dimensional summary statistics. Joint work with Massimiliano Tamborrino, available at https://arxiv.org/abs/2206.12235
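
The wastefulness mentioned above is easy to see in the simplest ABC scheme. The sketch below is plain rejection ABC with draws from the prior, not the guided sequential samplers of the talk; the Normal toy model, the uniform prior, the sample-mean summary and the tolerance are all assumptions for illustration.

```python
import random

rng = random.Random(1)
n_obs = 20
observed = [rng.gauss(2.0, 1.0) for _ in range(n_obs)]    # toy data, unknown mean
s_obs = sum(observed) / n_obs                             # observed summary statistic


def simulate_summary(theta):
    """Simulate a dataset from the model and return its summary (the sample mean)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs


tolerance, n_trials = 0.1, 5000
accepted = []
for _ in range(n_trials):
    theta = rng.uniform(-10.0, 10.0)                      # proposal drawn from a vague prior
    if abs(simulate_summary(theta) - s_obs) < tolerance:
        accepted.append(theta)                            # keep only near-matching simulations

acceptance_rate = len(accepted) / n_trials
abc_posterior_mean = sum(accepted) / len(accepted)
```

Most prior draws land far from the region supported by the data and are discarded, so the acceptance rate is small. Proposals that concentrate on parameter regions compatible with the observed summary are one way to reduce this waste.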

Giorgos Vasdekis, Pseudo-marginal Piecewise Deterministic Monte Carlo

Piecewise Deterministic Markov Processes (PDMPs) have recently caught the attention of the MCMC community for having a non-diffusive behavior, potentially allowing them to explore the state space efficiently. This makes them good candidates to generate MCMC algorithms. One important problem in Bayesian computation is inference for models where pointwise evaluation of the posterior is not available, but one has access to an unbiased estimator of the posterior. A technique to deal with this problem is the pseudo-marginal Metropolis-Hastings algorithm. In this talk we describe a PDMP algorithm that can be used in the same posterior-free setting and can be seen as the analogue of pseudo-marginal methods for Piecewise Deterministic Monte Carlo. We show that the algorithm targets the posterior of interest. We also provide some numerical examples, focusing on the case of Approximate Bayesian Computation (ABC), a popular method to deal with problems in the setting of likelihood-free inference.

Matti Vihola, Conditional particle filters with bridge backward sampling

The performance of the conditional particle filter (CPF) with backward sampling is often impressive even with long data records. Two known exceptions are when the observations are weakly informative and when the dynamic model is slowly mixing. These are both present when sampling from finely time-discretised continuous-time path integral models, but can occur with hidden Markov models too. Multinomial resampling, which is commonly employed in the (backward sampling) CPF, resamples excessively for weakly informative observations and thereby introduces extra variance. A slowly mixing dynamic model renders the backward sampling step ineffective. We detail two conditional resampling strategies suitable for the weakly informative regime: the so-called `killing' resampling and the systematic resampling with mean partial order. To avoid the degeneracy issue of backward sampling, we introduce a generalisation that involves backward sampling with an auxiliary `bridging' CPF step, which is parameterised by a blocking sequence. We present practical tuning strategies for choosing an appropriate blocking. Our experiments demonstrate that the CPF with a suitable resampling and the developed `bridge backward sampling' can lead to substantial efficiency gains in the weakly informative regime.

Yuexi Wang, Semiparametric Bayesian Bootstrap

Modeling individual heterogeneity has always been one of the central topics of applied research in economics and social sciences. Advances in deep learning make it possible to recast the parameters as fully flexible nonparametric functions. While previous work by Farrell et al. (2020) has illustrated the success of deep learning in structured modeling of heterogeneity, performing statistical inference on the estimated parameter functions remains challenging. We utilize the Bayesian bootstrap (BB) framework, which passes random bootstrap weights to loss functions. To avoid repeatedly re-fitting neural networks, we adopt a semi-parametric linear approximation to the dependence of the parameter functions on the weights. Once the vanilla network is trained, the approximated bootstrap samples can be obtained with negligible costs. Under mild regularity conditions, we show our approximation consistently estimates the Bayesian bootstrap posterior. We illustrate the performance of our method on both simulated and real datasets.
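
The mechanism of passing random bootstrap weights to a loss function can be sketched on a problem where re-fitting is trivial. The example below is the plain Bayesian bootstrap for a mean under squared-error loss, not the semi-parametric approximation for neural networks described in the talk; the data and function names are hypothetical.

```python
import random

rng = random.Random(0)
data = [1.0, 2.0, 4.0, 7.0, 11.0]          # toy observations


def dirichlet_weights(n, rng):
    """Draw Dirichlet(1, ..., 1) weights by normalising Exponential(1) variables."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_mean(weights, values):
    """Minimiser of the weighted loss sum_i w_i * (x_i - mu)^2 when the weights sum to one."""
    return sum(w * x for w, x in zip(weights, values))


# Each set of weights gives one re-fitted estimate, i.e. one bootstrap draw.
bb_draws = [weighted_mean(dirichlet_weights(len(data), rng), data) for _ in range(4000)]
bb_mean = sum(bb_draws) / len(bb_draws)
```

Here each draw is available in closed form. When the estimator is a neural network, each draw would require re-training, which is the cost the linear approximation in the abstract is designed to avoid.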

Abstract booklet