Beyond Gaussian Approximation: Bootstrap in Large Scale Simultaneous Inference
The Bonferroni adjustment, or the union bound, is commonly used to study rate optimality properties of statistical methods in high-dimensional problems. However, in practice, the Bonferroni adjustment is overly conservative. The extreme value theory has been proven to provide more accurate multiplicity adjustments in a number of settings, but only on ad hoc bases. Recently, Gaussian approximation was used to justify bootstrap adjustments in large scale simultaneous inference in some general settings where the multiplicity of the inference problem, p, is allowed to be of greater order than the sample size n. The thrust of this theory is the validity of the Gaussian approximation for maxima of sums of independent random vectors in high-dimension. We reduce the sample size requirement to the five seventh of the existing theory for the consistency of the empirical bootstrap and the multiplier/wild bootstrap in the Kolmogorov-Smirnov distance, and to the usual \(n \gg \log(p)\) for certain approximately optimal bootstrap adjustments. New comparison and anti-concentration theorems, which are of considerable interest in and of themselves, are developed as existing ones interweaved with Gaussian approximation are no longer applicable.
Download the slides
The Langevin MCMC: theory and methods.
In machine learning literature, a large number of problems amount to
simulate a density which is log-concave (at least in the tails) and perhaps
non smooth. Most of the research efforts so far has been devoted to the
Maximum A posteriori problem, which amounts to solve a high-dimensional
convex (perhaps non smooth) program. The purpose of this lecture is to
understand how we can use ideas which have proven very useful in machine
learning community to solve large scale optimization problems to design
efficient sampling algorithms, with convergence guarantees (and possibly usable convergence bounds).
In high dimension, only first order method (exploiting exclusively gradient
information) is a must. Most of the efficient algorithms know so far may be
seen as variants of the gradient descent algorithms, most often coupled with
partial updates (coordinates descent algorithms). This of course
suggests to study methods derived from Euler discretization of the Langevin
diffusion, which may be seen as a noisy version of the gradient descent.
Partial updates may in this context as Gibbs steps where some components
are frozen. This algorithm may be generalized in the non-smooth case by regularizing the objective function. The Moreau-Yosida inf-convolution
algorithm is an appropriate candidate in such case, because it does not
modify the minimum value of the criterion while transforming a non smooth
optimization problem in a smooth one. We will prove convergence results for
these algorithms with explicit convergence bounds both in Wasserstein
distance and in total variation.
Joint work with Alain Durmus.
Download the slides