CS02: Recent Progress on Stein’s Method
Organizer: Xiao Fang (The Chinese University of Hong Kong)
High-dimensional bootstrap and asymptotic expansion
Yuta Koike
Let \(X_1,\dots,X_n\) be independent centered random vectors in \(\mathbb R^d\) with finite variance. Set \[S_n:=\frac{1}{\sqrt n}\sum_{i=1}^nX_i.\] The aim of this paper is to investigate the accuracy of the bootstrap approximation for the maximum-type statistic \[T_n:=\max_{1\leq j\leq d}S_{n,j}\] when both \(n\) and \(d\) tend to infinity. Specifically, we consider the so-called wild bootstrap method. Let \(w_1,\dots,w_n\) be i.i.d. random variables independent of the data \(X_1,\dots,X_n\) such that \(E[w_1]=0\) and \(E[w_1^2]=1\). Define the wild bootstrap version of \(S_n\) as \[S_n^*:=\frac{1}{\sqrt n}\sum_{i=1}^nw_i(X_i-\bar X),\quad\text{where }\bar X=\frac{1}{n}\sum_{i=1}^nX_i.\] Given a significance level \(\alpha\in(0, 1)\), let \(\hat c_{1-\alpha}\) be the \((1-\alpha)\)-quantile of the conditional law of \(T_n^*:=\max_{1\leq j\leq d}S^*_{n,j}\) given the data. The main result of this paper is an asymptotic expansion formula for the bootstrap coverage probability \(P(T_n\geq \hat c_{1-\alpha})\). To state the result, we introduce some notation. Let \(\phi_\Sigma\) denote the density of \(N(0,\Sigma)\), and let \(f_\Sigma\) denote the density of \(Z^\vee:=\max_{1\leq j\leq d}Z_j\) with \(Z\sim N(0,\Sigma)\). Further, \(c^G_{1-\alpha}\) denotes the \((1-\alpha)\)-quantile of \(Z^\vee\), \(\boldsymbol1_d\) the all-ones vector in \(\mathbb R^d\), and \(\overline{X^3}:=n^{-1}\sum_{i=1}^nX_i^{\otimes3}\).
Theorem. Under regularity conditions, \[P(T_n\geq \hat c_{1-\alpha})=\alpha-(1-E[w_1^3])Q_n(c_{1-\alpha}^G)-E[R_n(\alpha)]+O\left(\frac{\log ^3(dn)}{n}\log n\right)\] as \(d,n\to\infty\), where \[\begin{aligned} Q_n(t)&:=-\frac{1}{6\sqrt n}\langle E[\overline{X^{3}}],\int_{(-\infty,t]^d}\nabla^3\phi_\Sigma(z)dz\rangle\quad( t\in\mathbb R),\\ R_n(\alpha)&:=\frac{1}{\sqrt n}\frac{\langle\overline{X^3}\otimes\boldsymbol1_d,\Psi_{\alpha}^{\otimes2}\rangle}{2f_{\Sigma}(c_{1-\alpha}^G)},\qquad \Psi_\alpha:=\int_{(-\infty,c_{1-\alpha}^G]^d}\nabla^2\phi_\Sigma(z)dz, \end{aligned}\] and \(\langle\cdot,\cdot\rangle\) denotes the Euclidean inner product of tensors.
As a corollary, we obtain the following blessing-of-dimensionality type
phenomenon:
Corollary. Under the assumptions of the above theorem, if the
covariance matrix of \(S_n\) has identical diagonal entries and bounded
eigenvalues as \(d,n\to\infty\), then
\[P(T_n\geq \hat c_{1-\alpha})=\alpha+O\left(\frac{\log ^3(dn)}{n}\log n+\sqrt{\frac{\log^3d}{dn}}\right),\]
provided that \(E[w_1^3]=1\).
This result shows that, under the stated assumptions on the covariance matrix, the third-moment-matching wild bootstrap is second-order accurate even in high-dimensional settings where \(d\gg n\), and even when applied to a non-studentized statistic.
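The wild bootstrap procedure described in this abstract can be sketched numerically as follows. This is only an illustrative Monte Carlo sketch, not the paper's code: the function names, the number of bootstrap draws `n_boot`, and the choice of Mammen's two-point distribution for the weights are assumptions made here. Mammen's weights are one standard choice satisfying \(E[w_1]=0\), \(E[w_1^2]=1\) and \(E[w_1^3]=1\).

```python
import numpy as np


def mammen_weights(n, rng):
    """Draw n i.i.d. two-point (Mammen) weights with E[w]=0, E[w^2]=1, E[w^3]=1.

    Assumption: this particular weight law is chosen for illustration only.
    """
    s5 = np.sqrt(5.0)
    low, high = -(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0
    p_low = (s5 + 1.0) / (2.0 * s5)  # probability of the negative value
    return np.where(rng.random(n) < p_low, low, high)


def max_statistic(X):
    """T_n = max_j S_{n,j}, where S_n = n^{-1/2} * sum_i X_i and X has shape (n, d)."""
    n = X.shape[0]
    return X.sum(axis=0).max() / np.sqrt(n)


def wild_bootstrap_quantile(X, alpha, n_boot=1000, seed=0):
    """Monte Carlo estimate of the (1 - alpha)-quantile of T_n^* given the data."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # rows are X_i - X_bar
    T_star = np.empty(n_boot)
    for b in range(n_boot):
        w = mammen_weights(n, rng)
        # S_n^* = n^{-1/2} * sum_i w_i (X_i - X_bar); T_n^* is its largest coordinate
        T_star[b] = (w @ Xc).max() / np.sqrt(n)
    return np.quantile(T_star, 1.0 - alpha)


# Hypothetical usage on simulated data with d larger than n.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 500))
reject = max_statistic(X) >= wild_bootstrap_quantile(X, alpha=0.05)
```

Here the conditional quantile \(\hat c_{1-\alpha}\) is replaced by its Monte Carlo estimate from `n_boot` draws, which introduces simulation error not covered by the expansion above.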
The full version of the paper is available at arXiv: https://arxiv.org/abs/2404.05006.
Brownian approximation for deterministic dynamical systems: a Stein’s method approach
Juho Leppänen
The classical functional central limit theorem (FCLT) asserts that, for a sequence of i.i.d. real random variables \((X_i)\) with zero mean and unit variance, the random process \(W_n(t) = n^{-1/2} \sum_{i = 1}^{ \lfloor nt \rfloor } X_i\) converges in distribution to standard Brownian motion \(B(t)\). Using Stein’s method, Barbour [1] addressed the problem of estimating \[\begin{aligned} d_{\mathcal{G}}( \mathcal{L}(W_n), \mathcal{L}(B) ) :=\sup_{g \in \mathcal{G}} | \mathbb{E}(g ( W_n ) ) - \mathbb{E} ( g(B) ) |, \end{aligned}\] where \(\mathcal{G}\) is a class of test functions acting on the Skorokhod space of càdlàg functions, such that each \(g \in \mathcal{G}\) is twice Fréchet differentiable with a Lipschitz continuous second derivative. Building on Barbour’s approach, we estimate \(d_{\mathcal{G}}( \mathcal{L}(W_n), \mathcal{L}(B) )\) under appropriate scaling for dynamical processes \(X_i = f \circ T^i\), where \(f\) is a regular observable and \(T: M \to M\) is an expanding/hyperbolic map of a probability space \((M, \mathcal{F}, \mu)\). We derive correlation decay conditions for the FCLT augmented with an explicit upper bound on \(d_{\mathcal{G}}( \mathcal{L}(W_n), \mathcal{L}(B) )\). Examples of systems satisfying these conditions include dispersing Sinai billiards, intermittent maps with critical points and/or singularities, and open dynamical systems composed of piecewise smooth expanding maps. In addition to conventional measure-preserving systems, we also obtain error bounds for nonautonomous systems \(X_n = f_n \circ T_n \circ \cdots \circ T_1\) described by sequential compositions of deterministically/randomly varying hyperbolic maps \(T_n\) and observables \(f_n\). In this talk, I will discuss some of these results, which are part of our joint work [2,3] with Yuto Nakajima (Doshisha University) and Yushi Nakano (Hokkaido University).
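To make the process \(W_n(t)\) concrete, the following minimal sketch evaluates the rescaled partial-sum process along a single orbit \(X_i = f \circ T^i\). The helper names, the logistic map, and the observable \(f(x) = x - 1/2\) are hypothetical choices made here for illustration; they are not taken from the works cited in this abstract, no claim is made that this example satisfies the correlation decay conditions, and no variance normalization is applied.

```python
import numpy as np


def partial_sum_process(x, t):
    """Evaluate W_n(t) = n^{-1/2} * sum_{i=1}^{floor(n t)} x_i for times t in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Prepend 0 so that floor(n t) = 0 corresponds to the empty sum.
    csum = np.concatenate(([0.0], np.cumsum(x)))
    k = np.floor(n * np.asarray(t, dtype=float)).astype(int)
    return csum[k] / np.sqrt(n)


def orbit_observations(T, f, x0, n):
    """Return the array (f(T^1 x0), ..., f(T^n x0)) for a map T and observable f."""
    out = np.empty(n)
    x = x0
    for i in range(n):
        x = T(x)
        out[i] = f(x)
    return out


def logistic(x):
    # Hypothetical example map on [0, 1]; an assumption of this sketch.
    return 4.0 * x * (1.0 - x)


def observable(x):
    # Hypothetical observable, chosen only for illustration.
    return x - 0.5


X = orbit_observations(logistic, observable, x0=0.1234, n=10_000)
grid = np.linspace(0.0, 1.0, 101)
W = partial_sum_process(X, grid)  # one sample path of W_n on the grid
```

Estimating \(d_{\mathcal{G}}\) itself would additionally require a choice of test functions \(g\) and averaging over initial conditions, which this sketch does not attempt.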
Bibliography
[1] A.D. Barbour. "Stein’s method for diffusion approximations." Probab. Th. Rel. Fields, vol. 84, no. 3, 1990, pp. 297-332.
[2] J. Leppänen, Y. Nakajima, Y. Nakano. "Brownian approximation of dynamical systems by Stein’s method." Preprint available at arXiv:2501.13498.
[3] J. Leppänen, Y. Nakajima, Y. Nakano. "Functional correlation bound for random open dynamical systems and its application to normal approximations." Manuscript in preparation.
Normal approximation for exponential random graphs
Xiao Fang
The question of whether the central limit theorem (CLT) holds for the total number of edges in exponential random graph models (ERGMs) in the subcritical region of parameters has remained an open problem. In this paper, we establish this CLT. As a byproduct of our proof, we also derive a convergence rate for the CLT, an explicit formula for the asymptotic variance, and the CLT for general subgraph counts. To establish our main result, we develop Stein’s method for the normal approximation of general functionals of nonlinear exponential families of random variables, which is of independent interest. In addition to ERGMs, our general theorem can also be applied to other models. A key ingredient in our proof for the ERGM is a higher-order concentration inequality, which was previously known in a subset of the subcritical region called Dobrushin’s uniqueness region. We use Stein’s method to partially generalize this inequality to the subcritical region.