IS17: Applications of stochastic analysis to deep learning
date: 7/17/2025, time: 14:00-15:30, room: IM HS
Organizer: Eulalia Nualart (Pompeu Fabra University)
Chair: Eulalia Nualart (Pompeu Fabra University)
Benign overfitting
Peter Bartlett (Google DeepMind and UC Berkeley)
Deep learning has revealed some major surprises from the perspective of statistical complexity: even without any explicit effort to control model complexity, these methods find prediction rules that give a near-perfect fit to noisy training data and yet exhibit excellent prediction performance in practice. This talk reviews recent work on methods that predict accurately in probabilistic settings despite fitting the training data too well, showing the role of overparameterization in regression and classification problems.
Based on joint work with Phil Long, Gabor Lugosi, Alex Tsigler, Niladri Chatterji, Spencer Frei, Wei Hu, Nati Srebro, and Gal Vardi.
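As a concrete illustration of the setting (a standard example, not necessarily the formulation used in the talk): in overparameterized linear regression with data matrix $X \in \mathbb{R}^{n \times p}$ of full row rank, $p > n$, and noisy labels $y$, the minimum-norm interpolator
\[
\hat{\theta} \;=\; \arg\min_{\theta \in \mathbb{R}^{p}} \big\{ \|\theta\|_{2} \;:\; X\theta = y \big\} \;=\; X^{\top}(XX^{\top})^{-1}y
\]
fits the training data exactly, yet its excess risk can still be small when the spectrum of the feature covariance decays suitably; this is the prototypical instance of benign overfitting.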
The Proportional Scaling Limit of Neural Networks
Mufan Li (University of Waterloo)
Recent advances in deep learning performance have relied heavily on scaling up the number of parameters in neural networks, making asymptotic scaling limits a compelling approach to theoretical analysis. In this talk, we explore the proportional infinite-depth-and-width limit, in which the role of depth can be adequately studied and the limit remains a faithful model of finite-size networks. At initialization, we characterize the limiting distribution of the network via a stochastic differential equation (SDE) for the feature covariance matrix. Furthermore, in the linear network setting, we can also characterize the spectrum of the covariance matrix in the large-data limit via a geometric variant of Dyson Brownian motion. Finally, we will briefly discuss ongoing work towards analyzing training dynamics.
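To fix ideas, a schematic version of the objects involved (our notation; the precise coefficients are part of the results presented in the talk): in the proportional regime the depth $d$ and width $n$ grow together with the ratio $t = d/n$ held fixed, and $t$ plays the role of a time parameter. The feature covariance matrix $V_t$ of a fixed batch of inputs is then expected to converge to a matrix-valued diffusion of the generic form
\[
dV_t \;=\; b(V_t)\,dt \;+\; \Sigma(V_t)^{1/2}\,dB_t,
\]
where the drift $b$ and the diffusion coefficient $\Sigma$ depend on the activation function and how it is shaped with depth.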
Estimation of error in diffusion models in machine learning
Anna Kazeykina (Université Paris-Saclay)
Stochastic diffusion processes in machine learning make it possible to model both the training dynamics and the architecture of neural networks. We will explore several important examples of diffusion processes in ML: the mean-field Langevin diffusion, which arises from the gradient flow associated with training two-layer neural networks; the mean-field Schrödinger dynamics associated with the optimisation problem regularised by Fisher information; and the diffusion models used for score-matching generative modelling. We will present results on the analysis of the errors that arise from the limited size of training datasets, the finite number of neurons, and time-discretisation schemes.
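For orientation, two of these processes in their standard form (notation is ours; the talk concerns error bounds for their finite-sample, finite-neuron and time-discretised counterparts): the mean-field Langevin diffusion
\[
dX_t \;=\; -\nabla \frac{\delta F}{\delta m}(m_t)(X_t)\,dt \;+\; \sqrt{2\sigma}\,dW_t, \qquad m_t = \operatorname{Law}(X_t),
\]
with $\sigma > 0$ the entropic regularisation parameter, describes noisy gradient-flow training of a two-layer network whose neuron distribution is $m_t$; score-based generative modelling runs an Ornstein-Uhlenbeck forward noising process with marginals $p_t$ and then samples from the time-reversed SDE
\[
dY_t \;=\; \big(Y_t + 2\nabla \log p_{T-t}(Y_t)\big)\,dt \;+\; \sqrt{2}\,dB_t,
\]
in which the unknown score $\nabla \log p_t$ is replaced by a neural-network estimate learned by score matching.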