Abstract
In this thesis we study the behavior of various models with a gradient flow structure that arise in machine learning applications. We focus on two main applications: the stochastic gradient descent algorithm and a toy model of transformers. In the first part, we study the limiting dynamics of a large class of noisy gradient descent systems in the overparameterized regime. In this regime the set of global minimizers of the loss is large, and a noisy gradient descent algorithm initialized in a neighbourhood of this zero-loss set slowly evolves along it. We characterize this evolution for a broad class of noisy gradient descent systems in the limit of small step size and small noise. In particular, we show that the structure of the noise affects not just the form of the limiting process, but also the time scale at which the evolution takes place. We apply the theory to Dropout, label noise, and classical SGD (minibatching) noise, and compare the resulting dynamics. In the second part, we study partial differential equations of aggregation type, which arise as simplified models of self-attention-based architectures (transformers). First, we study the McKean-Vlasov equation on compact Riemannian manifolds, and on the high-dimensional sphere in particular. We characterize minimizers and other critical points of the corresponding free energy functional in terms of the interaction kernel and the inverse temperature parameter. Using the properties of the Lebesgue space on a high-dimensional sphere, we characterize bifurcations and phase transitions of the model. In the last chapter of the thesis, we study an aggregation PDE on a high-dimensional sphere with competing attractive and repulsive forces. We consider the limit of localized repulsion and establish convergence of the solutions of the attraction-repulsion model to the solutions of an aggregation equation with porous-medium-type diffusion.
| Original language | English |
|---|---|
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 12 Sept 2025 |
| Place of Publication | Eindhoven |
| Publisher | |
| Print ISBNs | 978-90-386-6435-4 |
| Publication status | Published - 12 Sept 2025 |
Bibliographical note
Doctoral thesis (Proefschrift).