On the convergence of the gradient descent method with stochastic fixed-point rounding errors under the Polyak–Łojasiewicz inequality

Lu Xia (Corresponding author), Stefano Massei, Michiel E. Hochstenbach

Research output: Contribution to journal › Journal article › Academic › peer review

Abstract

In the training of neural networks with low-precision computation and fixed-point arithmetic, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers. This study provides insights into the choice of appropriate stochastic rounding strategies to mitigate the adverse impact of roundoff errors on the convergence of the gradient descent method, for problems satisfying the Polyak–Łojasiewicz inequality. Within this context, we show that a biased stochastic rounding strategy may even be beneficial, insofar as it eliminates the vanishing gradient problem and forces the expected roundoff error into a descent direction. Furthermore, we obtain a bound on the convergence rate that is stricter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performance of various rounding strategies when optimizing several examples using low-precision fixed-point arithmetic.
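For illustration, the sketch below shows unbiased stochastic rounding to a fixed-point grid: a value is rounded up with probability equal to its distance from the lower grid point, so the rounded value is correct in expectation. This is a minimal, hypothetical example in NumPy, not the authors' implementation; the function name and the `frac_bits` parameter are assumptions, and the biased strategies analyzed in the paper would modify the probability of rounding up.

```python
import numpy as np

def fixed_point_stochastic_round(x, frac_bits=8, rng=None):
    """Round x to a fixed-point grid with `frac_bits` fractional bits
    using unbiased stochastic rounding: round up with probability equal
    to the distance from the lower grid point (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits
    scaled = np.asarray(x, dtype=np.float64) * scale
    lower = np.floor(scaled)
    prob_up = scaled - lower                       # in [0, 1)
    rounded = lower + (rng.random(scaled.shape) < prob_up)
    return rounded / scale

# The expected value of the rounded number equals the input, so roundoff
# errors average out over many samples, unlike deterministic rounding.
samples = fixed_point_stochastic_round(np.full(100_000, 0.1), frac_bits=8)
print(samples.mean())  # close to 0.1
```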

Original language: English
Pages (from-to): 753-799
Number of pages: 47
Journal: Computational Optimization and Applications
Volume: 90
Issue number: 3
Early online date: 11 Feb 2025
DOIs
Status: Published - Apr 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025.
