TY - JOUR
T1 - On the convergence of the gradient descent method with stochastic fixed-point rounding errors under the Polyak–Łojasiewicz inequality
AU - Xia, Lu
AU - Massei, Stefano
AU - Hochstenbach, Michiel E.
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/4
Y1 - 2025/4
AB - In the training of neural networks with low-precision computation and fixed-point arithmetic, rounding errors often cause stagnation or are otherwise detrimental to the convergence of the optimizer. This study provides insights into the choice of appropriate stochastic rounding strategies to mitigate the adverse impact of roundoff errors on the convergence of the gradient descent method, for problems satisfying the Polyak–Łojasiewicz inequality. In this context, we show that a biased stochastic rounding strategy may even be beneficial, insofar as it eliminates the vanishing-gradient problem and forces the expected roundoff error into a descent direction. Furthermore, we obtain a bound on the convergence rate that is tighter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performance of various rounding strategies when optimizing several examples using low-precision fixed-point arithmetic.
KW - Fixed-point arithmetic
KW - Gradient descent
KW - Low-precision
KW - Polyak–Łojasiewicz inequality
KW - Rounding error analysis
KW - Stochastic rounding
UR - http://www.scopus.com/inward/record.url?scp=85217779809&partnerID=8YFLogxK
DO - 10.1007/s10589-025-00656-1
M3 - Article
AN - SCOPUS:85217779809
SN - 0926-6003
VL - 90
SP - 753
EP - 799
JO - Computational Optimization and Applications
JF - Computational Optimization and Applications
IS - 3
ER -
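
For readers of the abstract above, a minimal sketch of the two rounding modes it contrasts may help. The Python code below is illustrative only, not taken from the paper: sr_unbiased implements standard unbiased stochastic rounding to a fixed-point grid, while sr_away_from_zero is a hypothetical biased variant (the boost parameter is our assumption, not the authors' exact scheme) that inflates the probability of rounding away from zero, so that small gradient entries are less likely to be flushed to zero.

import numpy as np

def sr_unbiased(x, frac_bits, rng):
    # Unbiased stochastic rounding to the grid with spacing 2**-frac_bits:
    # round up with probability equal to the fractional remainder, so that
    # E[sr_unbiased(x)] = x.
    s = np.asarray(x, dtype=float) * 2.0 ** frac_bits
    lo = np.floor(s)
    frac = s - lo
    up = rng.random(s.shape) < frac
    return (lo + up) * 2.0 ** -frac_bits

def sr_away_from_zero(x, frac_bits, rng, boost=0.25):
    # Biased variant (an illustrative sketch, not the paper's exact scheme):
    # add `boost` to the probability of rounding away from zero, so tiny
    # entries are less likely to be rounded to zero; the expected rounding
    # error then points along sign(x).
    s = np.asarray(x, dtype=float) * 2.0 ** frac_bits
    lo = np.floor(s)
    frac = s - lo                     # in [0, 1); 0 means x is on the grid
    p_up = np.where(s >= 0,
                    np.minimum(frac + boost * (frac > 0), 1.0),  # up = away from 0
                    np.maximum(frac - boost, 0.0))               # down = away from 0
    up = rng.random(s.shape) < p_up
    return (lo + up) * 2.0 ** -frac_bits

# Usage: with 8 fractional bits (grid spacing 2**-8), an entry of size 2**-12
# sits below the grid spacing; unbiased rounding preserves it in expectation,
# while the biased variant inflates it, counteracting stagnation at zero.
rng = np.random.default_rng(0)
g = np.full(100_000, 2.0 ** -12)
print(sr_unbiased(g, 8, rng).mean())        # ~2**-12 on average
print(sr_away_from_zero(g, 8, rng).mean())  # ~5x larger: biased away from zero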