The title of the paper is really misleading, and the comments here are even more so. The key is their theorem, where they say "with high probability over random initialization". They initialize many times and sometimes it converges; a single initialization can of course get stuck in a local minimum.
But then there is very little of interest: (assuming enough smoothness and Lipschitz continuity) one expects every global minimum to have a convex neighbourhood such that gradient descent starting within that neighbourhood reaches the global minimum. "Initialize many times and sometimes it converges" is just stating the obvious: "there exist initial positions for which GD succeeds in finding a global minimum"... is my interpretation correct? A toy sketch of that reading (purely illustrative, not from the paper) is below.
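```python
# Illustrative only (not the paper's setting): multi-start gradient descent
# on a 1-D nonconvex function with a global minimum near x ~ -1.30 and a
# local minimum near x ~ 1.13.
import numpy as np

def f(x):
    return x**4 - 3*x**2 + x

def grad(x):
    return 4*x**3 - 6*x + 1

rng = np.random.default_rng(0)
for x0 in rng.uniform(-2.5, 2.5, size=5):
    x = x0
    for _ in range(2000):
        x -= 0.01 * grad(x)
    print(f"start {x0:+.2f} -> end {x:+.2f}, f = {f(x):.3f}")
# Some starts end in the global basin, others in the local one: exactly
# "there exist initial positions for which GD succeeds".
```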
>In this section, we show gradient descent with a constant positive step size converges to the global minimum with a linear rate.
This is rather ambiguous: it sounds like it guarantees "it WILL converge to the global minimum, and at a linear rate", but I suspect they are really saying "IF it converges to the global minimum, THEN it will converge at a linear rate", worded to purposefully sound like the first statement. For what it's worth, "linear rate" by itself only pins down the speed, not the destination; a minimal sketch (my example, not theirs) follows.
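```python
# A minimal sketch (not the paper's setting) of what "linear rate" means:
# on a strongly convex quadratic, the error contracts by a constant factor
# per step, so log(error) falls on a straight line.
import numpy as np

A = np.array([[3.0, 0.0], [0.0, 1.0]])   # Hessian, eigenvalues 3 and 1
x = np.array([5.0, -4.0])                # arbitrary start
eta = 0.2                                # step size; < 2/3 here for stability
for k in range(5):
    err = np.linalg.norm(x)              # minimizer is the origin
    print(f"step {k}: ||x_k - x*|| = {err:.4f}")
    x = x - eta * (A @ x)                # GD on f(x) = 0.5 * x^T A x
# Each error is roughly a fixed fraction of the previous one: that geometric
# decay is the "linear rate". It says nothing about WHICH minimum you reach,
# only how fast once you are headed there.
```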
Could you comment on whether GD does or does not find the global minimum of the integer factorization cost function in the following comment? For concreteness, here is one plausible form such a cost function could take (my own sketch; I don't know if it matches that comment).
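```python
# Hypothetical guess at the cost function (the referenced comment isn't
# quoted here): factor N by minimizing (N - p*q)^2 over real p, q.
import numpy as np

N = 2021.0                               # = 43 * 47
def grad(p, q):
    r = N - p * q
    return np.array([-2 * r * q, -2 * r * p])

p, q = 10.0, 10.0                        # some initialization
for _ in range(100000):
    g = grad(p, q)
    p, q = np.array([p, q]) - 1e-7 * g
print(p, q, p * q)
# GD happily drives (N - p*q)^2 to zero, but toward a continuum of REAL
# solutions with p*q = N (here p = q ~ 44.96); nothing steers it to the
# integer pair (43, 47), so "finding the global minimum" of the relaxation
# doesn't factor N.
```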
Another sin: they write "with high probability" in the theorem but never define it precisely in the main body of the paper. If you look into the appendix, you'll see that they guarantee probability 1 - \delta, and \delta affects the convergence. It means that if you add many, many parameters, then you just need a couple of attempts to converge well. So "IF it converges" becomes the much stronger "VERY OFTEN it converges". Sorry, no real passion to read the paper in detail, so this is just an intuition from a quick look. A back-of-envelope version of the restart intuition (assuming independent restarts, which is my reading, not necessarily the paper's claim):
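```python
# If each run fails with probability at most delta, then k independent
# restarts all fail with probability at most delta**k, so few attempts
# suffice. Numbers below are hypothetical, for illustration.
import math

delta = 0.3                              # hypothetical per-run failure bound
target = 1e-6                            # desired overall failure probability
k = math.ceil(math.log(target) / math.log(delta))
print(k, 1 - delta**k)                   # k = 12 restarts already suffice
```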