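Consider a general $C^1$, non-negative objective function $L(\theta)$, minimized by gradient flow (GF), $\partial_t \theta_t = -\nabla L(\theta_t)$. Since $L(\theta_t)$ is non-increasing along the flow and bounded below by zero, it converges to a finite value.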
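Moreover, the velocity $\partial_t \theta_t = -\nabla L(\theta_t)$ converges to zero, provided $t \mapsto \nabla L(\theta_t)$ is uniformly continuous (for instance, when $\nabla L$ is Lipschitz). Consequently, every limit point of the trajectory is a critical point.
Proof:
\[L_T-L_0 = \int_0^T \partial_t L_t \, dt = \int_0^T \langle \nabla L_t, \partial_t \theta_t\rangle \, dt = -\int_0^T \|\partial_t \theta_t\|^2 \, dt = -\int_0^T \|\nabla L_t\|^2 \, dt.\]
Since $L_T - L_0 \ge -L_0$, the integral $\int_0^\infty \|\nabla L_t\|^2 \, dt$ is at most $L_0$, hence finite. A uniformly continuous function that is square-integrable on $[0,\infty)$ must tend to zero (a Barbalat-type argument), so $\|\nabla L_t\|$ converges to zero. Without uniform continuity, integrability alone only gives $\liminf_{t\to\infty} \|\nabla L_t\| = 0$.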
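The energy identity in the proof can be checked numerically. The following is a minimal sketch, assuming a hypothetical quadratic objective and an explicit Euler discretization of the flow; the names `loss`, `grad`, and `simulate_gradient_flow`, the step size, and the horizon are illustrative choices, not part of the argument above.

```python
import math

def loss(theta):
    # Hypothetical smooth, non-negative objective chosen for illustration.
    x, y = theta
    return 0.5 * (x * x + 4.0 * y * y)

def grad(theta):
    # Gradient of the quadratic objective above.
    x, y = theta
    return (x, 4.0 * y)

def simulate_gradient_flow(theta0, step=1e-3, horizon=10.0):
    """Explicit Euler discretization of d(theta)/dt = -grad L(theta)."""
    theta = theta0
    dissipated = 0.0  # left Riemann sum for the integral of ||grad L||^2
    for _ in range(round(horizon / step)):
        g = grad(theta)
        dissipated += step * (g[0] ** 2 + g[1] ** 2)
        theta = (theta[0] - step * g[0], theta[1] - step * g[1])
    return theta, dissipated

theta0 = (1.0, 1.0)
theta_T, dissipated = simulate_gradient_flow(theta0)
loss_drop = loss(theta0) - loss(theta_T)   # L_0 - L_T
grad_norm_T = math.hypot(*grad(theta_T))   # ||grad L|| at the final time

print(f"L_0 - L_T          = {loss_drop:.4f}")
print(f"int ||grad L||^2   = {dissipated:.4f}")
print(f"||grad L|| at T    = {grad_norm_T:.2e}")
```

Up to discretization error, the drop in the objective matches the accumulated squared gradient norm, and the gradient norm at the final time is small. For this particular objective the trajectory also converges, so it does not exhibit the failure mode discussed next.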
However, $\theta_t$ itself need not converge to a fixed point. A sufficient condition for convergence is that the trajectory length
\[\int_0^T \| \partial_t \theta_t \| \, dt\]
converges to a finite value as $T \to \infty$.
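An illustrative example of why vanishing velocity alone does not force convergence: $\theta_t=\sin(\log t)$ for $t \ge 1$. Since $\dot{\theta}_t=\cos(\log t) \cdot \frac{1}{t}$, the velocity converges to zero. But the position keeps oscillating between $-1$ and $1$, so it does not converge.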
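A short numerical check of this example, as a sketch: the helper names `theta`, `velocity`, and `path_length` are hypothetical, and the path length uses the substitution $u = \log t$, under which $\int_1^T |\cos(\log t)|/t \, dt = \int_0^{\log T} |\cos u| \, du$.

```python
import math

def theta(t):
    # Position of the example curve.
    return math.sin(math.log(t))

def velocity(t):
    # Time derivative of theta: cos(log t) / t.
    return math.cos(math.log(t)) / t

def path_length(t_end, n=200_000):
    """Length of the curve on [1, t_end] via the midpoint rule in u = log t."""
    u_end = math.log(t_end)
    h = u_end / n
    return h * sum(abs(math.cos((k + 0.5) * h)) for k in range(n))

# The velocity is bounded by 1/t, so it vanishes as t grows.
print(abs(velocity(1e6)))

# The position still returns arbitrarily close to +1 and -1 at late times.
peak = theta(math.exp(2 * math.pi * 5 + math.pi / 2))
trough = theta(math.exp(2 * math.pi * 5 + 3 * math.pi / 2))
print(peak, trough)

# The path length keeps growing with log(t_end) instead of converging.
print(path_length(1e6), path_length(1e12))
```

The length grows roughly like $(2/\pi)\log T$, since $2/\pi$ is the average of $|\cos u|$ over a period, so the trajectory length is infinite, which is consistent with the position failing to converge.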