In this video, the professor describes an algorithm that can be used to find the minimum value of the cost function for linear regression. Here, the cost function is $f$, the gradient at the $k$th step of the algorithm is $g_k$, $\theta$ is the vector of parameters we want to optimize, and $d_k$ is the direction used to update $\theta$. Here is a screenshot of the slide for reference:
Feel free to scroll down near the end for a slide describing what Newton's algorithm is doing in more detail.
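In case the screenshot doesn't come through, here is the update I have in mind, written in my own notation to match the symbols above (so this is my reading of the slide, not a copy of it):

$$\theta_{k+1} = \theta_k + \eta_k\, d_k, \qquad \eta_k = \arg\min_{\eta > 0} f(\theta_k + \eta\, d_k),$$

with $g_k = \nabla f(\theta_k)$ and $d_k$ the search direction at step $k$; line 6 of the algorithm is the $\arg\min$ over $\eta$.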
My confusion comes from line 6 of the algorithm, the one about the line search. From my understanding of his explanation, the idea is that you increase the value of $\eta_k$, and each time you increase it you evaluate the cost function $f$ at the updated $\theta$. The moment you get to the minimum, you stop and use that $\eta_k$. I think this $\eta_k$ is essentially the learning rate you need to jump straight to the minimum.
But if that is the case, why would you need any iterations at all? Wouldn't the line search mean that after one step of the algorithm you're already at the minimum?
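To make the question concrete, here is a rough Python sketch of how I picture that line-search step (this is my own toy code, not the professor's; `f`, `theta`, and `d` are just placeholders for the cost, the current parameters, and the search direction):

```python
import numpy as np

def crude_line_search(f, theta, d, eta_step=0.01, max_steps=10_000):
    """Grow eta in small increments until f(theta + eta * d) stops decreasing."""
    best_eta = 0.0
    best_val = f(theta)
    eta = eta_step
    for _ in range(max_steps):
        val = f(theta + eta * d)
        if val >= best_val:      # cost started going back up, so stop
            break
        best_eta, best_val = eta, val
        eta += eta_step
    return best_eta

# Toy quadratic cost just to exercise the function
A = np.array([[3.0, 0.0], [0.0, 1.0]])
f = lambda th: 0.5 * th @ A @ th
theta = np.array([1.0, 1.0])
d = -(A @ theta)                 # here d is just the negative gradient, -g_k
eta_k = crude_line_search(f, theta, d)
print(eta_k, f(theta + eta_k * d))
```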
Second Question
I have another question that I would love to have answered if possible. In the previous slide, the professor shows that for Newton's algorithm applied to linear regression, the $\theta$ after one step is equal to the solution you get from the method of least squares in matrix form. In other words, he says you only need one step of the algorithm to get the optimal $\theta$. If that is the case, what is the point of showing us the iterative algorithm in the slide above? Is it because the matrix inverse is computationally expensive? The relevant slide for this question is below:
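To spell out what I mean by "one step is already the least-squares answer", here is a small numerical check I wrote for myself (toy data, not from the lecture): starting from $\theta = 0$, a single Newton step on the squared-error cost should land exactly on the normal-equations solution $(X^\top X)^{-1} X^\top y$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                      # toy design matrix
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Cost f(theta) = 0.5 * ||X theta - y||^2
theta0 = np.zeros(3)
g = X.T @ (X @ theta0 - y)                        # gradient at theta0
H = X.T @ X                                       # Hessian (constant here)

theta_one_newton_step = theta0 - np.linalg.solve(H, g)
theta_least_squares = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(theta_one_newton_step, theta_least_squares))  # prints True
```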
And for those who are interested in what Newton's algorithm is:
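In case that last slide doesn't render, the update I have in mind for Newton's algorithm (again in the notation above, so treat this as my paraphrase rather than the slide itself) is

$$d_k = -H_k^{-1} g_k, \qquad \theta_{k+1} = \theta_k + \eta_k\, d_k,$$

where $H_k$ is the Hessian of $f$ at $\theta_k$ and $g_k = \nabla f(\theta_k)$.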


