
We know that optimization techniques search the space of all possible parameters for a parameter set that minimizes the model's cost function. The most well-known loss functions, such as MSE or Categorical Cross Entropy, have a global minimum value of zero in the ideal case.

For example, Gradient Descent, $\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j}J(\theta)$, updates the parameters based on the derivative of the computed cost value, $J(\theta)$.
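For concreteness, here is a minimal NumPy sketch of that update rule applied to a simple one-parameter quadratic cost (the cost function, learning rate, and iteration count are illustrative choices, not part of the question):

```python
import numpy as np

# Illustrative cost: J(theta) = (theta - 3)^2, minimised at theta = 3 with J = 0.
def J(theta):
    return (theta - 3.0) ** 2

def dJ(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial parameter
alpha = 0.1   # learning rate

for _ in range(100):
    theta = theta - alpha * dJ(theta)   # theta_j <- theta_j - alpha * dJ/dtheta_j

print(theta, J(theta))   # theta ~ 3.0, J(theta) ~ 0.0
```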

I was wondering what would happen if we designed a cost function whose global minimum in the ideal case is non-zero. Would it make a difference, e.g. in the convergence rate or other aspects of the optimization process, or not?


1 Answer


Saying that the well-known loss functions, like MSE or Categorical Cross Entropy, have a global minimum value equal to zero is flawed. The idea behind a loss function is to measure how close the model's predictions are to the actuals (in the case of regression). Ideally, you would want your model to predict exactly the actual values; only in that case is the loss zero. Otherwise, the loss is non-zero almost all the time. Recall the loss function for a linear regression setting:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

We need to minimise $J(\theta)$ so that the predictions are as close to the actuals as possible. For that, the derivative of $J(\theta)$ should be zero; it does not matter whether the minimum value of $J(\theta)$ itself is zero or non-zero. Graphically, for a typical convex cost curve, you want to reach the point where the derivative is zero, wherever that minimum value sits on the vertical axis.
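As a quick sanity check of this point, the following sketch (using an illustrative quadratic cost, not one from the question) compares two costs that differ only by a constant offset: the gradients, the updates, and the converged parameter are identical, and only the reported minimum value shifts.

```python
# Two costs that differ only by a constant: J1 has minimum 0, J2 has minimum 5.
def J1(theta):
    return (theta - 3.0) ** 2

def J2(theta):
    return (theta - 3.0) ** 2 + 5.0

def grad(theta):
    # The constant vanishes under differentiation, so both costs share this gradient.
    return 2.0 * (theta - 3.0)

theta1 = theta2 = 0.0
alpha = 0.1
for _ in range(100):
    theta1 -= alpha * grad(theta1)
    theta2 -= alpha * grad(theta2)

print(theta1, J1(theta1))   # ~3.0, ~0.0
print(theta2, J2(theta2))   # ~3.0, ~5.0  (same parameter, shifted minimum value)
```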

