Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a local minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
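For concreteness, a minimal sketch of the update rule in Python/numpy; `grad_f` is a hypothetical function returning the gradient of the objective at a point:

```python
import numpy as np

def gradient_descent(grad_f, w0, lr=0.1, steps=100):
    """Repeat the basic update w <- w - lr * grad_f(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad_f(w)
    return w

# Example: minimize f(w) = ||w||^2, whose gradient is 2w.
w_star = gradient_descent(lambda w: 2 * w, w0=[3.0, -4.0])  # -> ~[0, 0]
```
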

5 votes · 2 answers · 123 views

From Wikipedia: Finding a maximum likelihood solution typically requires taking the derivatives of the likelihood function with respect to all the unknown values, the parameters and the latent ...
FluidMechanics Potential Flows

5 votes · 1 answer · 235 views

What are the pros and cons of using XGBoost vs. GBR (scikit-learn) when dealing with data where 500 < records < 1000 and there are about 5 columns?
Ocean (427)

3 votes · 1 answer · 78 views

I've come across a loss regularizing function that uses population counts (i.e., bits that are one, Hamming weight) of activations: $$ L_\mathrm{reg} = H(\max(\lfloor x \rceil, 0)), $$ where $x$ is an ...
Gaslight Deceive Subvert

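A minimal numpy sketch of the forward computation of such a term, assuming $H$ sums the Hamming weights of the rounded, clamped activations (the interpretation here is mine, based only on the formula above):

```python
import numpy as np

def popcount_reg(x):
    """L_reg = H(max(round(x), 0)): round activations, clamp negatives
    to zero, then sum the one-bits (Hamming weight) of each value."""
    ints = np.maximum(np.rint(x), 0).astype(np.uint64)
    bits = np.unpackbits(ints.view(np.uint8))  # each uint64 viewed as 8 bytes
    return int(bits.sum())

popcount_reg(np.array([2.6, -1.2, 7.0]))  # rounds to [3, 0, 7] -> 2 + 0 + 3 = 5
```
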
3 votes · 0 answers · 64 views

I have a complex problem, and I am not sure if I can solve it with gradient descent. Most importantly, I do not know the gradient; it is strongly discontinuous at small steps; and I have no easy ...
peterh (145)

6 votes · 1 answer · 187 views

Numerous sources say that MAE has the disadvantage of not being differentiable at zero, and hence causes problems for gradient-based optimization methods. However, I've never seen an explanation of why ...
Nourless (203)

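For reference, the issue is at the point where the residual $r = y - \hat{y}$ equals zero; the absolute value has no derivative there, only a subgradient:

$$\frac{d}{dr}|r| = \operatorname{sign}(r) \quad (r \neq 0), \qquad \partial |r|\Big|_{r=0} = [-1, 1]$$
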
6 votes · 2 answers · 83 views

For a binary classification model: when training a deep model, at each training step the model receives a batch (e.g., a batch of 32 samples). Let's assume that in each training batch there are ...
user3668129

2 votes · 0 answers · 58 views

I'm trying to understand how the tolerance check is done in mini-batch gradient descent. Here are some methods, but I'm not sure which one is the most common approach: 1) Begin the epoch, shuffle the dataset ...
Guest (21)

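One common convention (an assumption here, not necessarily any of the asker's numbered options) is to shuffle and step through all batches each epoch, then apply the tolerance check once per epoch against the full-data loss. A minimal sketch for a linear model:

```python
import numpy as np

def minibatch_gd(X, y, w, lr=0.01, batch=32, tol=1e-6, max_epochs=1000):
    """Mini-batch GD for a linear model; stops when the full-data loss
    improves by less than tol between consecutive epochs."""
    rng = np.random.default_rng(0)
    prev_loss = np.inf
    for _ in range(max_epochs):
        idx = rng.permutation(len(X))          # 1) shuffle each epoch
        for start in range(0, len(X), batch):  # 2) step through batches
            b = idx[start:start + batch]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # grad of (1/2)*MSE
            w = w - lr * grad
        loss = np.mean((X @ w - y) ** 2)       # 3) loss once per epoch
        if abs(prev_loss - loss) < tol:        # 4) tolerance check
            return w
        prev_loss = loss
    return w
```
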
12 votes · 4 answers · 2k views

This is one of those questions where I know I am wrong, but I don't know how. I understand that when training a neural network, we calculate the derivatives of the loss function with respect to the ...
Leo Juhlin

0 votes · 0 answers · 23 views

I'm implementing Nesterov Accelerated Gradient Descent (NAG) on an Extreme Learning Machine (ELM) with one hidden layer. My loss function is the Mean Squared Error (MSE) with L2 regularization. The ...
Paolo Pedinotti

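For reference, a minimal sketch of the standard NAG update, with `grad` standing in for the gradient of the asker's regularized MSE (a hypothetical name):

```python
import numpy as np

def nag(grad, w, lr=0.01, momentum=0.9, steps=1000):
    """Nesterov accelerated gradient: evaluate the gradient at the
    look-ahead point w + momentum * v, then update velocity and weights."""
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w + momentum * v)  # gradient at the look-ahead point
        v = momentum * v - lr * g
        w = w + v
    return w
```
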
2 votes · 0 answers · 59 views

I'm following the book Deep Learning by Ian Goodfellow et al., and in Chapter 4 (Numerical Computation), page 87, he mentions that by utilising a second-order Taylor approximation of the objective ...
Aditya (121)

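The passage presumably refers to the standard expansion of $f$ around the current point $x$ along a gradient step of size $\epsilon$, with gradient $g$ and Hessian $H$ evaluated at $x$:

$$f(x - \epsilon g) \approx f(x) - \epsilon\, g^\top g + \frac{1}{2}\epsilon^2\, g^\top H g$$

Minimizing the right-hand side over $\epsilon$ (when $g^\top H g > 0$) yields the optimal step size $\epsilon^\ast = g^\top g \,/\, (g^\top H g)$.
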
2 votes · 1 answer · 324 views

I'm a data science student, and while I was learning to derive the logistic regression loss function (cross-entropy loss), I found that the gradient is exactly the same as the least-squares gradient ...
Ammar (23)

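That observation is correct: for both models the per-sample gradient has the form (prediction minus target) times the input; only the prediction function differs, being $\sigma(w^\top x_i)$ for logistic regression and $w^\top x_i$ for least squares:

$$\nabla_w L_{\mathrm{CE}} = \sum_i \big(\sigma(w^\top x_i) - y_i\big)\,x_i, \qquad \nabla_w L_{\mathrm{LS}} = \sum_i \big(w^\top x_i - y_i\big)\,x_i$$
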
3 votes · 1 answer · 165 views

I'm working on implementing Newton's method to perform second-order gradient descent in a neural network and am having trouble computing the second-order derivatives. I understand that in practice, ...
bsluther

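For reference, the Newton step replaces the scalar learning rate with the inverse Hessian of the loss with respect to the parameters $\theta$:

$$\theta \leftarrow \theta - H^{-1}\nabla_\theta L, \qquad H = \nabla^2_\theta L$$

For a network with $n$ parameters, $H$ is $n \times n$, which is why exact Newton steps are rarely computed for large models.
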
0 votes · 1 answer · 99 views

I've been trying to figure out why Ridge regression's weights approach 0 for large values of lambda but are never exactly 0, unlike Lasso and simple linear regression. According to this ...
Rayyan Khan

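One way to see the difference is the orthonormal-design case, where ridge rescales each OLS coefficient while lasso soft-thresholds it:

$$\hat{w}_j^{\mathrm{ridge}} = \frac{\hat{w}_j^{\mathrm{OLS}}}{1+\lambda}, \qquad \hat{w}_j^{\mathrm{lasso}} = \operatorname{sign}\big(\hat{w}_j^{\mathrm{OLS}}\big)\,\big(|\hat{w}_j^{\mathrm{OLS}}| - \lambda\big)_+$$

The ridge coefficient reaches zero only in the limit $\lambda \to \infty$, whereas the lasso coefficient is exactly zero as soon as $\lambda \ge |\hat{w}_j^{\mathrm{OLS}}|$.
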
0 votes · 1 answer · 41 views

I just started with machine learning, and today I tried implementing the gradient descent algorithm for linear regression. If I use a bigger value for alpha (the learning rate), the absolute value of w ...
Foch29 (1)

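This is the classic symptom of too large a step size; on the one-dimensional quadratic $f(w) = \frac{\lambda}{2}w^2$ the update is a simple recurrence:

$$w_{t+1} = w_t - \alpha \lambda w_t = (1 - \alpha\lambda)\,w_t$$

The iterates shrink when $|1 - \alpha\lambda| < 1$ and blow up when $\alpha > 2/\lambda$, which matches $|w|$ growing for a bigger alpha.
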
0 votes · 1 answer · 53 views

I am trying to use gradient descent to minimize a function that takes in multiple vectors, something like $\min f(x_1, x_2, \dots, x_N)$ where each $x_i \in \mathbb{R}^3$ and the output is a scalar. I'...
confused

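One common approach (a sketch; `grad_f` is a hypothetical function returning the stacked gradient) is to concatenate the vectors into a single flat parameter and run ordinary gradient descent on it:

```python
import numpy as np

def gd_multi(grad_f, xs, lr=0.01, steps=1000):
    """Flatten N vectors in R^3 into one parameter vector, run plain
    gradient descent, then restore the original shapes."""
    theta = np.concatenate([np.ravel(x) for x in xs])  # shape (3N,)
    for _ in range(steps):
        theta = theta - lr * grad_f(theta)
    return theta.reshape(len(xs), 3)
```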
