Questions tagged [gradient]
The gradient tag has no summary.
36 questions
3 votes
0 answers
64 views
How does gradient descent perform, compared to informed random walk?
I have a complex problem, and I am not sure whether I can solve it with gradient descent, mainly because I do not know the gradient, the function is strongly discontinuous over small steps, and I have no easy ...
0 votes
0 answers
36 views
TensorFlow tape.gradient to calculate a 2d array with respect to a single column of the 2d array input
I have a feature dataframe with shape (100, 18): 18 features for 100 different points. One of those features is time. The model will then output an array with shape (100, 16). The model has ...
0 votes
0 answers
810 views
NaN grad norm even with a stable loss and gradient
I am currently working on a custom fine-tune of several code LLMs, and while working on DeepSeekCoder I encountered strange behaviour. When training the model, sooner or later the loss goes to ...
1 vote
2 answers
336 views
Use of Gradient with respect to feature instead of model parameters
Generally, for any machine learning/deep learning system, we compute a loss, $L = l(x, \theta, y)$ which is a function of the input feature vector $x$ (after activation), model parameters $\theta$ (...
1 vote
1 answer
953 views
Gradient Boosting - Why pseudo-residuals?
I have some questions about the Gradient Boosting algorithm with Decision Trees that I don't really understand: Does the initial value of $\hat{y}$ matter, or could you pick any value, e.g. between 0 and 1? ...
0 votes
1 answer
74 views
Does learning rate depend on input and output range?
I watched hours of videos on gradient descent and still feel pretty confused. Let's say I have a "model": y = x * w. I use 2 as my target ...
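For readers skimming this excerpt, here is a minimal sketch of why the usable learning rate tends to depend on the input scale for the model `y = x * w`. It assumes a squared-error loss and a single training point; the function name `train`, the input values, and the learning rates are illustrative choices, not taken from the question.

```python
def train(x, target, lr, steps=50, w=0.0):
    """Plain gradient descent on the squared error (x * w - target) ** 2."""
    for _ in range(steps):
        error = x * w - target      # prediction minus target
        grad = 2.0 * x * error      # d/dw of (x * w - target) ** 2
        w -= lr * grad
    return w

# Same learning rate, different input scale (values are assumptions).
w_small = train(x=1.0, target=2.0, lr=0.1)          # settles near 2.0
w_large = train(x=10.0, target=2.0, lr=0.1)         # blows up
w_large_fixed = train(x=10.0, target=2.0, lr=0.001) # settles near 0.2
```

Each update multiplies the distance to the optimum by `1 - 2 * lr * x ** 2`, so under these assumptions the iteration only converges when `lr < 1 / x ** 2`. Scaling the input by 10 shrinks the admissible learning rate by a factor of 100.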
0 votes
0 answers
3k views
How to correctly create a PyTorch Tensor from a Pandas DataFrame?
I have loaded my data into a Pandas DataFrame, performed some pre-processing, and now need to convert it into a PyTorch Tensor to use as my feature data for training. Obviously, this new tensor do ...
1 vote
1 answer
100 views
How do we derive our loss function from the gradient objective?
I've been delving into RL theory and practice, and one part I find particularly hard to understand is the relation between the practical loss function and ...
0 votes
0 answers
178 views
Calculating derivative of bias in backpropagation
Looking at the algorithm on Wikipedia, we can implement backpropagation by calculating: $$\delta^{L}=\left(f^{L}\right)'\cdot\nabla_{a^{L}}C$$ (where I treat $\left(f^{L}\right)'$ as an $n\times n$ ...
0 votes
1 answer
790 views
Why would we add regularization loss to the gradient itself in an SVM?
I'm doing CS 231n on my own. I'm looking at this solution to a question that implements an SVM. Relevant code: ...
1 vote
1 answer
41 views
Gradient Ascent and directional derivative
Suppose that you want to estimate a local maximum of the real function $f(x,y,z)$ with gradient ascent. Given a starting point $(x_0, y_0, z_0)$, the approach is to compute the gradient at this ...
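As a rough illustration of the procedure this excerpt starts to describe, the sketch below runs gradient ascent on a made-up concave function of three variables. The function `f`, its analytic gradient, the step size, and the iteration count are all assumptions chosen for the example; the question does not specify them.

```python
def f(x, y, z):
    # Hypothetical concave function with its maximum at (1, -2, 3).
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2 - (z - 3.0) ** 2

def grad_f(x, y, z):
    # Analytic gradient of f.
    return (-2.0 * (x - 1.0), -2.0 * (y + 2.0), -2.0 * (z - 3.0))

def gradient_ascent(point, step=0.1, iters=200):
    """Repeatedly move a small step along the gradient direction."""
    x, y, z = point
    for _ in range(iters):
        gx, gy, gz = grad_f(x, y, z)
        x, y, z = x + step * gx, y + step * gy, z + step * gz
    return x, y, z

maximum = gradient_ascent((0.0, 0.0, 0.0))
```

The gradient is the direction in which the directional derivative is largest, which is why each step moves along it. For a non-concave function the same loop would only be expected to reach a local maximum, and the step size would need more care.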
1 vote
0 answers
64 views
How to interpret integrated gradients in an NLP toxic text classification use-case?
I am trying to understand how integrated gradients work in the NLP case. Let $F: \mathbb{R}^{n} \rightarrow[0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input and $x' \in ...
4 votes
1 answer
377 views
Differentiable approximation for counting negative values in array
I have an array of time of arrivals and I want to convert it to count data using pytorch in a differentiable way. Example arrival times: ...
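One common way to approach the problem in the title is to replace the hard indicator `v < 0` with a steep sigmoid, which is smooth and therefore differentiable. The sketch below shows the idea in plain Python rather than PyTorch; the helper names, the `temperature` parameter, and the sample values are assumptions for illustration, not taken from the question.

```python
import math

def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def soft_count_negative(values, temperature=0.01):
    """Smooth surrogate for sum(v < 0): each term approaches 1 when
    v is well below zero and 0 when v is well above zero."""
    return sum(sigmoid(-v / temperature) for v in values)

values = [-1.5, -0.2, 0.3, 2.0]            # illustrative data
approx = soft_count_negative(values)
exact = sum(1 for v in values if v < 0)
```

A smaller `temperature` makes the surrogate closer to the exact count but also makes its gradient vanish away from zero, so the value is a trade-off. In PyTorch the same expression could presumably be written with `torch.sigmoid` applied to a tensor and then summed.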
1 vote
0 answers
24 views
Central finite difference gradient simplified [closed]
I'm asked to compute a central finite difference scheme (f(i+1)-f(i-1)) on an image. My attempt is something like: ...
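A minimal sketch of the scheme named in this excerpt, applied along the horizontal axis of an image stored as a list of rows. It uses the unnormalised form `f(i+1) - f(i-1)` exactly as written in the question; leaving the border columns at zero and the function name `central_difference_x` are assumptions made for the example.

```python
def central_difference_x(image):
    """Horizontal central difference f(i+1) - f(i-1) for each interior pixel."""
    height = len(image)
    width = len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(1, width - 1):   # first and last columns stay 0
            out[r][c] = image[r][c + 1] - image[r][c - 1]
    return out

image = [
    [0, 1, 4, 9],
    [0, 2, 4, 6],
]
dx = central_difference_x(image)
```

Dividing each result by 2 (times the pixel spacing) would turn this into the usual derivative estimate. The vertical direction follows the same pattern with rows and columns swapped.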
1 vote
0 answers
89 views
Which Neural Network or Gradient Boosting framework is the simplest for Custom Loss Functions?
I need to implement a custom loss function. The function is relatively simple: $$-\sum \limits_{i=1}^m [O_{1,i} \cdot y_i-1] \ \cdot \ \operatorname{ReLU}(O_{1,i} \cdot \hat{y_i} - 1)$$ With $O$ being ...