Questions tagged [gradient]

Question 1: How does gradient descent perform, compared to informed random walk?

I have a complex problem, and I am not sure whether I can solve it with gradient descent. Most importantly, I do not know the gradient, the function is strongly discontinuous over small steps, and I have no easy ...

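Question 2: Tensorflow tape.gradient to calculate a 2d array with respect to a single column of the 2d array input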

I have a feature DataFrame with shape (100, 18): 18 features for 100 different points. One of those features is time. The model then outputs an array with shape (100, 16). The model has ...

Question 3: NaN grad norm even with a stable loss and gradient

I am currently working on custom fine-tunes of several code LLMs, and while working on DeepSeekCoder I encountered strange behaviour: when training the model, sooner or later the loss goes to ...

Question 4: Use of Gradient with respect to feature instead of model parameters

Generally, for any machine learning/deep learning system, we compute a loss $L = l(x, \theta, y)$, which is a function of the input feature vector $x$ (after activation), model parameters $\theta$ (...

Question 5: Gradient Boosting - Why pseudo-residuals?

I have some questions I don't really understand regarding the Gradient Boosting algorithm with Decision Trees: Does the initial value of $\hat{y}$ matter, or could you pick any value, e.g. between 0 and 1? ...
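
For squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the pseudo-residual $-\partial L/\partial F$ is simply $y - F$, and the best constant starting value is the mean of $y$. The sketch below is a toy booster with one-split stumps (the helpers `fit_stump` and `gradient_boost` are hypothetical, not from any library), with an optional `f0` so the effect of the initial value can be tried.

```python
# Toy gradient boosting for squared-error loss L(y, F) = (y - F)^2 / 2.
# The pseudo-residual is the negative gradient -dL/dF = y - F.

def fit_stump(xs, rs):
    """Fit a one-split regression stump to residuals rs (hypothetical helper)."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, rs) if x <= thr]
        right = [r for x, r in zip(xs, rs) if x > thr]
        if not left or not right:
            continue
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    if best is None:  # all x identical: fall back to a constant
        m = sum(rs) / len(rs)
        return lambda x: m
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv

def gradient_boost(xs, ys, n_rounds=50, lr=0.1, f0=None):
    # For squared error the optimal constant start is the mean of y,
    # but any f0 is allowed; a poor start just has to be corrected by the trees.
    f0 = sum(ys) / len(ys) if f0 is None else f0
    stumps = []
    preds = [f0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # pseudo-residuals
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + lr * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + lr * sum(h(x) for h in stumps)
```

With `xs = [0, 1, 2, 3, 4, 5]` and `ys = [1, 1, 1, 5, 5, 5]`, both the mean start and `f0=0.0` end within about 0.03 of the targets after 50 rounds: here the initial value mostly changes how much work the trees must do, not where the fit can end up.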

Question 6: Does learning rate depend on input and output range?

I have watched hours of videos on gradient descent and still feel pretty confused. Let's say I have a "model": y = x * w, and I use 2 as my target ...
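
Assuming the squared-error loss $L = (x w - t)^2$ with target $t = 2$ (the input values used below are hypothetical), the gradient is $dL/dw = 2x(xw - t)$, and each update multiplies the error by $1 - 2\eta x^2$. This minimal sketch shows the same learning rate converging for a small input and diverging for a large one, which is one concrete way the usable learning rate depends on the input range.

```python
def train(x, target, w=0.0, lr=0.1, steps=100):
    """Plain gradient descent on L = (x*w - target)^2 for a single input x."""
    for _ in range(steps):
        y = x * w
        grad = 2 * x * (y - target)  # dL/dw
        w -= lr * grad
    return w
```

`train(1.5, 2.0)` settles at $w \approx 2/1.5$, while `train(10.0, 2.0, steps=20)` blows up because $|1 - 2 \cdot 0.1 \cdot 10^2| = 19 > 1$.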

Question 7: How to correctly create a PyTorch Tensor from a Pandas DataFrame?

I have loaded my data into a Pandas DataFrame and performed some pre-processing, and now I need to convert it into a PyTorch tensor to use as my feature data for training. Obviously, this new tensor do ...

Question 8: How do we derive our loss function from the gradient objective?

I've been delving into RL theory and practice, and one part I find hard to understand properly is the relation between the practical loss function and ...

Question 9: calculating derivative of bias in backpropagation

Looking at the algorithm on Wikipedia, we can implement backpropagation by calculating: $$\delta^{L}=\left(f^{L}\right)'\cdot\nabla_{a^{L}}C$$ (where I treat $\left(f^{L}\right)'$ as an $n\times n$ ...
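
When $f^{L}$ is applied elementwise, $(f^{L})'$ is diagonal, so the matrix product reduces to an elementwise product; and since $z^{L} = W^{L} a^{L-1} + b^{L}$, the bias gradient is $\partial C / \partial b^{L} = \delta^{L}$. A minimal pure-Python check, assuming a sigmoid output layer and the quadratic cost $C = \tfrac{1}{2}\lVert a^{L} - y \rVert^2$ (both are assumptions, and the weights in the usage lines are made up), compares that formula with a finite-difference derivative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W, b, a_prev):
    # z = W a_prev + b, then an elementwise sigmoid (assumed activation)
    z = [sum(W[j][k] * a_prev[k] for k in range(len(a_prev))) + b[j] for j in range(len(b))]
    return z, [sigmoid(zj) for zj in z]

def cost(a, y):
    # assumed quadratic cost C = 0.5 * ||a - y||^2, so grad_a C = a - y
    return 0.5 * sum((aj - yj) ** 2 for aj, yj in zip(a, y))

def delta_output(z, a, y):
    # (f^L)' is diagonal, so the matrix-vector product is elementwise
    return [sigmoid(zj) * (1 - sigmoid(zj)) * (aj - yj) for zj, aj, yj in zip(z, a, y)]

def numerical_bias_grad(W, b, a_prev, y, eps=1e-6):
    # central differences of C with respect to each bias entry
    grads = []
    for j in range(len(b)):
        bp = list(b); bp[j] += eps
        bm = list(b); bm[j] -= eps
        cp = cost(forward(W, bp, a_prev)[1], y)
        cm = cost(forward(W, bm, a_prev)[1], y)
        grads.append((cp - cm) / (2 * eps))
    return grads
```

With `W = [[0.2, -0.5], [0.7, 0.1]]`, `b = [0.1, -0.3]`, `a_prev = [1.0, 2.0]` and `y = [0.0, 1.0]`, `delta_output` and `numerical_bias_grad` agree to about six decimal places.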

Question 10: Why would we add regularization loss to the gradient itself in an SVM?

I'm doing CS 231n on my own. I'm looking at this solution to a question that implements an SVM. Relevant code: ...
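
If the training objective is the data loss plus an L2 penalty $\lambda \sum W^2$, the penalty is part of the loss, so its derivative $2\lambda W$ has to be added to `dW` as well. A vectorised NumPy sketch of a multiclass hinge loss under that assumption; the name `svm_loss` and the exact penalty form are assumptions (some solutions use $\tfrac{1}{2}\lambda \sum W^2$ and add $\lambda W$ instead).

```python
import numpy as np

def svm_loss(W, X, y, reg):
    """Multiclass hinge loss (margin 1) plus reg * sum(W**2); returns (loss, dW)."""
    n = len(y)
    scores = X @ W                                   # (N, C)
    correct = scores[np.arange(n), y][:, None]       # (N, 1)
    margins = np.maximum(0.0, scores - correct + 1.0)
    margins[np.arange(n), y] = 0.0                   # the correct class has no margin term
    loss = margins.sum() / n + reg * np.sum(W * W)

    # gradient of the data term: +x_i for each violating class, -count * x_i for the correct one
    mask = (margins > 0).astype(float)
    mask[np.arange(n), y] = -mask.sum(axis=1)
    dW = X.T @ mask / n
    # the penalty is part of the loss, so its derivative must be added too
    dW += 2 * reg * W
    return loss, dW
```

A central-difference check over each entry of `W` should agree with `dW`; dropping the `2 * reg * W` line makes it disagree whenever `reg > 0`.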

Question 11: Gradient Ascent and directional derivative

Suppose that you want to estimate a local maximum of the real function $f(x,y,z)$ with gradient ascent. Given a starting point $(x_0, y_0, z_0)$, the approach is to compute the gradient at this ...
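
The directional derivative along a unit vector $u$ is $D_u f = \nabla f \cdot u \le \lVert \nabla f \rVert$, with equality when $u$ points along $\nabla f$, which is why gradient ascent steps in the gradient direction. A minimal sketch using central-difference gradients; the test function, step size and iteration count are hypothetical choices.

```python
def numerical_gradient(f, p, h=1e-6):
    # central differences in each coordinate of the point p
    grad = []
    for i in range(len(p)):
        fwd = list(p); fwd[i] += h
        bwd = list(p); bwd[i] -= h
        grad.append((f(*fwd) - f(*bwd)) / (2 * h))
    return grad

def gradient_ascent(f, start, lr=0.1, steps=200):
    p = list(start)
    for _ in range(steps):
        g = numerical_gradient(f, p)
        p = [pi + lr * gi for pi, gi in zip(p, g)]  # step up the gradient
    return p

def directional_derivative(f, p, u):
    # D_u f = grad f . u / |u|
    norm = sum(ui * ui for ui in u) ** 0.5
    g = numerical_gradient(f, p)
    return sum(gi * ui / norm for gi, ui in zip(g, u))
```

For $f = -(x-1)^2 - (y+2)^2 - (z-0.5)^2$ started at the origin, the iterates approach $(1, -2, 0.5)$, and at the origin the directional derivative along $\nabla f = (2, -4, 1)$ equals $\sqrt{21}$, larger than along another direction such as $(1, 0, 0)$.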

Question 12: How to interpret integrated gradients in an NLP toxic text classification use-case?

I am trying to understand how integrated gradients work in the NLP case. Let $F: \mathbb{R}^{n} \rightarrow [0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input and $x' \in ...
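
Integrated gradients attribute to feature $i$ the quantity $\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F(x' + \alpha (x - x'))}{\partial x_i}\, d\alpha$, and the attributions sum to $F(x) - F(x')$. The sketch below approximates the integral with a midpoint Riemann sum; the logistic model standing in for the network is purely hypothetical. In an NLP setting, $x$ would typically be the concatenated token embeddings and $x'$ a baseline such as all-zero or padding embeddings, with per-token scores obtained by summing $\mathrm{IG}_i$ over each token's embedding dimensions.

```python
import math

def make_model(w, b):
    # Hypothetical stand-in for F: a logistic model sigma(w.x + b) with output in [0, 1]
    def F(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-z))
    def grad_F(x):
        s = F(x)
        return [s * (1 - s) * wi for wi in w]
    return F, grad_F

def integrated_gradients(grad_F, x, baseline, steps=200):
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule on [0, 1]
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        g = grad_F(point)
        total = [t + gi for t, gi in zip(total, g)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]
```

The completeness property gives a quick sanity check: `sum(integrated_gradients(...))` should be close to `F(x) - F(baseline)`.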

Question 13: Differentiable approximation for counting negative values in array

I have an array of arrival times, and I want to convert it to count data using PyTorch in a differentiable way. Example arrival times: ...
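
A hard count such as `(values < 0).sum()` has zero gradient almost everywhere. A common relaxation, swapped in here as an assumption about what fits the question, replaces the step function with a sigmoid of sharpness `k` (a hypothetical parameter): $\sum_i \sigma(-k v_i)$ approximates the number of negative values, and a product of two sigmoids gives a soft indicator for an arrival falling in a bin. It is written in plain Python for clarity; with PyTorch tensors, the same expressions built from `torch.sigmoid` are differentiable through autograd.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_count_negative(values, k=10.0):
    # sigmoid(-k*v) is ~1 for v << 0 and ~0 for v >> 0; the sum approximates the count
    return sum(sigmoid(-k * v) for v in values)

def soft_bin_counts(arrivals, edges, k=10.0):
    # an arrival t lies in [lo, hi) when t - lo > 0 and hi - t > 0;
    # the product of two sigmoids is a smooth indicator of that interval
    return [sum(sigmoid(k * (t - lo)) * sigmoid(k * (hi - t)) for t in arrivals)
            for lo, hi in zip(edges[:-1], edges[1:])]
```

For arrivals `[0.5, 1.2, 1.7, 3.4]` and edges `[0, 1, 2, 3, 4]` with `k=50.0`, the soft counts are close to `[1, 2, 0, 1]`. Larger `k` makes the counts closer to integers but the gradients sharper and closer to zero away from the bin edges, so `k` is a trade-off to tune.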

Question 14: Central finite distance gradient simplified [closed]

I'm asked to compute a central finite difference scheme, (f(i+1) - f(i-1)), on an image. My attempt is something like: ...
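
A vectorised sketch with NumPy slicing, assuming unit pixel spacing and leaving the one-pixel border at zero. The conventional central difference divides by twice the spacing, $(f(i+1) - f(i-1))/2$; if the exercise really wants the undivided difference written above, drop the `/ 2.0`. For comparison, `np.gradient` uses the same interior formula and one-sided differences at the borders.

```python
import numpy as np

def central_diff(img):
    """Central differences along rows (dy) and columns (dx); borders stay zero."""
    img = np.asarray(img, dtype=float)
    dy = np.zeros_like(img)
    dx = np.zeros_like(img)
    # slicing replaces explicit loops over i: f(i+1) - f(i-1) for all interior pixels
    dy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    dx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    return dy, dx
```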

Question 15: Which Neural Network or Gradient Boosting framework is the simplest for Custom Loss Functions?

I need to implement a custom loss function. The function is relatively simple: $$-\sum \limits_{i=1}^m [O_{1,i} \cdot y_i-1] \ \cdot \ \operatorname{ReLU}(O_{1,i} \cdot \hat{y_i} - 1)$$ With $O$ being ...
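
The loss needs only elementwise operations, so it can be written directly with NumPy; the array `o` holds the values $O_{1,i}$ (their meaning is cut off in the excerpt). Its derivative with respect to $\hat{y}_i$ is $-(O_{1,i} y_i - 1)\, O_{1,i}$ where $O_{1,i}\hat{y}_i > 1$ and $0$ elsewhere (taking $0$ at the kink). Any autodiff framework can differentiate this automatically; for gradient boosting libraries that ask for a gradient and a Hessian, note that the second derivative of this piecewise-linear loss is zero almost everywhere.

```python
import numpy as np

def custom_loss(o, y, y_hat):
    # -sum_i (o_i * y_i - 1) * ReLU(o_i * y_hat_i - 1)
    return -np.sum((o * y - 1.0) * np.maximum(o * y_hat - 1.0, 0.0))

def custom_loss_grad(o, y, y_hat):
    # d/d y_hat_i = -(o_i * y_i - 1) * o_i where o_i * y_hat_i > 1, else 0
    active = (o * y_hat - 1.0) > 0
    return -(o * y - 1.0) * o * active
```

For example, with `o = [2.0, 1.5, 3.0]`, `y = [1, 0, 1]` and `y_hat = [0.8, 0.9, 0.2]`, only the first two terms are active, giving a loss of `-0.25` and a gradient of `[-2.0, 1.5, 0.0]`.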

How does gradient descent perform, compared to informed random walk?

Tensorflow tape.gradient to calculate a 2d array with respect to a single column of the 2d array input

NaN grad norm even with a stable loss and gradient

Use of Gradient with respect to feature instead of model parameters

Gradient Boosting - Why pseudo-residuals?

Does learning rate depend on input and output range?

How to correctly create a PyTorch Tensor from a Pandas DataFrame?

How do we derive our loss function from the gradient objective?

calculating derivative of bias in backpropagation

Why would we add regularization loss to the gradient itself in an SVM?

Gradient Ascent and directional derivative

How to interpret integrated gradients in an NLP toxic text classification use-case?

Differentiable approximation for counting negative values in array

Central finite distance gradient simplified [closed]

Which Neural Network or Gradient Boosting framework is the simplest for Custom Loss Functions?