
Questions tagged [gradient]

3 votes · 0 answers · 64 views

I have a complex problem, and I am not sure if I can solve it with gradient descent. Most importantly, I do not know the gradient; it is strongly discontinuous over small steps, and I have no easy ...
asked by peterh
0 votes · 0 answers · 36 views

I have a feature dataframe with shape (100, 18): 18 features for 100 different points. One of those features is time. The model then outputs an array with shape (100, 16). The model has ...
asked by twofair
0 votes · 0 answers · 810 views

Currently I am working on custom fine-tunes of several code LLMs, and while working on DeepSeekCoder I encountered strange behaviour: when training the model, sooner or later the loss goes to ...
asked by weda
1 vote · 2 answers · 336 views

Generally, for any machine learning/deep learning system, we compute a loss $L = l(x, \theta, y)$, which is a function of the input feature vector $x$ (after activation), the model parameters $\theta$ (...
asked by OlorinIstari
1 vote · 1 answer · 953 views

I have some questions I don't really understand about the gradient boosting algorithm with decision trees: does the initial value of $\hat{y}$ matter, or could you pick any value, e.g. between 0 and 1? ...
asked by CMath
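On the initialization question above, a minimal sketch (assuming squared-error loss, with made-up targets): the standard initial constant prediction is the one minimizing the loss, which for squared error is the mean of the targets. Any other constant also "works", the first boosting rounds just spend capacity correcting the offset.

```python
# Sketch, assuming squared-error gradient boosting: the usual initial
# prediction F0 is the constant minimizing the loss, i.e. the target mean.
# Targets below are made-up example values.
y = [0.2, 0.8, 0.5, 0.9]

def sse(y_true, const):
    """Sum of squared errors for a constant prediction."""
    return sum((t - const) ** 2 for t in y_true)

f0_mean = sum(y) / len(y)       # argmin of SSE over constant predictions
loss_at_mean = sse(y, f0_mean)
loss_at_zero = sse(y, 0.0)      # an arbitrary init like 0 is valid but worse

print(f0_mean, loss_at_mean, loss_at_zero)
```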
0 votes · 1 answer · 74 views

I watched hours of videos on gradient descent and still feel pretty confused. Let's say I have a "model": y = x * w. I use 2 as my target ...
asked by Eugene
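A minimal sketch of gradient descent on the one-parameter "model" y = x * w from the excerpt, with x = 1 and target 2 (the input x = 1 and the squared-error loss are assumptions for illustration); the gradient is written out by hand.

```python
# Toy gradient descent on y = x * w with squared-error loss (assumed).
x, target = 1.0, 2.0
w = 0.0                 # initial guess for the weight
lr = 0.1                # learning rate

for _ in range(100):
    y = x * w                       # forward pass
    grad = 2 * (y - target) * x     # dL/dw for L = (y - target)^2
    w -= lr * grad                  # step against the gradient

print(w)  # converges toward 2.0
```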
0 votes · 0 answers · 3k views

I have loaded my data into a Pandas DataFrame, performed some pre-processing, and now need to convert it into a PyTorch tensor to use as my feature data for training. Obviously, this new tensor does ...
asked by EvilRoach
1 vote · 1 answer · 100 views

I've been delving into RL theory and practice, and one particular part I find hard to properly understand is the relation between the practical loss function and ...
asked by Alex Ramalho
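On the relation the excerpt asks about: in policy-gradient methods the "loss" is a surrogate scalar, such as -log π(a|s) · G in REINFORCE, whose gradient equals the policy gradient even though it is not a loss in the supervised sense. A sketch with a toy softmax policy over raw logits (the logits, action, and return are made-up values):

```python
# Sketch of the REINFORCE surrogate loss for a toy softmax policy.
# Minimizing -log pi(a|s) * G reproduces the policy gradient.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [0.5, -0.2, 0.1]
action, ret = 0, 3.0                  # sampled action and its return G

probs = softmax(logits)
surrogate = -math.log(probs[action]) * ret

# Analytic gradient of the surrogate w.r.t. the logits:
#   d(-G log pi_a)/d logit_k = G * (pi_k - [k == a])
grad = [ret * (p - (1.0 if k == action else 0.0)) for k, p in enumerate(probs)]
print(surrogate, grad)
```

The point is that autodiff on the surrogate and the analytic policy gradient agree, so the "loss" is just a vehicle for producing the right gradient.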
0 votes · 0 answers · 178 views

Looking at the algorithm on Wikipedia, we can implement backpropagation by calculating: $$\delta^{L}=\left(f^{L}\right)'\cdot\nabla_{a^{L}}C$$ (where I treat $\left(f^{L}\right)'$ as an $n\times n$ ...
asked by Ariel Yael
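For context on the formula in the excerpt: the output-layer delta is propagated backwards layer by layer. Keeping the excerpt's convention that $\left(f^{l}\right)'$ is the $n\times n$ Jacobian of the activation, and assuming $W^{l+1}$ denotes the weight matrix of layer $l+1$ (notation assumed to match the Wikipedia article), a sketch of the recursion and the weight gradient is:

```latex
\delta^{l} = \left(f^{l}\right)' \cdot \left(W^{l+1}\right)^{T} \delta^{l+1},
\qquad
\nabla_{W^{l}} C = \delta^{l} \left(a^{l-1}\right)^{T}
```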
0 votes · 1 answer · 790 views

I'm doing CS 231n on my own. I'm looking at this solution to a question that implements an SVM. Relevant code: ...
asked by Foobar
1 vote · 1 answer · 41 views

Suppose that you want to estimate a local maximum of the real function $f(x,y,z)$ with gradient ascent. Given a starting point $(x_0, y_0, z_0)$, the approach is to compute the gradient at this ...
asked by Enk9456
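A minimal sketch of the gradient-ascent procedure the excerpt describes, for a function of three variables. The objective $f(x, y, z) = -(x^2 + y^2 + z^2)$ is a stand-in chosen here because its maximum (the origin) is known; the gradient is written by hand.

```python
# Gradient ascent on f(x, y, z) = -(x^2 + y^2 + z^2), an assumed toy objective.
def grad_f(x, y, z):
    return (-2 * x, -2 * y, -2 * z)

point = (1.0, -2.0, 0.5)   # starting point (x0, y0, z0)
lr = 0.1                   # step size

for _ in range(200):
    g = grad_f(*point)
    point = tuple(p + lr * gi for p, gi in zip(point, g))  # step *up* the gradient

print(point)  # approaches the maximizer (0, 0, 0)
```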
1 vote · 0 answers · 64 views

I am trying to understand how integrated gradients work in the NLP case. Let $F: \mathbb{R}^{n} \rightarrow[0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input, and $x' \in ...
asked by Revolucion for Monica
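A hedged sketch of integrated gradients in the setting the excerpt sets up, approximating the path integral with a Riemann sum. The model $F$ here is a toy logistic function with made-up weights (an assumption; the question's $F$ is an arbitrary network with output in $[0,1]$), and $x'$ is the all-zeros baseline.

```python
# Integrated gradients via a Riemann sum, for an assumed toy logistic model.
import math

W = [1.5, -2.0, 0.5]                      # made-up weights

def F(x):
    z = sum(w * xi for w, xi in zip(W, x))
    return 1.0 / (1.0 + math.exp(-z))     # output in [0, 1]

def grad_F(x):
    p = F(x)
    return [p * (1.0 - p) * w for w in W] # dF/dx_i for the logistic toy model

def integrated_gradients(x, baseline, steps=2000):
    ig = [0.0] * len(x)
    for s in range(1, steps + 1):
        alpha = s / steps                 # point on the straight-line path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_F(point)
        for i in range(len(x)):
            ig[i] += g[i]
    # scale the averaged gradients by (x_i - x'_i)
    return [(xi - b) * v / steps for xi, b, v in zip(x, baseline, ig)]

x = [1.0, 0.2, 0.4]
baseline = [0.0, 0.0, 0.0]                # x', the all-zeros baseline
ig = integrated_gradients(x, baseline)
# Completeness axiom: attributions sum (approximately) to F(x) - F(x')
print(sum(ig), F(x) - F(baseline))
```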
4 votes · 1 answer · 377 views

I have an array of time of arrivals and I want to convert it to count data using pytorch in a differentiable way. Example arrival times: ...
asked by iRestMyCaseYourHonor
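One way (a sketch, not necessarily what the question settles on) to make binned counts differentiable: relax each hard indicator 1[t ∈ (lo, hi]] to a difference of sigmoids, which is smooth in t and approaches the true indicator as the temperature τ → 0. Plain Python stands in for the question's PyTorch tensors; arrival times and bin edges are made-up values.

```python
# Differentiable "count events per bin" via a sigmoid relaxation (sketch).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_counts(arrivals, edges, tau=0.01):
    counts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # soft indicator for t in (lo, hi]; hard count recovered as tau -> 0
        c = sum(sigmoid((t - lo) / tau) - sigmoid((t - hi) / tau)
                for t in arrivals)
        counts.append(c)
    return counts

arrivals = [0.3, 0.7, 1.2, 1.4, 2.9]   # example arrival times (assumed)
edges = [0.0, 1.0, 2.0, 3.0]           # bin edges
print(soft_counts(arrivals, edges))    # close to the hard counts [2, 2, 1]
```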
1 vote · 0 answers · 24 views

I'm asked to compute a central finite-difference scheme, $(f(i+1)-f(i-1))$, on an image. My attempt is something like: ...
asked by Anđela Todorović
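A sketch of the scheme in the excerpt along the horizontal axis, using plain nested lists so it is self-contained (with NumPy the same computation is `img[:, 2:] - img[:, :-2]`). It computes f(i+1) − f(i−1) exactly as written in the question, without the 1/(2h) normalization, and leaves border pixels at 0; the example image is made up.

```python
# Central finite difference f(j+1) - f(j-1) along the x axis of an image.
def central_diff_x(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]        # borders left at 0
    for i in range(h):
        for j in range(1, w - 1):
            out[i][j] = img[i][j + 1] - img[i][j - 1]
    return out

img = [[0, 1, 4, 9],
       [1, 3, 6, 10]]
print(central_diff_x(img))
```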
1 vote · 0 answers · 89 views

I need to implement a custom loss function. The function is relatively simple: $$-\sum \limits_{i=1}^m [O_{1,i} \cdot y_i-1] \ \cdot \ \operatorname{ReLu}(O_{1,i} \cdot \hat{y_i} - 1)$$ With $O$ being ...
asked by Borut Flis
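A direct transcription of the loss in the excerpt, as a plain-Python sketch (the framework it should ultimately be written in is left to the question); the vectors below are made-up example values.

```python
# Custom loss from the excerpt:
#   -sum_i (O_{1,i} * y_i - 1) * relu(O_{1,i} * yhat_i - 1)
def relu(z):
    return max(z, 0.0)

def custom_loss(o, y, y_hat):
    return -sum((oi * yi - 1.0) * relu(oi * yh - 1.0)
                for oi, yi, yh in zip(o, y, y_hat))

o     = [2.0, 1.5, 0.5]   # O_{1,i}
y     = [1.0, 0.0, 1.0]   # true labels y_i
y_hat = [0.8, 0.9, 0.4]   # model outputs \hat{y}_i
print(custom_loss(o, y, y_hat))
```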
