
Questions tagged [backpropagation]

Use for questions about backpropagation, which is commonly used in training neural networks in conjunction with an optimization method such as gradient descent.

8 votes
2 answers
2k views

Specifically to solve the problem of text generation, not translation. There is literally not a single discussion, blog post, or tutorial that explains the math behind this. My best guess so far is: ...
Austin Capobianco
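One way to frame the math the question above asks for, assuming an autoregressive next-token model trained with cross-entropy (an assumption, since the post doesn't name the architecture): generation is trained as per-step classification, and backpropagation just sums the per-step gradients through time:
$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right), \qquad \frac{\partial \mathcal{L}}{\partial \theta} = -\sum_{t=1}^{T} \frac{\partial}{\partial \theta} \log p_\theta\left(x_t \mid x_{<t}\right)$$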
1 vote
0 answers
52 views

I'm starting to create a NEAT algorithm, and before starting I looked at a few examples; all of them were using bias values, but I actually have no idea why bias is used in an algorithm like NEAT. For ...
ugroon • 11
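A minimal sketch of why a bias term helps, independent of NEAT itself (the sigmoid node below is a generic illustration, not NEAT code): without a bias, an all-zero input pins the node's output to a fixed value no matter what weights evolution finds.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def node(inputs, weights, bias=0.0):
        # Weighted sum of the inputs plus the bias, passed through the activation.
        return sigmoid(sum(w * i for w, i in zip(weights, inputs)) + bias)

    # Without a bias, all-zero inputs always give sigmoid(0) = 0.5,
    # regardless of how the weights are mutated.
    print(node([0.0, 0.0], [1.3, -0.7]))            # 0.5
    # A bias lets the node shift its activation threshold away from the origin.
    print(node([0.0, 0.0], [1.3, -0.7], bias=2.0))  # ~0.88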
1 vote
1 answer
41 views

If I did loss = loss/10 before calculating the gradient, would that change the amount of change applied to the model parameters during backpropagation? Or is ...
GreedyGroot
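A small PyTorch check for the question above (the numbers are made up): dividing the loss by 10 divides every gradient by 10, so with plain SGD the parameter update shrinks by the same factor, while adaptive optimizers such as Adam largely cancel a constant rescaling.

    import torch

    w = torch.tensor([2.0, -1.0], requires_grad=True)
    x = torch.tensor([1.0, 3.0])
    loss = ((w * x).sum() - 4.0) ** 2

    # Gradient of the original loss (keep the graph so we can reuse it below).
    loss.backward(retain_graph=True)
    g_full = w.grad.clone()

    # Gradient after scaling the loss by 1/10.
    w.grad.zero_()
    (loss / 10).backward()
    g_scaled = w.grad.clone()

    print(g_full, g_scaled)                        # tensor([-10., -30.]) tensor([-1., -3.])
    print(torch.allclose(g_full / 10, g_scaled))   # True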
3 votes
1 answer
165 views

I'm working on implementing Newton's method to perform second-order gradient descent in a neural network and am having trouble computing the second-order derivatives. I understand that in practice, ...
bsluther
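For the second-order question above: the full Hessian of a real network is usually too large to form explicitly, so implementations tend to rely on Hessian-vector products obtained by differentiating the gradient a second time. A minimal PyTorch sketch on a toy two-parameter function standing in for a loss (the function is made up for illustration):

    import torch

    def f(w):
        # Toy scalar stand-in for a network's loss.
        return (w[0] ** 2) * w[1] + torch.sin(w[1])

    w = torch.tensor([1.0, 2.0], requires_grad=True)

    # First-order gradient; create_graph=True keeps it differentiable.
    (g,) = torch.autograd.grad(f(w), w, create_graph=True)

    # Hessian-vector product: differentiate g . v with respect to w.
    v = torch.tensor([1.0, 0.0])
    (hv,) = torch.autograd.grad(g @ v, w)

    print(g)   # analytically [2*w0*w1, w0**2 + cos(w1)] = [4.0, ~0.58]
    print(hv)  # first column of the Hessian: [2*w1, 2*w0] = [4.0, 2.0]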
2 votes
1 answer
106 views

I'm currently working on deriving the gradients of a simple recurrent neural network's weights with respect to the loss, in order to update the weights through backpropagation. It's a super simple network, ...
namor129
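For the question above, one common way to write the gradient of a vanilla RNN's recurrent weight, assuming $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$ and a loss $L$ that depends on the final hidden state $h_T$ (both assumptions, since the post's network isn't shown here), is:
$$\frac{\partial L}{\partial W_h} = \sum_{t=1}^{T} \frac{\partial L}{\partial h_T} \left(\prod_{k=t+1}^{T} \frac{\partial h_k}{\partial h_{k-1}}\right) \frac{\partial^{+} h_t}{\partial W_h}, \qquad \frac{\partial h_k}{\partial h_{k-1}} = \operatorname{diag}\left(1 - h_k \odot h_k\right) W_h$$
where $\partial^{+}$ is the "immediate" derivative that treats $h_{t-1}$ as a constant; summing these per-time-step contributions is exactly what backpropagation through time does.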
1 vote
1 answer
97 views

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1-a^{[L](i)}\right)\right)$$ For the last layer, I saw that $$dA^{[L]} = -\left(\frac{Y}{A^{[L]}} - \frac{1-Y}{1 - A^{[L]}}\right)$$ My ...
Hao Wu • 13
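The expression asked about above follows by differentiating the per-example cross-entropy term with respect to its activation; the $\frac{1}{m}$ factor is usually folded in later, when $dW$ and $db$ are averaged over the batch:
$$\frac{\partial}{\partial a}\left[-\left(y\log a + (1-y)\log(1-a)\right)\right] = -\left(\frac{y}{a} - \frac{1-y}{1-a}\right)$$
Applied element-wise across the batch, this is exactly $dA^{[L]} = -\left(\frac{Y}{A^{[L]}} - \frac{1-Y}{1-A^{[L]}}\right)$.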
1 vote
0 answers
50 views

Given the following network: I'm asked to write the backpropagation process for the $b_3$ parameter, where the loss function is $L(y,z_3)=(z_3-y)^2$. I'm not supposed to calculate any of the weights ...
Aishgadol • 111
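Assuming $z_3$ is affine in the previous layer's output, say $z_3 = w_3^{\top} a_2 + b_3$ (an assumption, since the post's network diagram isn't reproduced here), the chain rule for the bias is just:
$$\frac{\partial L}{\partial b_3} = \frac{\partial L}{\partial z_3}\,\frac{\partial z_3}{\partial b_3} = 2\,(z_3 - y)\cdot 1 = 2\,(z_3 - y)$$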
0 votes
1 answer
89 views

In most cases it is probably the other way round, but... I have implemented a basic MLP neural network structure with backpropagation. My data is just a shifted quadratic function with 100 samples. I ...
tymsoncyferki
0 votes
1 answer
38 views

For a simple skip connection $y = x@w + x$, the gradient $dy/dx$ will be $w+1$. $$\frac{\partial y}{\partial x} = w + 1$$ Is +1 a bit too large and can it overpower $...
mon • 829
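A quick numerical check of the $w+1$ claim above, reduced to scalars for simplicity (in the matrix case the $+1$ becomes an added identity in the Jacobian):

    import torch

    w = torch.tensor(0.5)
    x = torch.tensor(3.0, requires_grad=True)

    y = x * w + x      # scalar analogue of y = x @ w + x
    y.backward()

    print(x.grad)      # tensor(1.5000), i.e. w + 1

The extra +1 is precisely what keeps gradients from vanishing when $w$ is small, which is the usual motivation for residual connections; on its own it does not make the gradient explode.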
1 vote
1 answer
90 views

I was going through the Stanford CS 224 lecture notes on backpropagation. Page 5 states: We can see from the max-margin loss that $\partial J/\partial s = -\partial J/\partial s_c = -1$. I'm not sure I understand why this is the ...
Hormigas • 113
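Assuming the notes define the max-margin objective as $J = \max(0,\; 1 + s_c - s)$, with $s$ the score of the true window and $s_c$ the score of the corrupt window (stated here as an assumption about the notes), then whenever the margin is violated $J = 1 + s_c - s$ and:
$$\frac{\partial J}{\partial s} = -1, \qquad \frac{\partial J}{\partial s_c} = +1, \qquad \text{hence} \quad \frac{\partial J}{\partial s} = -\frac{\partial J}{\partial s_c} = -1$$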
0 votes
1 answer
160 views

I'm trying to implement RNN and LSTM, many-to-many architectures. I reasoned through why BPTT is necessary in RNNs, and it makes sense. But what doesn't make sense to me is that most of the resources I went ...
Amith Adiraju
0 votes
1 answer
59 views

I use PyTorch exclusively to develop my model, and these are the components in my model and how it works: a generator; an encoder, which is pretrained and should not be updated; a loss function. Input is passed to ...
Jesse • 101
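A minimal PyTorch sketch of the setup described above (module shapes and names are placeholders): the pretrained encoder is frozen so its weights receive no gradients and no updates, yet gradients still flow through it back into the generator.

    import torch
    import torch.nn as nn

    generator = nn.Linear(8, 16)    # placeholder for the generator
    encoder = nn.Linear(16, 4)      # placeholder for the pretrained encoder

    # Freeze the encoder: its parameters get no gradients and are never updated.
    for p in encoder.parameters():
        p.requires_grad_(False)

    opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

    x = torch.randn(2, 8)
    loss = encoder(generator(x)).pow(2).mean()
    loss.backward()

    print(encoder.weight.grad)              # None -- the encoder stays fixed
    print(generator.weight.grad is None)    # False -- gradients reached the generator
    opt.step()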
0 votes
0 answers
106 views

From what I read, I know we don't use log loss or cross-entropy for regression problems. However, the entire logic behind binary cross-entropy (say) is to first squeeze y_hat between 0 and 1 (...
the_he_man
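For reference, the per-example binary cross-entropy the post above is reasoning about is:
$$\mathcal{L}_{\mathrm{BCE}}(y, \hat{y}) = -\left(y\log \hat{y} + (1-y)\log(1-\hat{y})\right)$$
It is the negative log-likelihood of a Bernoulli target; for any fixed $y \in [0,1]$ it is minimized at $\hat{y} = y$, so squashing predictions into $(0,1)$ does make the loss usable for bounded targets, but the implied noise model differs from the Gaussian assumption behind squared error.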
1 vote
0 answers
28 views

The above backpropagation algorithm is taken from Shalev-Shwartz and Ben-David's textbook Understanding Machine Learning. This algorithm is described in the same way as the one in Mostafa's textbook, ...
Fraïssé • 119
1 vote
0 answers
72 views

I have a stupid question about the derivative of the ReLU activation function. After finding the difference between the true output $t_k$ and the predicted output $a_k$, why is the value of $d_{a3}/d_{z3}$...
Gunners
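For the ReLU question above, the derivative that enters $d_{a3}/d_{z3}$ is an element-wise indicator (the value at exactly $z = 0$ is a convention, usually taken to be 0):
$$a = \max(0, z), \qquad \frac{\partial a}{\partial z} = \begin{cases} 1 & z > 0 \\ 0 & z < 0 \end{cases}$$
so during backpropagation the upstream gradient is simply zeroed wherever $z_3 \le 0$.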
