Questions tagged [backpropagation]
Use for questions about backpropagation, which is commonly used in training neural networks in conjunction with an optimization method such as gradient descent.
301 questions
8 votes
2 answers
2k views
How does backpropagation in a transformer work?
Specifically, to solve the problem of text generation, not translation. There is literally not a single discussion, blog post, or tutorial that explains the math behind this. My best guess so far is: ...
1 vote
0 answers
52 views
Why should I use a bias in a NEAT algorithm?
I'm starting to create a NEAT algorithm, and before starting I looked at a few examples; all of them were using bias values, but I actually have no idea why a bias is used in an algorithm like NEAT. For ...
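The short mathematical answer, sketched under the usual node model: without a bias, a node must output $\sigma(0)$ whenever its weighted input sum is zero; the bias $b$ shifts the activation threshold, giving evolution one more tunable degree of freedom per node:
$$y = \sigma\!\left(\sum_i w_i x_i + b\right)$$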
1 vote
1 answer
41 views
Does reducing the loss change the amount of change during backpropagation?
If I did loss = loss/10 before calculating the gradient, would that change the amount of change applied to the model parameters during backpropagation? Or is ...
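A quick empirical check, as a minimal PyTorch sketch (assuming plain SGD-style updates; since backpropagation is linear in the loss, scaling the loss by 1/10 scales every gradient, and hence every SGD update, by 1/10, whereas adaptive optimizers such as Adam largely normalize a constant scale away):

```python
import torch

# Toy model: one linear layer; gradients scale linearly with the loss.
model = torch.nn.Linear(3, 1)
x, y = torch.randn(8, 3), torch.randn(8, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
g_full = model.weight.grad.clone()

model.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y) / 10  # scaled loss
loss.backward()
g_scaled = model.weight.grad

# d(loss/10)/dw = (dloss/dw)/10, so the gradients shrink by 10 as well.
print(torch.allclose(g_full / 10, g_scaled))  # True
```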
3 votes
1 answer
165 views
Why are the second-order derivatives of a loss function nonzero when linear combinations are involved?
I'm working on implementing Newton's method to perform second-order gradient descent in a neural network, and am having trouble computing the second-order derivatives. I understand that in practice, ...
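A minimal scalar sketch of why those derivatives are nonzero: even when the pre-activation is a linear combination of the weights, the loss composed with it is not, so the Hessian inherits the curvature of the outer function:
$$z = w^{\top}x,\quad J = \ell(z) \;\Rightarrow\; \nabla_w J = \ell'(z)\,x,\quad \nabla_w^2 J = \ell''(z)\,xx^{\top},$$
which is nonzero whenever $\ell'' \neq 0$.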
2 votes
1 answer
106 views
Deriving the gradient of the hidden-to-hidden weights for backpropagation through time in a recurrent neural network
I'm currently working on deriving the gradients of a simple recurrent neural network's weights with respect to the loss, in order to update the weights through backpropagation. It's a super simple network, ...
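For reference, the standard BPTT expression for the hidden-to-hidden weights of a vanilla RNN $h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t)$ (notation assumed here, since the excerpt is cut off): the gradient sums one term per pair of time steps, each propagated back through a product of Jacobians, with $\partial^{+}$ denoting the immediate partial derivative that holds $h_{k-1}$ fixed:
$$\frac{\partial L}{\partial W_{hh}} = \sum_{t}\frac{\partial L}{\partial h_t}\sum_{k \le t}\left(\prod_{j=k+1}^{t}\frac{\partial h_j}{\partial h_{j-1}}\right)\frac{\partial^{+} h_k}{\partial W_{hh}}$$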
1 vote
1 answer
97 views
Back-propagation calculation notation and the averaging constant
$$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(a^{[L](i)}) + (1-y^{(i)})\log(1-a^{[L](i)})\right)$$ For the last layer, I saw that $$dA^{[L]} = -\left(\frac{Y}{A^{[L]}} - \frac{1-Y}{1-A^{[L]}}\right)$$ My ...
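Differentiating the per-example term of $J$ with respect to $a^{[L](i)}$ reproduces that expression; the $\frac{1}{m}$ averaging constant is conventionally deferred to the $dW$/$db$ step rather than folded into $dA^{[L]}$:
$$\frac{\partial}{\partial a^{[L](i)}}\left[-y^{(i)}\log(a^{[L](i)}) - (1-y^{(i)})\log(1-a^{[L](i)})\right] = -\left(\frac{y^{(i)}}{a^{[L](i)}} - \frac{1-y^{(i)}}{1-a^{[L](i)}}\right)$$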
1 vote
0 answers
50 views
Backpropagation for a single parameter on a rather simple network
Given the following network: I'm asked to write the backpropagation process for the $b_3$ parameter, where the loss function is $L(y,z_3)=(z_3-y)^2$. I'm not supposed to calculate any of the weights ...
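Assuming $z_3$ depends on $b_3$ additively in the usual way, $z_3 = (\text{weighted inputs}) + b_3$ (an assumption, since the network figure is not reproduced here), the chain rule gives the whole answer in one step:
$$\frac{\partial L}{\partial b_3} = \frac{\partial L}{\partial z_3}\cdot\frac{\partial z_3}{\partial b_3} = 2(z_3 - y)\cdot 1 = 2(z_3 - y)$$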
0 votes
1 answer
89 views
My custom neural network is converging but the Keras model is not
In most cases it is probably the other way round, but... I have implemented a basic MLP neural network with backpropagation. My data is just a shifted quadratic function with 100 samples. I ...
0 votes
1 answer
38 views
Why is there no scale parameter for the skip connection addition?
For a simple skip connection $y = x@w + x$, the gradient $dy/dx$ will be $w+1$: $$\frac{\partial y}{\partial x} = w + 1$$ Is $+1$ a bit too large, and can it overpower $...
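A scalar autograd check of that gradient, as a minimal PyTorch sketch (scalar case only; for matrix inputs the Jacobian is $W^{\top} + I$ rather than $w+1$):

```python
import torch

# Scalar skip connection y = x*w + x; autograd should report dy/dx = w + 1.
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(0.5)

y = x * w + x
y.backward()

print(x.grad)  # tensor(1.5000), i.e. w + 1
```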
1 vote
1 answer
90 views
CS 224N Backpropagation and Margin Loss in Neural Networks
I was going through the Stanford CS 224N lecture notes on backpropagation. Page 5 states: We can see from the max-margin loss that $$\frac{\partial J}{\partial s} = -\frac{\partial J}{\partial s_c} = -1$$ I'm not sure I understand why this is the ...
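The step the notes compress, written out under the usual CS 224N notation (with $s$ the score of the true window and $s_c$ the corrupted one): when the margin is violated, the max is active and the derivatives are simply the coefficients of $s$ and $s_c$:
$$J = \max(0,\, 1 + s_c - s) \;\Rightarrow\; \frac{\partial J}{\partial s} = -1,\quad \frac{\partial J}{\partial s_c} = +1,\quad \text{hence}\ \frac{\partial J}{\partial s} = -\frac{\partial J}{\partial s_c} = -1.$$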
0 votes
1 answer
160 views
Why not backpropagate through time in an LSTM, similar to an RNN?
I'm trying to implement an RNN and an LSTM with a many-to-many architecture. I reasoned out why BPTT is necessary in RNNs, and it makes sense. But what doesn't make sense to me is that most of the resources I went ...
0 votes
1 answer
59 views
How to prevent updating a pretrained model when the model is optimized with backpropagation in PyTorch?
I use PyTorch exclusively to develop my model; these are the components of my model and how it works: a generator; an encoder, which is pretrained and should not be updated; and a loss function. Input is passed to ...
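The standard PyTorch pattern for this, as a minimal sketch (the module shapes here are hypothetical stand-ins): set requires_grad to False on the encoder's parameters and give the optimizer only the generator's parameters; gradients still flow *through* the frozen encoder to the generator, but the encoder itself never changes.

```python
import torch

# Hypothetical components: the generator is trained, the encoder stays frozen.
generator = torch.nn.Linear(16, 32)
encoder = torch.nn.Linear(32, 8)   # stands in for the pretrained encoder

for p in encoder.parameters():     # freeze: no gradients accumulated here
    p.requires_grad = False

opt = torch.optim.Adam(generator.parameters())  # optimizer sees generator only

x = torch.randn(4, 16)
loss = encoder(generator(x)).pow(2).mean()
loss.backward()                    # gradient flows through the frozen encoder
opt.step()                         # only the generator's weights change
```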
0 votes
0 answers
106 views
Doubts about a custom loss function for regression problems
From what I have read, I know we don't use log loss or cross-entropy for regression problems. However, the entire logic behind binary cross-entropy (say) is to first squeeze y_hat between 0 and 1 (...
1 vote
0 answers
28 views
Are "textbook backpropagation" still relevant?
The above backpropagation algorithm is taken from Shalev-Shwartz and Ben-David's textbook Understanding Machine Learning. This algorithm is described in the same way as the one in Mostafa's textbook, ...
1 vote
0 answers
72 views
ReLU derivative value
I have a stupid question about the derivative of the ReLU activation function. After finding the difference between the true output $t_k$ and the predicted output $a_k$, why is the value of $da_3 / dz_3$...
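For reference, the piecewise derivative that appears in that step, assuming $a_3 = \mathrm{ReLU}(z_3)$ (the derivative at $z_3 = 0$ is undefined and is conventionally taken to be $0$):
$$\frac{da_3}{dz_3} = \begin{cases} 1 & \text{if } z_3 > 0 \\ 0 & \text{if } z_3 \le 0 \end{cases}$$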