Questions tagged [backpropagation]
Backpropagation, or "backward propagation of errors," is an algorithm for supervised learning of artificial neural networks using gradient descent.
31 questions
2 votes
2 answers
222 views
Machine learning: what is the proper name for the derivative of a function with respect to a matrix?
In machine learning, it is typical to see a so-called weight matrix. As a low-dimensional example, let this matrix be defined as, $$W = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{...
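A common convention, sketched here as a hedged note using the excerpt's $2 \times 2$ example: for a scalar-valued $f(W)$, the "derivative with respect to the matrix" is usually taken to be the gradient matrix, of the same shape as $W$,
$$ \nabla_W f = \frac{\partial f}{\partial W} = \begin{bmatrix} \frac{\partial f}{\partial w_{11}} & \frac{\partial f}{\partial w_{12}} \\ \frac{\partial f}{\partial w_{21}} & \frac{\partial f}{\partial w_{22}} \end{bmatrix}, $$
and it is most often just called the gradient of $f$ with respect to $W$.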
0 votes
0 answers
71 views
Looking for a gradient in Batch Normalization
I'm studying Batch Normalization inside a neural network where the output is $$ y_i = \gamma \hat{x}_i + \beta, $$ with $$ \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma}}, $$ and $$ \mu = \frac{1}{m} \...
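A hedged worked step under the excerpt's definitions (treating $\sigma$ as the batch variance $\frac{1}{m}\sum_j (x_j-\mu)^2$ and ignoring any $\epsilon$): differentiating $\hat{x}_i$ directly gives
$$ \frac{\partial \hat{x}_i}{\partial x_j} = \frac{\delta_{ij} - \tfrac{1}{m}}{\sqrt{\sigma}} - \frac{(x_i-\mu)(x_j-\mu)}{m\,\sigma^{3/2}}, $$
which, together with $\partial y_i/\partial \hat{x}_i = \gamma$, assembles the full gradient by the chain rule.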
0 votes
0 answers
133 views
Mathematical proof of the cheap gradient principle.
In A. Griewank's paper, he asserts that reverse-mode automatic differentiation can evaluate the gradient of a function $f$ at a cost of no more than five times the cost of evaluating $f$. ...
0 votes
0 answers
62 views
Alternative Renormalization for Matrix Exponential in Causal Lower Triangular Matrices?
I'm experimenting with the matrix exponential $$ \exp(L) = \sum_{k=0}^{\infty} \frac{L^k}{k!}, $$ where $L$ is a lower triangular matrix that naturally encodes a causal structure (as seen in ...
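One relevant fact for this setting, stated as a note rather than an answer: if $L$ is lower triangular, every power $L^k$ is lower triangular, so $\exp(L)$ remains lower triangular with diagonal
$$ \big(\exp(L)\big)_{ii} = e^{L_{ii}}, $$
which constrains how any renormalization can interact with the causal (triangular) structure.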
0 votes
2 answers
121 views
Derivative with respect to a matrix and the linearization of a function of matrices.
I'm studying backpropagation and am trying to wrap my head around the idea of a derivative with respect to a matrix. Suppose we have a function of a vector, $f: \mathbb{R}^m \to \mathbb{R}$. Then we can ...
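A hedged sketch of the linearization view for a scalar-valued $f$ of a matrix argument $W$: the derivative can be identified with the matrix $\nabla_W f$ appearing in the first-order expansion
$$ f(W + \Delta W) \approx f(W) + \langle \nabla_W f, \Delta W \rangle = f(W) + \operatorname{tr}\!\big((\nabla_W f)^{\top} \Delta W\big), $$
i.e. the best linear approximation of $f$ near $W$ under the Frobenius inner product.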
0 votes
1 answer
121 views
backpropagation computation - Derivative of matrix with respect to another matrix
I am having some trouble computing gradients of the loss function of an MLP when the input is a minibatch of vectors. Forward propagation $\large \underbrace{Z^{[1]}}_{(n^{[1]},m)} = \underbrace{W^{[...
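For a layer of the form in the excerpt, a standard set of shape-consistent results (a hedged sketch, assuming $Z^{[1]} = W^{[1]} X + b^{[1]} \mathbf{1}^{\top}$ with a minibatch $X$ of shape $(n^{[0]}, m)$):
$$ \frac{\partial L}{\partial W^{[1]}} = \frac{\partial L}{\partial Z^{[1]}}\, X^{\top}, \qquad \frac{\partial L}{\partial X} = W^{[1]\top}\, \frac{\partial L}{\partial Z^{[1]}}, \qquad \frac{\partial L}{\partial b^{[1]}} = \frac{\partial L}{\partial Z^{[1]}}\, \mathbf{1}. $$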
1 vote
1 answer
387 views
Derivative of the Cross Entropy loss function with the Softmax function
I am currently teaching myself the basics of neural networks and backpropagation but some steps regarding the derivation of the derivative of the Cross Entropy loss function with the Softmax ...
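The result usually targeted here, stated as a reference point: with softmax outputs $p_k = e^{z_k}/\sum_j e^{z_j}$, one-hot targets $y$, and cross-entropy loss $L = -\sum_k y_k \log p_k$, the derivative with respect to the logits collapses to
$$ \frac{\partial L}{\partial z_k} = p_k - y_k. $$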
1 vote
1 answer
160 views
Why is the numerator-layout Jacobian transposed in the backpropagation calculation?
In the derivation of the backpropagation algorithm in Neural Network Design by Hagan et al., we consider the derivative of the scalar-valued sample loss function $\hat{F}$ with respect to a vector of ...
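A hedged note on the convention, using generic notation $\mathbf{n}^m$ for a layer's vector and $\mathbf{s}^m$ for its sensitivity: in numerator layout the derivative of the scalar $\hat{F}$ with respect to a vector is a row vector, while backpropagation stores sensitivities as columns, so the chain rule
$$ \frac{\partial \hat{F}}{\partial \mathbf{n}^{m}} = \frac{\partial \hat{F}}{\partial \mathbf{n}^{m+1}}\, \frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}} $$
is transposed on both sides to give $\mathbf{s}^{m} = \left(\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}}\right)^{\top} \mathbf{s}^{m+1}$, which is where the transposed Jacobian appears.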
0 votes
1 answer
88 views
Gradients in "On the difficulty of training Recurrent Neural Networks"
In the paper we have $$ x_t=F(x_{t-1},u_t,\theta) \\ x_t=W_{rec}\sigma(x_{t-1})+W_{in}u_t+b $$ and then some error function $\varepsilon$, and we are interested in taking the derivative w.r.t. $\theta$...
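For orientation, the decomposition that paper works with is usually written as follows (reproduced here as a sketch, not verbatim):
$$ \frac{\partial \varepsilon}{\partial \theta} = \sum_{1 \le t \le T} \frac{\partial \varepsilon_t}{\partial \theta}, \qquad \frac{\partial \varepsilon_t}{\partial \theta} = \sum_{1 \le k \le t} \frac{\partial \varepsilon_t}{\partial x_t}\, \frac{\partial x_t}{\partial x_k}\, \frac{\partial^{+} x_k}{\partial \theta}, \qquad \frac{\partial x_t}{\partial x_k} = \prod_{k < i \le t} W_{rec}^{\top}\, \operatorname{diag}\!\big(\sigma'(x_{i-1})\big). $$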
0 votes
0 answers
202 views
Backpropagation: Chain Rule for Matrix Exponential?
Recent linear state-space model papers like Mamba often use matrix exponential to discretize the system. They initialize the system in a continuous-time regime, and discretize it to run it like a ...
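One known identity relevant here (a pointer, not the papers' exact derivation): the Fréchet derivative of the matrix exponential in direction $E$ is
$$ D\exp(A)[E] = \int_0^1 e^{sA}\, E\, e^{(1-s)A}\, ds, $$
so a backward pass through $\exp(A)$ is a vector-Jacobian product with this linear map; when $A$ and $E$ commute it reduces to $e^{A} E$.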
0 votes
0 answers
97 views
Calculate derivative in the context of backpropagation
I have received the following problem: Consider the following simple model of a neuron: $z = wx + b$ (logits), $\hat{y} = g(z)$ (activation), $L_2(w, b) = \frac{1}{2}(y - \hat{y})^2$ quadratic loss (Mean Squared Error (MSE), L2 ...
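Under the excerpt's definitions, the chain rule gives (writing $g'$ for the activation's derivative):
$$ \frac{\partial L_2}{\partial w} = (\hat{y} - y)\, g'(z)\, x, \qquad \frac{\partial L_2}{\partial b} = (\hat{y} - y)\, g'(z). $$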
2 votes
0 answers
137 views
What do I do once I have the Jacobian Matrix from Softmax Derivative
I am teaching myself Artificial Intelligence from scratch, without libraries, and I have a decent handle on most of it. UPDATE-EDIT: I am lost, however, on the next step mathematically after deriving the ...
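The usual next step, noted here as a pointer: multiply the upstream gradient by that Jacobian (a vector-Jacobian product). Writing $p = \operatorname{softmax}(z)$ and $J_{kj} = \partial p_k / \partial z_j = p_k(\delta_{kj} - p_j)$,
$$ \frac{\partial L}{\partial z} = J^{\top}\, \frac{\partial L}{\partial p}, $$
and for cross-entropy loss this product collapses to $p - y$, so the full Jacobian never needs to be stored explicitly.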
0 votes
1 answer
915 views
Why does the transpose of the Jacobian appear during backpropagation?
I am working on backpropagation through fully-connected layers; suppose this architecture: My ultimate goal is to find the gradient of $\overrightarrow{a}$ with respect to the loss function $C$, given ...
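A hedged summary of the underlying chain rule, with $\vec{z} = f(\vec{a})$ a layer and $J = \partial \vec{z}/\partial \vec{a}$ its Jacobian: writing gradients as column vectors,
$$ \frac{\partial C}{\partial a_i} = \sum_j \frac{\partial C}{\partial z_j}\, \frac{\partial z_j}{\partial a_i} \quad\Longleftrightarrow\quad \nabla_{\vec{a}}\, C = J^{\top}\, \nabla_{\vec{z}}\, C, $$
which is why the transposed Jacobian (and, for a linear layer, the transposed weight matrix) shows up in the backward pass.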
2 votes
2 answers
241 views
Partial derivative with respect to a matrix in RNN backpropagation
I have an issue with the following problem. I am trying to derive the gradients with respect to $x_t, h_{t-1}, W_x, W_h$. $x_t$ is an $N \times D$ vector. $h_t$ is an $N \times H$ vector. $W_h$ is an $H \times H$ matrix. $...
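A hedged sketch assuming the common vanilla-RNN cell $h_t = \tanh(a_t)$ with $a_t = x_t W_x + h_{t-1} W_h + b$ (the exact form is not visible in the excerpt): with $G = \frac{\partial L}{\partial h_t} \odot (1 - h_t^2)$ of shape $N \times H$,
$$ \frac{\partial L}{\partial W_x} = x_t^{\top} G, \qquad \frac{\partial L}{\partial W_h} = h_{t-1}^{\top} G, \qquad \frac{\partial L}{\partial x_t} = G\, W_x^{\top}, \qquad \frac{\partial L}{\partial h_{t-1}} = G\, W_h^{\top}. $$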
0 votes
1 answer
125 views
Backpropagation Hidden Layer Error
I'm trying to understand the maths behind backpropagation using this book. I have looked at the formulae the backprop algorithm uses and have worked through their proofs; however, I was wondering ...
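For orientation, the hidden-layer error equation most such books derive (stated as a reminder, with $\delta^l$ the error at layer $l$, $w^{l+1}$ the next layer's weights, and $\odot$ elementwise multiplication):
$$ \delta^{l} = \big((w^{l+1})^{\top} \delta^{l+1}\big) \odot \sigma'(z^{l}), $$
i.e. the next layer's error is pulled back through the transposed weights and gated by the activation derivative.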