Questions tagged [backpropagation]
Backpropagation, or "backward propagation of errors," is an algorithm for supervised learning of artificial neural networks using gradient descent.
31 questions
2 votes
2 answers
222 views
Machine learning: what is the proper name for the derivative of a function with respect to a matrix?
In machine learning, it is typical to see a so-called weight matrix. As a low-dimensional example, let this matrix be defined as, $$W = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{...
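A common convention, sketched here as a hedged note using the excerpt's $2 \times 2$ example: for a scalar-valued $f(W)$, the "derivative with respect to the matrix" is usually taken to be the gradient matrix, of the same shape as $W$,
$$ \nabla_W f = \frac{\partial f}{\partial W} = \begin{bmatrix} \frac{\partial f}{\partial w_{11}} & \frac{\partial f}{\partial w_{12}} \\ \frac{\partial f}{\partial w_{21}} & \frac{\partial f}{\partial w_{22}} \end{bmatrix}, $$
and it is most often just called the gradient of $f$ with respect to $W$.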
0 votes
0 answers
71 views
Looking for a gradient in Batch Normalization
I'm studying Batch Normalization inside a neural network where the output is $$ y_i = \gamma \hat{x}_i + \beta, $$ with $$ \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma}}, $$ and $$ \mu = \frac{1}{m} \...
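A hedged worked step under the excerpt's definitions (treating $\sigma$ as the batch variance $\frac{1}{m}\sum_j (x_j-\mu)^2$ and ignoring any $\epsilon$): differentiating $\hat{x}_i$ directly gives
$$ \frac{\partial \hat{x}_i}{\partial x_j} = \frac{\delta_{ij} - \tfrac{1}{m}}{\sqrt{\sigma}} - \frac{(x_i-\mu)(x_j-\mu)}{m\,\sigma^{3/2}}, $$
which, together with $\partial y_i/\partial \hat{x}_i = \gamma$, assembles the full gradient by the chain rule.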
0 votes
0 answers
133 views
Mathematical proof of the cheap gradient principle.
In A. Griewank's paper, he asserts that reverse-mode automatic differentiation can evaluate the gradient of a function $f$ at a cost of no more than five times the cost of evaluating $f$. ...
0 votes
0 answers
62 views
Alternative Renormalization for Matrix Exponential in Causal Lower Triangular Matrices?
I'm experimenting with the matrix exponential $$ \exp(L) = \sum_{k=0}^{\infty} \frac{L^k}{k!}, $$ where $L$ is a lower triangular matrix that naturally encodes a causal structure (as seen in ...
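One relevant fact for this setting, stated as a note rather than an answer: if $L$ is lower triangular, every power $L^k$ is lower triangular, so $\exp(L)$ remains lower triangular with diagonal
$$ \big(\exp(L)\big)_{ii} = e^{L_{ii}}, $$
which constrains how any renormalization can interact with the causal (triangular) structure.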
0 votes
2 answers
121 views
Derivative with respect to a matrix and the linearization of a function of matrices.
I'm studying backpropagation and am trying to wrap my head around the idea of a derivative with respect to a matrix. Suppose we have a function of a vector, $f: \mathbb{R}^m \to \mathbb{R}$. Then we can ...
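A hedged sketch of the linearization view for a scalar-valued $f$ of a matrix argument $W$: the derivative can be identified with the matrix $\nabla_W f$ appearing in the first-order expansion
$$ f(W + \Delta W) \approx f(W) + \langle \nabla_W f, \Delta W \rangle = f(W) + \operatorname{tr}\!\big((\nabla_W f)^{\top} \Delta W\big), $$
i.e. the best linear approximation of $f$ near $W$ under the Frobenius inner product.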
0 votes
1 answer
121 views
backpropagation computation - Derivative of matrix with respect to another matrix
I am having some trouble computing gradients of the loss function of an MLP when the input is a minibatch of vectors. Forward propagation $\large \underbrace{Z^{[1]}}_{(n^{[1]},m)} = \underbrace{W^{[...
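For a layer of the form in the excerpt, a standard set of shape-consistent results (a hedged sketch, assuming $Z^{[1]} = W^{[1]} X + b^{[1]} \mathbf{1}^{\top}$ with a minibatch $X$ of shape $(n^{[0]}, m)$):
$$ \frac{\partial L}{\partial W^{[1]}} = \frac{\partial L}{\partial Z^{[1]}}\, X^{\top}, \qquad \frac{\partial L}{\partial X} = W^{[1]\top}\, \frac{\partial L}{\partial Z^{[1]}}, \qquad \frac{\partial L}{\partial b^{[1]}} = \frac{\partial L}{\partial Z^{[1]}}\, \mathbf{1}. $$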
1 vote
1 answer
387 views
Derivative of the Cross Entropy loss function with the Softmax function
I am currently teaching myself the basics of neural networks and backpropagation but some steps regarding the derivation of the derivative of the Cross Entropy loss function with the Softmax ...
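The result usually targeted here, stated as a reference point: with softmax outputs $p_k = e^{z_k}/\sum_j e^{z_j}$, one-hot targets $y$, and cross-entropy loss $L = -\sum_k y_k \log p_k$, the derivative with respect to the logits collapses to
$$ \frac{\partial L}{\partial z_k} = p_k - y_k. $$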
1 vote
1 answer
160 views
Why is the numerator-layout Jacobian transposed in the backpropagation calculation?
In the derivation of the backpropagation algorithm in Neural Network Design by Hagan et al., we consider the derivative of the scalar-valued sample loss function $\hat{F}$ with respect to a vector of ...
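A hedged note on the convention, using generic notation $\mathbf{n}^m$ for a layer's vector and $\mathbf{s}^m$ for its sensitivity: in numerator layout the derivative of the scalar $\hat{F}$ with respect to a vector is a row vector, while backpropagation stores sensitivities as columns, so the chain rule
$$ \frac{\partial \hat{F}}{\partial \mathbf{n}^{m}} = \frac{\partial \hat{F}}{\partial \mathbf{n}^{m+1}}\, \frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}} $$
is transposed on both sides to give $\mathbf{s}^{m} = \left(\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}}\right)^{\top} \mathbf{s}^{m+1}$, which is where the transposed Jacobian appears.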
0 votes
1 answer
88 views
Gradients in "On the difficulty of training Recurrent Neural Networks"
In the paper we have $$ x_t=F(x_{t-1},u_t,\theta) \\ x_t=W_{rec}\sigma(x_{t-1})+W_{in}u_t+b $$ and then some error function $\varepsilon$, and we are interested in taking the derivative w.r.t. $\theta$...
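For orientation, the decomposition that paper works with is usually written as follows (reproduced here as a sketch, not verbatim):
$$ \frac{\partial \varepsilon}{\partial \theta} = \sum_{1 \le t \le T} \frac{\partial \varepsilon_t}{\partial \theta}, \qquad \frac{\partial \varepsilon_t}{\partial \theta} = \sum_{1 \le k \le t} \frac{\partial \varepsilon_t}{\partial x_t}\, \frac{\partial x_t}{\partial x_k}\, \frac{\partial^{+} x_k}{\partial \theta}, \qquad \frac{\partial x_t}{\partial x_k} = \prod_{k < i \le t} W_{rec}^{\top}\, \operatorname{diag}\!\big(\sigma'(x_{i-1})\big). $$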
0 votes
0 answers
202 views
Backpropagation: Chain Rule for Matrix Exponential?
Recent linear state-space model papers like Mamba often use matrix exponential to discretize the system. They initialize the system in a continuous-time regime, and discretize it to run it like a ...
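One known identity relevant here (a pointer, not the papers' exact derivation): the Fréchet derivative of the matrix exponential in direction $E$ is
$$ D\exp(A)[E] = \int_0^1 e^{sA}\, E\, e^{(1-s)A}\, ds, $$
so a backward pass through $\exp(A)$ is a vector-Jacobian product with this linear map; when $A$ and $E$ commute it reduces to $e^{A} E$.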
0 votes
0 answers
97 views
Calculate derivative in the context of backpropagation
I have received the following problem: Consider the following simple model of a neuron: $z = wx + b$ (logits), $\hat{y} = g(z)$ (activation), $L_2(w, b) = \frac{1}{2}(y - \hat{y})^2$ quadratic loss (Mean Squared Error (MSE), L2 ...
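Under the excerpt's definitions, the chain rule gives (writing $g'$ for the activation's derivative):
$$ \frac{\partial L_2}{\partial w} = (\hat{y} - y)\, g'(z)\, x, \qquad \frac{\partial L_2}{\partial b} = (\hat{y} - y)\, g'(z). $$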
2 votes
0 answers
137 views
What do I do once I have the Jacobian Matrix from Softmax Derivative
I am teaching myself Artificial Intelligence from scratch, without libraries, and I have a decent handle on most of it. UPDATE-EDIT: I am lost, however, on the next step mathematically after deriving the ...
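The usual next step, noted here as a pointer: multiply the upstream gradient by that Jacobian (a vector-Jacobian product). Writing $p = \operatorname{softmax}(z)$ and $J_{kj} = \partial p_k / \partial z_j = p_k(\delta_{kj} - p_j)$,
$$ \frac{\partial L}{\partial z} = J^{\top}\, \frac{\partial L}{\partial p}, $$
and for cross-entropy loss this product collapses to $p - y$, so the full Jacobian never needs to be stored explicitly.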
0 votes
1 answer
915 views
Why does the transpose of the Jacobian appear during backpropagation?
I am working on backpropagation through fully-connected layers; suppose this architecture: My ultimate goal is to find the gradient of $\overrightarrow{a}$ with respect to the loss function $C$, given ...
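A hedged summary of the underlying chain rule, with $\vec{z} = f(\vec{a})$ a layer and $J = \partial \vec{z}/\partial \vec{a}$ its Jacobian: writing gradients as column vectors,
$$ \frac{\partial C}{\partial a_i} = \sum_j \frac{\partial C}{\partial z_j}\, \frac{\partial z_j}{\partial a_i} \quad\Longleftrightarrow\quad \nabla_{\vec{a}}\, C = J^{\top}\, \nabla_{\vec{z}}\, C, $$
which is why the transposed Jacobian (and, for a linear layer, the transposed weight matrix) shows up in the backward pass.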
2 votes
2 answers
241 views
Partial derivative with respect to a matrix in RNN backpropagation
I have an issue with the following problem. I am trying to derive the gradients with respect to $x_t, h_{t-1}, W_x, W_h$. $x_t$ is an $N \times D$ vector. $h_t$ is an $N \times H$ vector. $W_h$ is an $H \times H$ matrix. $...
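A hedged sketch assuming the common vanilla-RNN cell $h_t = \tanh(a_t)$ with $a_t = x_t W_x + h_{t-1} W_h + b$ (the exact form is not visible in the excerpt): with $G = \frac{\partial L}{\partial h_t} \odot (1 - h_t^2)$ of shape $N \times H$,
$$ \frac{\partial L}{\partial W_x} = x_t^{\top} G, \qquad \frac{\partial L}{\partial W_h} = h_{t-1}^{\top} G, \qquad \frac{\partial L}{\partial x_t} = G\, W_x^{\top}, \qquad \frac{\partial L}{\partial h_{t-1}} = G\, W_h^{\top}. $$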
0 votes
1 answer
125 views
Backpropagation Hidden Layer Error
I'm trying to understand the maths behind backpropagation using this book. I have looked at the formulae the backprop algorithm uses and have worked through their proofs; however, I was wondering ...
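For orientation, the hidden-layer error equation most such books derive (stated as a reminder, with $\delta^l$ the error at layer $l$, $w^{l+1}$ the next layer's weights, and $\odot$ elementwise multiplication):
$$ \delta^{l} = \big((w^{l+1})^{\top} \delta^{l+1}\big) \odot \sigma'(z^{l}), $$
i.e. the next layer's error is pulled back through the transposed weights and gated by the activation derivative.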