
Questions tagged [backpropagation]

Backpropagation, or "backward propagation of errors," is an algorithm for supervised learning of artificial neural networks using gradient descent.

2 votes
2 answers
222 views

In machine learning, it is typical to see a so-called weight matrix. As a low-dimensional example, let this matrix be defined as $$W = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{...
Your neighbor Todorovich
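For the question above, a minimal sketch (the quadratic loss and the target vector are my additions, not from the question) of how the gradient of a scalar loss with respect to such a 2×2 weight matrix reduces to an outer product:

```python
# Gradient of a quadratic loss with respect to a 2x2 weight matrix W,
# checked against finite differences.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))        # the weight matrix [[w11, w12], [w21, w22]]
x = rng.normal(size=2)             # input vector
t = rng.normal(size=2)             # target vector

def loss(W):
    y = W @ x                      # forward pass: y = W x
    return 0.5 * np.sum((y - t) ** 2)

# Analytic gradient: dL/dW = (W x - t) x^T  (an outer product)
grad_analytic = np.outer(W @ x - t, x)

# Finite-difference check, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(2):
    for j in range(2):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        grad_fd[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True
```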
0 votes
0 answers
71 views

I'm studying Batch Normalization inside a neural network where the output is $$ y_i = \gamma \hat{x}_i + \beta, $$ with $$ \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma}}, $$ and $$ \mu = \frac{1}{m} \...
kklaw
  • 311
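A minimal sketch of the backward pass being derived above, assuming the usual BatchNorm convention with the variance (plus a small epsilon) under the square root, and checking the compact formula numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
x = rng.normal(size=m)             # one feature over a minibatch of m samples
gamma, beta, eps = 1.5, 0.3, 1e-5
dy = rng.normal(size=m)            # upstream gradient dL/dy

def forward(x):
    mu = x.mean()
    var = x.var()
    xhat = (x - mu) / np.sqrt(var + eps)
    return gamma * xhat + beta

# Analytic backward pass with respect to the inputs (differentiating
# through mu and var as well)
mu, var = x.mean(), x.var()
std = np.sqrt(var + eps)
xhat = (x - mu) / std
dxhat = dy * gamma
dx = (dxhat - dxhat.mean() - xhat * (dxhat * xhat).mean()) / std

# Finite-difference check of dL/dx with L = sum(dy * y)
def L(x):
    return np.sum(dy * forward(x))

h = 1e-6
dx_fd = np.array([(L(x + h*np.eye(m)[i]) - L(x - h*np.eye(m)[i])) / (2*h)
                  for i in range(m)])
print(np.allclose(dx, dx_fd, atol=1e-5))  # True
```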
0 votes
0 answers
133 views

In A. Griewank's paper, he asserts that the reverse-mode automatic differentiation algorithm can evaluate the gradient of a function $f$ at a cost of no more than five times the cost of evaluating $f$. ...
beaver
  • 9
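A toy tape-based reverse-mode sketch (not Griewank's implementation, and not a proof of his constant) illustrating the mechanism behind the cheap-gradient claim: the backward sweep visits each recorded operation exactly once, no matter how many inputs $f$ has, so the whole gradient comes out of one pass.

```python
import numpy as np

tape = []  # list of (output_index, input_indices, local_partials)

def mul(i, j, vals):
    vals.append(vals[i] * vals[j])
    k = len(vals) - 1
    tape.append((k, (i, j), (vals[j], vals[i])))   # d(xi*xj)/dxi, d(xi*xj)/dxj
    return k

def add(i, j, vals):
    vals.append(vals[i] + vals[j])
    k = len(vals) - 1
    tape.append((k, (i, j), (1.0, 1.0)))
    return k

# f(x1, x2, x3) = (x1*x2 + x3) * x1, recorded on the tape during the forward pass
vals = [2.0, 3.0, 4.0]
t1 = mul(0, 1, vals)
t2 = add(t1, 2, vals)
out = mul(t2, 0, vals)

# Reverse sweep: one pass over the tape yields every partial derivative at once.
adj = np.zeros(len(vals))
adj[out] = 1.0
for k, inputs, partials in reversed(tape):
    for i, p in zip(inputs, partials):
        adj[i] += adj[k] * p

print(vals[out], adj[:3])  # f = 20.0, gradient = [16.0, 4.0, 2.0]
```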
0 votes
0 answers
62 views

I'm experimenting with the matrix exponential $$ \exp(L) = \sum_{k=0}^{\infty} \frac{L^k}{k!}, $$ where $L$ is a lower triangular matrix that naturally encodes a causal structure (as seen in ...
jeroaranda
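A minimal sketch of that series, truncated, for a lower-triangular $L$ (the truncation order and the SciPy comparison are mine); note that $\exp(L)$ remains lower triangular, so the causal structure survives:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
L = np.tril(rng.normal(size=(4, 4)))   # lower-triangular "causal" matrix

def exp_series(L, K=30):
    term = np.eye(L.shape[0])
    out = term.copy()
    for k in range(1, K + 1):
        term = term @ L / k            # builds L^k / k! incrementally
        out += term
    return out

E = exp_series(L)
print(np.allclose(E, expm(L), atol=1e-8))   # True: matches SciPy's expm
print(np.allclose(E, np.tril(E)))           # True: still lower triangular
```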
0 votes
2 answers
121 views

I'm studying backpropagation and am trying to wrap my head around the idea of a derivative with respect to a matrix. Suppose we have a function $f: \mathbb{R}^m \to \mathbb{R}$. Then we can ...
John Hippisley
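A minimal sketch of what "derivative with respect to a matrix" means in practice (the example function is mine): for a scalar $F(Wu)$, the gradient has the same shape as $W$ and, by the chain rule, is an outer product.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 4))
u = rng.normal(size=4)

def F(W):
    x = W @ u                      # a vector in R^3 computed from the matrix
    return np.sum(np.tanh(x))      # a scalar function of that vector

# Chain rule: dF/dW = (dF/dx) u^T, an outer product with the same shape as W
grad = np.outer(1 - np.tanh(W @ u) ** 2, u)

# Check via a directional derivative along a random matrix direction D
eps, D = 1e-6, rng.normal(size=W.shape)
fd = (F(W + eps * D) - F(W - eps * D)) / (2 * eps)
print(np.isclose(fd, np.sum(grad * D)))   # True: <dF/dW, D> matches
```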
0 votes
1 answer
121 views

I am having some trouble computing gradients of the loss function of an MLP when the input is a minibatch of vectors. Forward propagation $\large \underbrace{Z^{[1]}}_{(n^{[1]},m)} = \underbrace{W^{[...
yosh
  • 73
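A minimal sketch of the minibatch case above, following the same column-per-sample convention with shapes $(n^{[l]}, m)$ (the tanh hidden layer and the MSE loss are my assumptions, not from the question):

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_hid, n_out, m = 3, 4, 2, 5
X = rng.normal(size=(n_in, m))      # columns are the m minibatch samples
Y = rng.normal(size=(n_out, m))
W1, b1 = rng.normal(size=(n_hid, n_in)), np.zeros((n_hid, 1))
W2, b2 = rng.normal(size=(n_out, n_hid)), np.zeros((n_out, 1))

def forward(W1, b1, W2, b2):
    Z1 = W1 @ X + b1               # (n_hid, m)
    A1 = np.tanh(Z1)               # (n_hid, m)
    Z2 = W2 @ A1 + b2              # (n_out, m)
    loss = np.sum((Z2 - Y) ** 2) / (2 * m)
    return Z1, A1, Z2, loss

Z1, A1, Z2, loss = forward(W1, b1, W2, b2)

# Backward pass: each dW has the same shape as W; the 1/m comes from the loss.
dZ2 = (Z2 - Y) / m                 # (n_out, m)
dW2 = dZ2 @ A1.T                   # (n_out, n_hid)
db2 = dZ2.sum(axis=1, keepdims=True)
dA1 = W2.T @ dZ2                   # (n_hid, m)
dZ1 = dA1 * (1 - A1 ** 2)          # tanh'(Z1) = 1 - tanh(Z1)^2
dW1 = dZ1 @ X.T                    # (n_hid, n_in)
db1 = dZ1.sum(axis=1, keepdims=True)

# Directional finite-difference check on W1
eps, D = 1e-6, rng.normal(size=W1.shape)
lp = forward(W1 + eps * D, b1, W2, b2)[-1]
lm = forward(W1 - eps * D, b1, W2, b2)[-1]
print(np.isclose((lp - lm) / (2 * eps), np.sum(dW1 * D)))  # True
```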
1 vote
1 answer
387 views

I am currently teaching myself the basics of neural networks and backpropagation, but some steps regarding the derivation of the derivative of the Cross Entropy loss function with the Softmax ...
Fynn Z.
  • 65
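A minimal sketch of the result that derivation arrives at: for softmax outputs $p = \mathrm{softmax}(z)$ and a one-hot target $y$, the gradient of the cross-entropy loss with respect to the logits collapses to $p - y$.

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=4)                         # logits
y = np.eye(4)[1]                               # one-hot target (class 1)

def softmax(z):
    e = np.exp(z - z.max())                    # shift for numerical stability
    return e / e.sum()

def cross_entropy(z):
    return -np.sum(y * np.log(softmax(z)))

grad_analytic = softmax(z) - y                 # the compact result p - y

eps = 1e-6
grad_fd = np.array([(cross_entropy(z + eps*np.eye(4)[i])
                     - cross_entropy(z - eps*np.eye(4)[i])) / (2*eps)
                    for i in range(4)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True
```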
1 vote
1 answer
160 views

In the derivation of the backpropagation algorithm in Neural Network Design by Hagan et al., we consider the derivative of the scalar-valued sample loss function $\hat{F}$ with respect to a vector of ...
aas
  • 11
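For reference, the standard convention that derivation relies on, written out (this is the generic definition, not a quotation from Hagan et al.):

```latex
% Derivative of a scalar \hat{F} with respect to a vector n in R^k:
% a vector of the same size, one partial derivative per component.
\frac{\partial \hat{F}}{\partial \mathbf{n}}
  = \begin{bmatrix}
      \partial \hat{F}/\partial n_1 \\
      \vdots \\
      \partial \hat{F}/\partial n_k
    \end{bmatrix},
\qquad
\frac{\partial \hat{F}}{\partial \mathbf{n}}
  = \left(\frac{\partial \mathbf{a}}{\partial \mathbf{n}}\right)^{\!\top}
    \frac{\partial \hat{F}}{\partial \mathbf{a}}
  \quad\text{when } \mathbf{a} = f(\mathbf{n}),
```

where $\partial \mathbf{a}/\partial \mathbf{n}$ denotes the Jacobian matrix with entries $\partial a_i/\partial n_j$.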
0 votes
1 answer
88 views

In the paper we have $$ x_t=F(x_{t-1},u_t,\theta) \\ x_t=W_{rec}\sigma(x_{t-1})+W_{in}u_t+b $$ and then some error function $\varepsilon$, and we are interested in taking the derivative w.r.t. $\theta$...
greedsin
  • 591
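A minimal sketch of backpropagation through that recurrence (the terminal loss and the dimensions are mine), computing $\partial \varepsilon / \partial x_0$ as a product of the per-step Jacobians $W_{rec}\,\mathrm{diag}(\sigma'(x_{t-1}))$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, T = 3, 2, 4
W_rec, W_in, b = rng.normal(size=(n, n)), rng.normal(size=(n, d)), rng.normal(size=n)
U = rng.normal(size=(T, d))
x0 = rng.normal(size=n)
sigma, dsigma = np.tanh, lambda x: 1 - np.tanh(x) ** 2

def run(x0):
    xs = [x0]
    for t in range(T):
        xs.append(W_rec @ sigma(xs[-1]) + W_in @ U[t] + b)
    return xs

def E(x0):                          # a simple terminal loss on x_T
    return 0.5 * np.sum(run(x0)[-1] ** 2)

# Backward: dx_t/dx_{t-1} = W_rec @ diag(sigma'(x_{t-1})), multiplied up the chain
xs = run(x0)
grad = xs[-1]                       # dE/dx_T
for t in range(T, 0, -1):
    grad = (W_rec @ np.diag(dsigma(xs[t - 1]))).T @ grad

# Directional finite-difference check of dE/dx_0
eps, D = 1e-6, rng.normal(size=n)
print(np.isclose((E(x0 + eps*D) - E(x0 - eps*D)) / (2*eps), grad @ D))  # True
```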
0 votes
0 answers
202 views

Recent linear state-space model papers like Mamba often use matrix exponential to discretize the system. They initialize the system in a continuous-time regime, and discretize it to run it like a ...
lostintimespace
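A minimal sketch of one common matrix-exponential (zero-order-hold) discretization of $x' = Ax + Bu$ (shown as an illustration of the mechanism, not as Mamba's exact parameterization):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)
n, delta = 3, 0.1
A = -np.eye(n) + 0.1 * rng.normal(size=(n, n))   # a stable-ish continuous-time A
B = rng.normal(size=(n, 1))

A_bar = expm(delta * A)                                        # discrete state matrix
B_bar = np.linalg.solve(delta * A, A_bar - np.eye(n)) @ (delta * B)

# Discrete recurrence: x_{k+1} = A_bar @ x_k + B_bar @ u_k
x = np.zeros((n, 1))
for u in rng.normal(size=(5, 1, 1)):
    x = A_bar @ x + B_bar @ u
print(x.ravel())
```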
0 votes
0 answers
97 views

I have received the following problem: Consider the following simple model of a neuron: $z = wx + b$ (logits), $\hat{y} = g(z)$ (activation), $L_2(w, b) = \frac{1}{2}(y - \hat{y})^2$ (quadratic loss, i.e. Mean Squared Error (MSE), L2 ...
Erbas
  • 1
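A minimal sketch of the gradients that problem asks for (the sigmoid is my choice; the question leaves $g$ generic): with $z = wx + b$, $\hat{y} = g(z)$, and $L = \frac{1}{2}(y - \hat{y})^2$, the chain rule gives $\partial L/\partial w = (\hat{y} - y)\,g'(z)\,x$ and $\partial L/\partial b = (\hat{y} - y)\,g'(z)$.

```python
import numpy as np

g = lambda z: 1 / (1 + np.exp(-z))       # sigmoid activation (an assumption)
dg = lambda z: g(z) * (1 - g(z))         # its derivative

w, b, x, y = 0.7, -0.2, 1.5, 1.0
z = w * x + b
y_hat = g(z)
L = 0.5 * (y - y_hat) ** 2

dL_dw = (y_hat - y) * dg(z) * x
dL_db = (y_hat - y) * dg(z)

# Quick finite-difference sanity check on dL/dw
eps = 1e-6
L_of = lambda w: 0.5 * (y - g(w * x + b)) ** 2
print(np.isclose((L_of(w + eps) - L_of(w - eps)) / (2 * eps), dL_dw))  # True
```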
2 votes
0 answers
137 views

I am teaching myself Artificial Intelligence from scratch, without libraries, and I have a decent handle on most of it. UPDATE-EDIT: I am lost, however, on the next mathematical step after deriving the ...
The Thinkrium
0 votes
1 answer
915 views

I am working on backpropagation through fully-connected layers; suppose this architecture: My ultimate goal is to find the gradient of $\overrightarrow{a}$ with respect to the loss function $C$, given ...
Fed_Dragon
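A minimal sketch of the step in question (layer sizes and the quadratic loss are mine): pushing the gradient of $C$ back from a fully-connected layer's output $z = W\vec{a} + b$ to its input activations via $\partial C/\partial \vec{a} = W^{\top}\,\partial C/\partial z$.

```python
import numpy as np

rng = np.random.default_rng(8)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
a = rng.normal(size=4)                 # activations feeding the layer
t = rng.normal(size=3)                 # a target, to make C concrete

def C(a):
    z = W @ a + b
    return 0.5 * np.sum((z - t) ** 2)  # quadratic loss on the layer output

dC_dz = W @ a + b - t                  # gradient at the layer output
dC_da = W.T @ dC_dz                    # the backpropagated gradient

# Directional finite-difference check
eps, D = 1e-6, rng.normal(size=4)
print(np.isclose((C(a + eps*D) - C(a - eps*D)) / (2*eps), dC_da @ D))  # True
```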
2 votes
2 answers
241 views

I have an issue with the following problem. I am trying to derive the gradients with respect to $x_t, h_{t-1}, W_x, W_h$. $x_t$ is an $N \times D$ matrix. $h_t$ is an $N \times H$ matrix. $W_h$ is an $H \times H$ matrix. $...
Samuel Lee
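A minimal sketch of one vanilla RNN step with those shapes (rows are the $N$ samples; the bias and the tanh nonlinearity are assumptions on my part), giving all four gradients at once:

```python
import numpy as np

rng = np.random.default_rng(9)
N, D, H = 2, 3, 4
x_t, h_prev = rng.normal(size=(N, D)), rng.normal(size=(N, H))
W_x, W_h, b = rng.normal(size=(D, H)), rng.normal(size=(H, H)), rng.normal(size=H)
dh_t = rng.normal(size=(N, H))               # upstream gradient dL/dh_t

a = x_t @ W_x + h_prev @ W_h + b             # pre-activation, shape (N, H)
h_t = np.tanh(a)

da = dh_t * (1 - h_t ** 2)                   # through tanh
dx_t    = da @ W_x.T                         # (N, D)
dh_prev = da @ W_h.T                         # (N, H)
dW_x    = x_t.T @ da                         # (D, H)
dW_h    = h_prev.T @ da                      # (H, H)

# Directional finite-difference check on W_h
def L(W_h):
    return np.sum(dh_t * np.tanh(x_t @ W_x + h_prev @ W_h + b))
eps, Dir = 1e-6, rng.normal(size=(H, H))
print(np.isclose((L(W_h + eps*Dir) - L(W_h - eps*Dir)) / (2*eps),
                 np.sum(dW_h * Dir)))        # True
```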
0 votes
1 answer
125 views

I'm trying to understand the maths behind backpropagation using this book. I have looked at the formulae the backprop algorithm uses and have worked through their proofs; however, I was wondering ...
Cipollino
