
Questions tagged [batch-normalization]

For questions about Batch Normalization of layer activations in theory and practice, as used in (typically deep) neural networks.

3 votes
0 answers
37 views

We already have (spatial) batch norm and (spatial) layer norm: why don't we normalize over everything, so that each entire activation plane, over all batches and all channels, gets the benefits of both ...
asked by Chris
0 votes
0 answers
36 views

I see a lot of answers on this topic. However, one part still confuses me, so I am asking this question. https://stackoverflow.com/questions/38553927/batch-normalization-in-convolutional-...
asked by jho317
4 votes
1 answer
77 views

I have reaction time as a dependent variable and age as an independent variable. I want to do a linear mixed model analysis. My data is not normally distributed. Do I have to transform the data? I ...
asked by Monika Thakur
7 votes
1 answer
1k views

In Let's build GPT: from scratch, in code, spelled out., Andrej Karpathy says that no one likes the Batch Normalization layer and people want to remove it. He also says it brings so many bugs, and he shot ...
asked by mon
0 votes
2 answers
1k views

In Batch Normalization, the mean and standard deviation are calculated feature-wise and the normalization step is done instance-wise, while in Layer Normalization the mean and standard deviation are calculated ...
asked by April
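
A minimal NumPy sketch of the distinction this question is about (the array shapes and variable names are illustrative, not taken from the question): for activations of shape (N, C, H, W), batch norm computes one mean/std per channel over the batch and spatial axes, while layer norm computes one mean/std per sample over the channel and spatial axes.

```python
import numpy as np

# Illustrative activations: batch of 8 samples, 16 channels, 4x4 spatial map.
x = np.random.randn(8, 16, 4, 4)
eps = 1e-5

# Batch norm: one mean/std per channel, computed over batch and spatial dims.
bn_mean = x.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, 16, 1, 1)
bn_var = x.var(axis=(0, 2, 3), keepdims=True)
x_bn = (x - bn_mean) / np.sqrt(bn_var + eps)

# Layer norm: one mean/std per sample, computed over channel and spatial dims.
ln_mean = x.mean(axis=(1, 2, 3), keepdims=True)   # shape (8, 1, 1, 1)
ln_var = x.var(axis=(1, 2, 3), keepdims=True)
x_ln = (x - ln_mean) / np.sqrt(ln_var + eps)
```
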
0 votes
2 answers
296 views

I am experimenting with the cifar10 dataset and ran into strange behavior where Dropout and BatchNorm don't help at all. As I understand it: Dropout - freezing some of the weights, which helps us to prevent ...
asked by kirsanv43
1 vote
2 answers
478 views

According to a Machine Learning Mastery post on batch norm: For small mini-batch sizes or mini-batches that do not contain a representative distribution of examples from the training dataset, the ...
asked by RAbraham
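
A small NumPy sketch of why tiny mini-batches are a problem for batch norm (the numbers are purely illustrative, not from the post): the per-mini-batch mean fluctuates far more at batch size 4 than at batch size 256, so the statistics used for normalization are noisy estimates of the dataset statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=2.0, scale=3.0, size=100_000)  # one activation's values

def batch_mean_spread(batch_size, n_batches=1_000):
    # Standard deviation of the per-mini-batch mean across many random batches.
    means = [rng.choice(population, size=batch_size).mean() for _ in range(n_batches)]
    return np.std(means)

for bs in (4, 32, 256):
    print(f"batch size {bs:>3}: spread of batch means = {batch_mean_spread(bs):.3f}")
# Small batches give noisy statistics, so the normalized activations vary a lot per step.
```
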
1 vote
1 answer
90 views

Question 22 of 100+ Data Science Interview Questions and Answers for 2022 asks "What is the benefit of batch normalization?" The first bullet of the answer is: The model is less sensitive to ...
asked by Galen
0 votes
1 answer
118 views

I'm using TensorFlow. Consider the example below: ...
asked by worduser
0 votes
1 answer
665 views

I read the article Batch normalization: theory and how to use it with Tensorflow by Federico Peccia. The batch normalized activation is $$ \bar x_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} $$...
asked by Triceratops
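
As a quick companion to the formula quoted above, here is a hedged NumPy sketch (variable names are mine, not from the article): subtract the mini-batch mean and divide by the square root of the mini-batch variance plus epsilon, then apply the learned scale and shift.

```python
import numpy as np

def batch_normalize(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of activations x with shape (batch, features)."""
    mu = x.mean(axis=0)                      # mu_B: per-feature mini-batch mean
    var = x.var(axis=0)                      # sigma_B^2: per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # the normalized x_bar_i from the formula above
    return gamma * x_hat + beta              # learned scale and shift

x = np.random.randn(32, 10) * 5 + 3
out = batch_normalize(x)
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1 per feature
```
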
2 votes
1 answer
753 views

I recently found this Twitter thread from Andrej Karpathy. In it he lists a few common mistakes made during the development of a neural network: you didn't try to overfit a single batch first; you forgot ...
asked by KDecker
2 votes
1 answer
2k views

I'm using a ResNet50 model pretrained on ImageNet to do transfer learning on an image classification task. The easy way of doing this is to simply freeze the conv layers (or really all layers ...
asked by amateurjustin
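
A hedged Keras sketch of the "freeze the backbone, train a new head" approach that question describes (the head size and hyper-parameters are illustrative): note that a frozen BatchNorm layer keeps using its ImageNet moving statistics, and calling the backbone with training=False keeps it in inference mode even if you later unfreeze layers for fine-tuning.

```python
import tensorflow as tf

# Pretrained ResNet50 backbone without the ImageNet classification head.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze conv and BatchNorm layers (BN keeps its moving statistics)

# New classification head for the target task (10 classes here is illustrative).
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)          # keep BatchNorm in inference mode
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```
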
0 votes
1 answer
108 views

I am working on generating synthetic data to help train my model. This means that training is performed on synthetic + real data and testing is done on real data. I was told that batch ...
asked by Manveru
1 vote
1 answer
172 views

I am a bit confused about the relation between the terms "Dropout" and "BatchNorm". As I understand it, Dropout is a regularization technique which is used only during training. ...
asked by AlexM
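
A minimal PyTorch sketch of that train/eval distinction (the tiny model is illustrative): both Dropout and BatchNorm change behavior between model.train() and model.eval(); Dropout zeroes activations only in training, and BatchNorm switches from batch statistics to its accumulated running statistics at evaluation time.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 8),
    nn.BatchNorm1d(8),   # uses batch stats in train(), running stats in eval()
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations in train(), identity in eval()
)

x = torch.randn(16, 8)

model.train()
y_train = model(x)       # stochastic: dropout active, BN normalizes with this batch's stats

model.eval()
with torch.no_grad():
    y_eval = model(x)    # deterministic: dropout off, BN uses running statistics
```
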
3 votes
1 answer
8k views

According to this article: https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484, the vanishing gradient problem occurs when using the sigmoid ...
asked by user3668129
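
A tiny NumPy sketch of the claim in that question (values are for illustration only): the derivative of the sigmoid is at most 0.25 and shrinks quickly for large |x|, so multiplying many such factors through a deep network drives gradients toward zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
grad = sigmoid(x) * (1.0 - sigmoid(x))   # derivative of the sigmoid, peaks at 0.25
print(grad.round(4))                     # [0.0025 0.105  0.25   0.105  0.0025]

# Chaining 10 such factors (one per layer) shrinks the gradient even in the best case:
print(0.25 ** 10)                        # ~9.5e-07
```
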
