Questions tagged [batch-normalization]
For questions about Batch Normalization of layer activations in theory and practice, as used in (typically deep) neural networks.
62 questions
3 votes
0 answers
37 views
Why is there no spatial batch layer norm?
We already have (spatial) batch norm and (spatial) layer norm: why don't we normalize over everything, so that each entire activation plane, over all batches and all channels, gets the benefits of both ...
0 votes
0 answers
36 views
Inference-time Batch Normalization's Unbiased Variance Estimate in Convolutional Neural Networks
I have seen many answers on this topic, but one part still confuses me, so I am asking here. https://stackoverflow.com/questions/38553927/batch-normalization-in-convolutional-...
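The distinction behind this question can be shown in a few lines of NumPy: training-time batch norm normalizes with the biased variance (divide by n), while the running variance kept for inference is typically updated with the unbiased estimate (divide by n − 1). A minimal sketch, assuming a single channel for brevity (the linked thread discusses how conv nets additionally pool these statistics over spatial positions):

```python
import numpy as np

# Activations of one channel across a mini-batch of n = 4 samples.
x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size

biased = x.var()           # divides by n; used to normalize the current batch
unbiased = x.var(ddof=1)   # divides by n - 1; used for the inference-time running variance

print(biased, unbiased)    # 1.25 1.666...  (unbiased = biased * n / (n - 1))
```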
4 votes
1 answer
77 views
Help on data transformation
I have reaction time as a dependent variable and age as an independent variable. I want to run a linear mixed model analysis. My data are not normally distributed. Do I have to transform the data? I ...
7 votes
1 answer
1k views
Why is Batch Normalization undesirable?
In Let's build GPT: from scratch, in code, spelled out., Andrej Karpathy says that no one likes the Batch Normalization layer and that people want to remove it. He also says it introduces many bugs and that he shot ...
0 votes
2 answers
1k views
Batch Normalization vs Layer Normalization
In Batch Normalization, the mean and standard deviation are calculated feature-wise and the normalization step is applied instance-wise; in Layer Normalization, the mean and standard deviation are calculated ...
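The axis distinction this question is about can be made concrete with NumPy. A minimal sketch for a 2-D (batch, features) activation matrix, ignoring the learnable scale/shift and the epsilon for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))  # (batch, features)

# Batch norm: statistics per feature, computed ACROSS the batch axis.
bn = (x - x.mean(axis=0)) / x.std(axis=0)

# Layer norm: statistics per instance, computed ACROSS the feature axis.
ln = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

print(bn.mean(axis=0))  # ~0 for every feature column
print(ln.mean(axis=1))  # ~0 for every instance row
```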
0 votes
2 answers
296 views
Dropout and BatchNorm decrease speed of learning
I was experimenting with the CIFAR-10 dataset and ran into strange behavior: Dropout and BatchNorm don't help at all. As I understand it: Dropout - freezing some of the weights, which helps us prevent ...
1 vote
2 answers
478 views
Should we always use Batch Renormalization over Batch Normalization?
According to a Machine Learning Mastery post on batch norm: For small mini-batch sizes or mini-batches that do not contain a representative distribution of examples from the training dataset, the ...
1 vote
1 answer
90 views
How does batch normalization make a model less sensitive to hyperparameter tuning?
Question 22 of 100+ Data Science Interview Questions and Answers for 2022 asks What is the benefit of batch normalization? The first bullet of the answers to this is The model is less sensitive to ...
0 votes
1 answer
118 views
Why doesn't batch normalization 'zero out' a batch of size one?
I'm using TensorFlow. Consider the example below: ...
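The question's TensorFlow snippet is not shown, but the usual resolution is that at inference time batch norm normalizes with its running statistics rather than the batch statistics, so a batch of one is not reduced to zeros. A PyTorch sketch of the same mechanism (assumed equivalent to the Keras `training=False` behavior):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)
bn.eval()  # inference mode: normalize with running_mean/running_var, not batch stats

x = torch.tensor([[1.0, 2.0, 3.0]])  # a batch of size one
y = bn(x)
print(y)  # close to x (running_mean=0, running_var=1 at init), not zeros
```

Only in training mode, where the mean of a single-sample batch equals the sample itself, would the normalized output collapse to zero.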
0 votes
1 answer
665 views
Equations in "Batch normalization: theory and how to use it with Tensorflow"
I read the article Batch normalization: theory and how to use it with Tensorflow by Federico Peccia. The batch normalized activation is $$ \bar x_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} $$...
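The formula quoted from the article can be checked numerically: applying it to a batch yields activations with mean ≈ 0 and standard deviation ≈ 1. A quick NumPy sketch, assuming a single feature, where `mu` and `var` play the roles of $\mu_B$ and $\sigma_B^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=256)  # a batch with nonzero mean, non-unit scale

mu = x.mean()
var = x.var()
eps = 1e-5  # numerical-stability constant from the formula
x_bar = (x - mu) / np.sqrt(var + eps)

print(x_bar.mean(), x_bar.std())  # ~0 and ~1
```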
2 votes
1 answer
753 views
Explanation of Karpathy tweet about common mistakes. #5: "you didn't use bias=False for your Linear/Conv2d layer when using BatchNorm"
I recently found this Twitter thread from Andrej Karpathy. In it, he lists a few common mistakes made during the development of a neural network: you didn't try to overfit a single batch first. you forgot ...
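The point behind that particular mistake can be demonstrated directly: BatchNorm subtracts the per-channel mean, which cancels any constant per-channel bias added by the preceding layer, so the bias parameter is redundant. A PyTorch sketch comparing two convolutions that differ only in their bias:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv_b  = nn.Conv2d(3, 8, kernel_size=3, bias=True)
conv_nb = nn.Conv2d(3, 8, kernel_size=3, bias=False)
with torch.no_grad():
    conv_nb.weight.copy_(conv_b.weight)  # same weights, only the bias differs

bn = nn.BatchNorm2d(8)
bn.train()  # normalize with batch statistics

x = torch.randn(4, 3, 16, 16)
with torch.no_grad():
    y_b  = bn(conv_b(x))
    y_nb = bn(conv_nb(x))

# The per-channel mean subtraction cancels the constant bias, so the two
# outputs coincide: the bias was dead weight (and wasted gradient updates).
print(torch.allclose(y_b, y_nb, atol=1e-4))  # True
```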
2 votes
1 answer
2k views
To freeze or not: batch normalisation in ResNet when transfer learning
I'm using a ResNet50 model pretrained on ImageNet, to do transfer learning, fitting an image classification task. The easy way of doing this is simply freezing the conv layers (or really all layers ...
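One common recipe when fine-tuning is to freeze the batch-norm layers fully: put them in eval mode so the ImageNet running statistics are kept, and stop gradients to their affine parameters. A PyTorch sketch with a tiny stand-in backbone (a real ResNet50 would be traversed the same way):

```python
import torch
import torch.nn as nn

# Hypothetical small backbone standing in for the pretrained ResNet50.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

# Freeze batch norm: stop running-statistic updates (eval mode) and
# gradient updates to gamma/beta (requires_grad=False).
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.eval()
        for p in m.parameters():
            p.requires_grad_(False)

before = model[1].running_mean.clone()
_ = model(torch.randn(2, 3, 16, 16))               # forward pass
print(torch.equal(before, model[1].running_mean))  # True: stats are frozen
```

One gotcha: a later call to `model.train()` flips the BN modules back to training mode, so the freezing loop must be re-applied (or overridden) after it.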
0 votes
1 answer
108 views
Batch normalization for multiple datasets?
I am working on a task of generating synthetic data to help the training of my model. This means that the training is performed on synthetic + real data, and tested on real data. I was told that batch ...
1 vote
1 answer
172 views
Using batchnorm and dropout simultaneously?
I am a bit confused about the relation between the terms "Dropout" and "BatchNorm". As I understand it, Dropout is a regularization technique that is used only during training. ...
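The train-only behavior the question describes is easy to observe: in training mode dropout zeroes a random subset of activations and rescales the survivors, while in eval mode it is the identity. A minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()              # training mode: dropout is active
y_train = drop(x)         # ~half the entries zeroed, survivors scaled by 1/(1-p) = 2

drop.eval()               # inference mode: dropout is a no-op
y_eval = drop(x)

print((y_train == 0).float().mean())  # fraction of zeroed entries, ~0.5
print(torch.equal(y_eval, x))         # True
```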
3 votes
1 answer
8k views
How does the batch normalization layer resolve the vanishing gradient problem?
According to this article: https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484 The vanishing gradient problem occurs when using the sigmoid ...
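The sigmoid connection the article makes can be quantified: the sigmoid's derivative peaks at 0.25 at zero and collapses toward zero for large inputs, so gradients chained through saturated layers shrink multiplicatively; normalizing pre-activations toward zero keeps them in the high-gradient region. A NumPy sketch of the derivative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum
print(sigmoid_grad(10.0))  # ~4.5e-5: saturated, the gradient nearly vanishes
# Chained through 10 saturated layers, the gradient picks up a factor
# of roughly (4.5e-5)**10 - effectively zero.
```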