Questions tagged [loss-function]
A function used to quantify the difference between observed data and predicted values according to a model. Minimization of loss functions is a way to estimate the parameters of the model.
517 questions
3 votes
1 answer
71 views
What loss functions are suitable for a YOLO-like architecture in TensorFlow/Keras, especially for fine-tuning on an imbalanced dataset?
I'm working with a custom YOLO-like architecture implemented in TensorFlow/Keras. While pretraining on the COCO dataset works, I plan to fine-tune the model on a highly imbalanced dataset. ...
3 votes
1 answer
78 views
How do you differentiate population count/Hamming weight?
I've come across a loss regularizing function that uses population counts (i.e., bits that are one, Hamming weight) of activations: $$ L_\mathrm{reg} = H(\max(\lfloor x \rceil, 0)), $$ where $x$ is an ...
2 votes
2 answers
105 views
Best Practice for Group Based splitting (Train / Val / Test)
As an intro, group-based splitting divides data into Train / Test (Val) sets by some attribute such as patient_id, item_id, or similar, to ensure that the same person ...
5 votes
1 answer
109 views
I wrote R code to download PDF files from a website automatically, but it doesn't find the PDF file links, although the links exist
I want to download PDF files from the website "https://register.awmf.org/de/start", but the code didn't find any PDF links, although the site does link to PDF files indirectly. I want to download all ...
0 votes
0 answers
34 views
Custom loss function not behaving as expected in PyTorch but does in TensorFlow
I tried modifying the reconstruction loss so that values pushed out of bounds do not contribute to the loss, and it works as expected in TensorFlow after training an autoencoder. However, ...
1 vote
0 answers
51 views
Using a differentiable Self-Organizing Map loss in a CNN
I've been trying to combine a normal CNN loss with a loss that quantifies how well we can cluster the second-to-last layer embeddings (i.e., feed the embeddings to a 2D Self-Organizing Map (SOM) and ...
6 votes
2 answers
83 views
Does it make sense to mix the labels in each batch?
When training a deep binary classification model, the model receives a batch at each training step (e.g., a batch of 32 samples). Let's assume that in each training batch there are ...
0 votes
0 answers
65 views
Can logistic regression loss be zero? Question from a test
I have a question from a test. I managed to solve it, but something feels weird... Prove it is false: if all the samples for logistic regression are categorized false, then the training loss is 0. What ...
3 votes
1 answer
129 views
How to incorporate weights (probability measurements) of data into a mean squared error loss function
I am training a CNN to regress on 4 targets related to a given image. Within the image is a point of interest whose position can be defined by phi and theta (corresponding to x and y of a normal ...
0 votes
0 answers
21 views
Numerical precision in Flux.jl
I am trying to study ANN training within a dynamical systems framework, treating the model as the system and the training as the time-evolution dynamics. As an extension, I tried to make the ...
5 votes
2 answers
688 views
Is there any advantage of a lower value of a loss function?
I have two loss functions $\mathcal{L}_1$ and $\mathcal{L}_2$ to train my model. The model is predominantly a classification model. Both $\mathcal{L}_1$ and $\mathcal{L}_2$ are two variants of ...
4 votes
1 answer
94 views
Taking into account instance cost in learning?
I am trying to take costs into account in learning. The setup is as follows: a statistical learning problem with the usual X and y, where y is imbalanced (roughly 1% of ones). Scikit-learn ...
1 vote
0 answers
56 views
Per Channel loss or Per Sample Loss
I am currently tackling a semantic segmentation problem where, for each sample, my goal is to segment two masks corresponding to two objects. Notably, object two is typically located inside object one, ...
3 votes
1 answer
94 views
Why is softmax training more stable?
I'm wondering which activation function is easier to train with (better accuracy / smaller loss), softmax or sigmoid, for a multiclass classification problem. According to: https://...
0 votes
1 answer
110 views
What exactly is a true distribution in ML problems?
I define a classification problem as a problem of calculating a function $h$ that approximates a function $f$ that classifies data. The approximation is calculated by taking a set of training samples ...