Questions tagged [class-imbalance]
Questions referring to classifiers or classification problems where some of the classes in the data are under-represented.
606 questions
6 votes
1 answer
58 views
When should we avoid balancing an imbalanced dataset?
I am working on a network security-related project, in which I have to build a deep learning model to detect a specific attack. It's about detecting whether a network system of an organisation is a ...
2 votes
1 answer
63 views
Imbalanced classes and ML setup
I’m working on a MarTech use case (predicting customer conversions for a certain product). I'm not really used to working in this domain, so I’m seeking some critical questions about my setup. Context: ...
3 votes
0 answers
76 views
Balancing dataset question
I am working on my bachelor's thesis; the topic is "Diabetes prediction using machine learning". The dataset I am working on is from Kaggle and is called Pima Indians Diabetes. Since my dataset ...
34 votes
3 answers
4k views
Is class imbalance really a problem in machine learning?
Following on from my recent post on the topic, my goal here is to synthesise the excellent community wisdom on it over at Cross Validated into a "canonical" Q&A for the Data Science SE :)...
3 votes
1 answer
71 views
What loss functions are suitable for a YOLO-like architecture in TensorFlow/Keras, especially for fine-tuning on an imbalanced dataset?
I'm working with a custom YOLO-like architecture implemented in TensorFlow/Keras. While pretraining on the COCO dataset works, I plan to fine-tune the model on a highly imbalanced dataset. ...
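A loss often used for the dense, heavily imbalanced foreground/background problem that YOLO-style detectors face is focal loss, which down-weights well-classified (easy) examples. The pure-Python sketch below shows the binary form, assuming per-anchor labels in {0, 1} and predicted probabilities; `binary_focal_loss` is a hypothetical helper, and `alpha=0.25`, `gamma=2.0` are commonly used starting values, not values tuned for this question. Recent TensorFlow releases also provide `tf.keras.losses.BinaryFocalCrossentropy`; check its availability and defaults in the installed version.

```python
import math

def binary_focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)              # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p               # probability given to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha   # per-class weighting
        # (1 - p_t)**gamma shrinks the loss of confidently correct examples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

With `gamma=0` this reduces to an alpha-weighted binary cross-entropy, so `gamma` can be raised gradually during fine-tuning to see whether focusing on hard examples helps.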
0 votes
0 answers
42 views
Churn prediction machine learning: low precision
I am working on a churn prediction project, but my data is very imbalanced. I have tried many things, but this is the best model I can get. My main problem is that I want recall and precision ...
0 votes
0 answers
50 views
Why do these undersampling methods return such different results?
I have a table in a database; let's call it TABLE1. It contains several columns: one for a unique customer ID, several feature columns, and one for the class I want to predict. There are ~280k rows where ...
6 votes
1 answer
82 views
Should class-weights take validation-set into account?
I need to calculate class weights to train my deep learning model. To simulate a real-world production scenario as closely as I can, I have excluded the testing/inference dataset from which ...
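A common way to keep held-out data out of the weighting is to compute the weights from the training labels alone. The sketch below uses the same formula as scikit-learn's `class_weight="balanced"` option, `n_samples / (n_classes * count_per_class)`; `balanced_class_weights` and the example split are hypothetical. The resulting dict has the shape Keras expects for `model.fit(..., class_weight=...)`.

```python
from collections import Counter

def balanced_class_weights(train_labels):
    """Weight per class = n_samples / (n_classes * count), from training labels only."""
    counts = Counter(train_labels)
    n_samples = len(train_labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}

# Hypothetical training split; validation/test labels are never passed in.
train_y = [0] * 90 + [1] * 10
weights = balanced_class_weights(train_y)
print(weights)
```

Because the validation set is meant to mimic unseen production data, leaving it out of the weight computation keeps it an honest estimate of deployed performance.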
0 votes
0 answers
23 views
How to Properly Use scale_pos_weight in an XGBoost MultiOutput Classifier to Address Severe Class Imbalance?
I'm working on predicting two genetic mutations simultaneously using an XGBoost Multioutput Classifier. My dataset is severely imbalanced, particularly for cases where both genetic mutations are ...
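The XGBoost documentation suggests `sum(negative instances) / sum(positive instances)` as a typical value to consider for `scale_pos_weight`. Since scikit-learn's `MultiOutputClassifier` fits clones of a single estimator with identical parameters, one `scale_pos_weight` would be shared by every target; a workaround is to fit one `XGBClassifier` per target with its own ratio. The helper below (hypothetical name) computes that ratio per output column of a 0/1 label matrix.

```python
def scale_pos_weight_per_output(Y):
    """For each output column of a 0/1 label matrix, return count(neg) / count(pos)."""
    n_outputs = len(Y[0])
    ratios = []
    for j in range(n_outputs):
        column = [row[j] for row in Y]
        pos = sum(1 for v in column if v == 1)
        neg = len(column) - pos
        if pos == 0:
            raise ValueError(f"output {j} has no positive examples")
        ratios.append(neg / pos)
    return ratios

# Hypothetical labels: column 0 = mutation A, column 1 = mutation B.
Y = [[1, 0], [0, 0], [0, 1], [0, 0], [1, 0], [0, 0]]
ratios = scale_pos_weight_per_output(Y)
```

Per-output ratios only rebalance each mutation separately; they do not specifically target the rare rows where both mutations occur together.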
5 votes
2 answers
149 views
Why do we need SMOTE?
We use SMOTE to balance an imbalanced dataset, but why are we manipulating things instead of using the natural data? I mean, what is the need for balancing, and what exact impact will it have on the model?
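To make the "manipulation" concrete: SMOTE does not simply duplicate rows; it creates synthetic minority points on the line segment between a minority sample and one of its nearest minority neighbours. The simplified pure-Python sketch below illustrates only that interpolation step; `smote_like`, `k`, and `seed` are hypothetical names, and the `imbalanced-learn` package provides the maintained `SMOTE` implementation.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new points by interpolating a minority sample toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, nb)))
    return synthetic

new_points = smote_like([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=5)
```

One side effect worth keeping in mind: after oversampling, the model is trained on a class prior that differs from the real one, so its predicted probabilities are no longer calibrated to the natural base rate.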
0 votes
0 answers
43 views
Question on Optimized Threshold in Predictive Modeling
I'm trying to build a predictive model, but I haven't found a method that consistently delivers high performance. Is it acceptable to use an optimized classification threshold of 0.996?
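An unusual threshold such as 0.996 is not automatically invalid; what matters is that it was chosen on data separate from the data used to report performance. The sketch below picks the threshold that maximises F1 on a validation split; F1 as the criterion and the name `best_f1_threshold` are assumptions, and a cost-based or fixed-recall criterion could be swapped in. The chosen threshold would then be evaluated once on a held-out test set.

```python
def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) maximising F1, trying each distinct score as a cut-off."""
    best = (0.5, -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best

# Hypothetical validation labels and model scores.
val_y = [0, 0, 1, 1]
val_scores = [0.1, 0.4, 0.35, 0.8]
threshold, f1 = best_f1_threshold(val_y, val_scores)
```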
1 vote
2 answers
68 views
What do these train and test accuracy and loss graphs suggest? Can train and test accuracy reach 80% after one epoch?
This is the accuracy and loss plot for a CNN model. Is it possible for train and test accuracy to start at 80% from the first epoch itself in 5-fold cross-validation?
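On imbalanced data, one quick check is to compare the first-epoch accuracy with the majority-class baseline: if about 80% of the samples belong to one class, a model that always predicts that class already scores about 80%. The question does not state its class ratio, so this is only a hypothesis to test; `majority_baseline_accuracy` is a hypothetical helper.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical label vector with an 80/20 split.
baseline = majority_baseline_accuracy([0] * 8 + [1] * 2)
```

If the curves sit flat near this baseline, per-class metrics (recall, precision, confusion matrix) say more than accuracy does.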
0 votes
0 answers
53 views
How to properly implement random undersampling during cross-validation in Orange
I am working on a highly imbalanced fraud detection dataset (class 0: 284,315 instances; class 1: 492 instances) and trying to implement random undersampling correctly during cross-validation in Orange. ...
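The key rule, independent of the tool, is to undersample only the training part of each fold and leave the validation fold at its real class ratio. The sketch below is plain Python, not Orange's API; it builds stratified folds and yields index lists, and `undersampled_cv_splits` is a hypothetical name.

```python
import random

def undersampled_cv_splits(labels, n_folds=5, seed=0):
    """Yield (train_idx, test_idx); only train_idx is randomly undersampled to balance classes."""
    rng = random.Random(seed)
    # Stratified folds: deal each class's shuffled indices round-robin into the folds.
    folds = [[] for _ in range(n_folds)]
    by_class_all = {}
    for i, y in enumerate(labels):
        by_class_all.setdefault(y, []).append(i)
    for members in by_class_all.values():
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % n_folds].append(i)

    for f in range(n_folds):
        test_idx = folds[f]  # untouched: keeps the real class ratio
        train_pool = [i for g in range(n_folds) if g != f for i in folds[g]]
        by_class = {}
        for i in train_pool:
            by_class.setdefault(labels[i], []).append(i)
        n_min = min(len(v) for v in by_class.values())
        train_idx = []
        for members in by_class.values():
            train_idx.extend(rng.sample(members, n_min))  # drop majority rows at random
        yield train_idx, test_idx

# Hypothetical labels standing in for the fraud data.
labels = [0] * 95 + [1] * 10
splits = list(undersampled_cv_splits(labels))
```

Undersampling before splitting would let the validation folds become balanced too, which inflates minority-class precision compared with what the real class ratio would give.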
2 votes
3 answers
504 views
ROC vs PR-score and imbalanced datasets
I see everywhere that when the dataset is imbalanced, PR-AUC is a better performance indicator than ROC. From my experience, if the positive class is the most important, and there is higher ...
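One way to see the difference: ROC-AUC depends only on how positives rank against negatives, while precision depends on how many negatives there are. The sketch below computes ROC-AUC as the probability that a random positive outscores a random negative, and average precision as the mean precision at each positive's rank (it assumes no ties between positive and negative scores). Multiplying the negatives tenfold with the same score distribution leaves ROC-AUC unchanged but lowers average precision. On tie-free data these should agree with scikit-learn's `roc_auc_score` and `average_precision_score`; the helper names are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def average_precision(pos_scores, neg_scores):
    """Mean precision at the rank of each positive (assumes no pos/neg score ties)."""
    ranked = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
                    key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            total += hits / rank
    return total / len(pos_scores)

pos = [0.9, 0.6]
neg = [0.7, 0.3]
neg_10x = [0.7] * 10 + [0.3] * 10  # same score distribution, ten times as many negatives
```

Both metrics are useful; PR-based metrics are more sensitive to how many false positives accompany each true positive, which is often what matters when the positive class is rare.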
4 votes
1 answer
94 views
Taking into account instance cost in learning?
I am generally trying to take costs into account in learning. The setup is as follows: a statistical learning problem with the usual X and y, where y is imbalanced (roughly 1% ones). Scikit-learn ...
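One common way to bring per-instance costs into learning is to use them as sample weights in the loss, so that an error on an expensive instance costs proportionally more. The sketch below fits a one-feature logistic regression by gradient descent on the instance-weighted log loss; using the cost directly as the weight is an assumption, and `fit_weighted_logreg` / `predict_proba` are hypothetical helpers. In scikit-learn, many estimators accept the same idea via `fit(X, y, sample_weight=...)`, for example `LogisticRegression`.

```python
import math

def fit_weighted_logreg(xs, ys, weights, lr=0.1, epochs=2000):
    """1-D logistic regression fitted by gradient descent on the instance-weighted log loss."""
    w, b = 0.0, 0.0
    total_w = sum(weights)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y, c in zip(xs, ys, weights):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            # gradient of c * logloss with respect to the logit is c * (p - y)
            gw += c * (p - y) * x
            gb += c * (p - y)
        w -= lr * gw / total_w
        b -= lr * gb / total_w
    return w, b

def predict_proba(w, b, x):
    """Sigmoid of the fitted linear score."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))
```

With an uninformative feature, the fitted probability converges to the cost-weighted share of positives, which shows directly how the costs shift the model toward the expensive instances.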