Questions tagged [xgboost]
For questions related to the eXtreme Gradient Boosting algorithm.
702 questions
4 votes
0 answers
68 views
What makes XGBoost so much more dominant with structured data?
XGBoost has a history of being dominant in Kaggle competitions, but why is it so competitive on structured data compared to other ML algorithms? I’ve read that sparse matrix utilization improves the ...
3 votes
0 answers
42 views
How can I stop mlflow.xgboost.autolog from logging "none" parameters?
Currently, if I start autolog() in mlflow.xgboost, all possible parameters are logged, even those that are not set. I'd like to disable this so that the MLflow GUI shows only the ones that were set. It seems,...
5 votes
1 answer
76 views
Does using test data in eval_set argument for xgboost cause data leakage?
I'm using early stopping for XGBClassifier. The fitting looks like this (simplified): ...
5 votes
1 answer
235 views
XGBoost or GBR?
What are the pros and cons of using XGBoost vs. GBR (scikit-learn) when dealing with data of between 500 and 1000 records and about 5 columns?
6 votes
2 answers
741 views
What are the typical GPU requirements for training a classic prediction model like XGBoost or Random Forest?
I am training a machine learning model to predict a score based on some behavioral client data. The model would be something classic like a random forest, XGBoost or multilinear regression. Depending ...
2 votes
0 answers
86 views
How do I train a model on data where there should be a statistical difference but it can't find it?
I'm trying to create a predictive model for a dataset with continuous input variables and a binary/probability output. The inputs are sensors (up to 400 columns, but some are very irrelevant) which are ...
2 votes
0 answers
144 views
Evaluating model performance when used in targeting decisions
I have a logistic regression model, the output of which is used to make decisions. I am testing an improved version of this model. In testing, it has substantially better log loss than the old model. When ...
0 votes
0 answers
23 views
How to Properly Use scale_pos_weight in an XGBoost MultiOutput Classifier to Address Severe Class Imbalance?
I'm working on predicting two genetic mutations simultaneously using an XGBoost Multioutput Classifier. My dataset is severely imbalanced, particularly for cases where both genetic mutations are ...
2 votes
1 answer
83 views
New Variables to Add to Model GLM/GBM
I already have a GLM model in place to predict claims frequency. I now have access to many new variables (a mix of categorical and continuous variables, some of which are likely correlated). I wish ...
5 votes
1 answer
152 views
XGBoost on a 400-instance dataset: is my model overfitting?
I'm working on a regression problem with 400 samples and 7 features, to predict machinery job durations from historical data. I'm using XGBoost, and a (90,10) split works better than an (80,20) split. Is ...
9 votes
1 answer
298 views
Machine learning model for ranking that outputs probabilities
Traditionally, ML algorithms for ranking take features as input and output a "ranking score" that does not have a natural probabilistic interpretation. For example, suppose we have ...
1 vote
0 answers
46 views
Learning models for non-mutually exclusive events/labels other than multilabel classification
I have the following dataframe (in wide format) which records the IQ, Hours (number of hours of studying) and ...
3 votes
2 answers
2k views
XGBoost __sklearn_tags__ Method Error in Python When Loading Model [closed]
I'm getting this error when trying to load a saved XGBRegressor model locally: ...
1 vote
0 answers
27 views
Interpreting predicted probabilities after rebalancing [duplicate]
Consider a setting in which I have an unbalanced dataset where the targeted class takes the value 1 in 0.01% of observations and the value 0 in 99.9% of the observations. I train a classification model, ...
5 votes
1 answer
129 views
Comparing probabilities of two models
Consider a dataset and two binary classes CLASS_A and CLASS_B. These two classes are not necessarily independent. Let's say that CLASS_A = "buy an apple" and CLASS_B = "buy an orange" ...