Questions tagged [xgboost]

For questions related to the eXtreme Gradient Boosting algorithm.

4 votes
0 answers
68 views

XGBoost has a history of being dominant in Kaggle competitions, but why is it so competitive on structured data compared to other ML algorithms? I’ve read that sparse matrix utilization improves the ...
Mr. AI Cool
3 votes
0 answers
42 views

Currently in mlflow.xgboost if I start autolog(), then all possible parameters are logged, even if they are not set. I'd like to disable them to see in mlflow GUI only the ones that were set. It seems,...
lmocsi
  • 131
5 votes
1 answer
76 views

I'm using early stopping for XGBClassifier. The fitting looks like this (simplified): ...
Jakub Małecki
5 votes
1 answer
235 views

What are the pros and cons of using XGBoost vs. GBR (scikit-learn) when dealing with data where 500 < records < 1000 and there are about 5 columns?
Ocean
  • 427
6 votes
2 answers
741 views

I am training a machine learning model to predict a score based on some behavioral client data. The model would be something classic like a random forest, XGBoost, or multilinear regression. Depending ...
JouJour
  • 101
2 votes
0 answers
86 views

I'm trying to create a predictive model for a dataset with continuous input variables and a binary/probability output. The inputs are sensors (up to 400 columns, though some are largely irrelevant) which are ...
user46124
2 votes
0 answers
144 views

I have a logistic regression model, the output of which is used to make decisions. I am testing an improved version of this model. In testing, it has substantially improved log loss compared with the old model. When ...
user179361
0 votes
0 answers
23 views

I'm working on predicting two genetic mutations simultaneously using an XGBoost Multioutput Classifier. My dataset is severely imbalanced, particularly for cases where both genetic mutations are ...
Marta
  • 1
2 votes
1 answer
83 views

I already have a GLM in place to predict claims frequency. I now have access to many new variables (a mix of categorical and continuous variables, some of which are likely correlated). I wish ...
InsurancePricer
5 votes
1 answer
152 views

I'm working on a regression problem with 400 samples and 7 features, to predict job durations of machinery from historical data. I'm using XGBoost, and a (90,10) split works better than an (80,20) split. Is ...
barcamela
9 votes
1 answer
298 views

Traditionally, ML algorithms for ranking take the features as input and then output a "ranking score" which does not have a natural probabilistic interpretation. For example, suppose we have ...
Ishigami
  • 193
1 vote
0 answers
46 views

I have the following dataframe (in wide format) which records the IQ, Hours (number of hours of studying) and ...
Ishigami
  • 193
3 votes
2 answers
2k views

I'm getting this error when trying to load a saved XGBRegressor model locally: ...
Jack
  • 31
1 vote
0 answers
27 views

Consider a setting in which I have an unbalanced dataset where the target class takes the value 1 in 0.01% of observations and the value 0 in 99.9% of the observations. I train a classification model, ...
Ale
  • 161
5 votes
1 answer
129 views

Consider a dataset and two binary classes CLASS_A and CLASS_B. These two classes are not necessarily independent. Let's say that CLASS_A = "buy an apple" and CLASS_B = "buy an orange" ...
Ale
  • 161