
Questions tagged [missing-data]

Missing data is a problem that arises in data science when some values in a dataset's rows or columns are missing or unavailable for some samples.
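As a minimal illustration of how missing values typically surface in a pandas DataFrame, the sketch below builds a small hypothetical table and counts the missing cells per column. The column names and values are invented for the example; `isna()` treats both `NaN` (numeric columns) and `None` (object columns) as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: NaN in numeric columns and None in an object column.
df = pd.DataFrame({
    "age": [34, np.nan, 29, 41],
    "income": [52000.0, 61000.0, np.nan, np.nan],
    "city": ["Oslo", None, "Lima", "Pune"],
})

# Boolean mask of missing cells, then per-column counts and fractions.
missing_counts = df.isna().sum()
missing_fraction = df.isna().mean()

# Rows that contain at least one missing value.
incomplete_rows = df[df.isna().any(axis=1)]

print(missing_counts)
print(missing_fraction)
print(len(incomplete_rows))
```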

2 votes
0 answers
64 views

Problem description: I have a dataset that combines multiple sources gathering the same kind of data. I have retrieved the data and arranged it into several columns of a pandas DataFrame. All ...
patacoing
0 votes
0 answers
58 views

Years ago, I read a paper that proposed a K-means-based approach to imputing missing values in energy time-series data. At the time, since I did not have access to that data, I tried to ...
Mario
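The question does not say which paper or exact algorithm it refers to, so the following is only a generic K-means imputation sketch, not the paper's method. It fills gaps with column means, then alternates k-means assignment with overwriting the missing cells from each row's assigned centroid. As a stated assumption, distances are computed over each row's observed coordinates only, so that early, poor imputations do not pull a row into the wrong cluster. The helper name `kmeans_impute`, the toy data, and `k=2` are all hypothetical.

```python
import numpy as np


def kmeans_impute(X, k=2, n_iter=20, seed=0):
    """Hypothetical helper: fill NaNs in X from each row's nearest k-means centroid."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)          # True where a value is missing
    observed = ~mask
    # Start from column means so k-means has a complete matrix to work with.
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    rng = np.random.default_rng(seed)
    centroids = filled[rng.choice(len(filled), size=k, replace=False)].copy()
    labels = np.zeros(len(filled), dtype=int)
    for _ in range(n_iter):
        # Squared distance to each centroid, summed over observed coordinates only.
        diff = filled[:, None, :] - centroids[None, :, :]
        dists = (diff ** 2 * observed[:, None, :]).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        for j in range(k):
            members = filled[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
        # Overwrite only the originally missing cells with centroid values.
        filled[mask] = centroids[labels][mask]
    return filled, labels


# Toy data with two obvious groups and one gap in each group.
data = np.array([
    [1.0, 1.1],
    [0.9, np.nan],
    [1.2, 0.8],
    [8.0, 8.2],
    [np.nan, 7.9],
    [8.1, 8.0],
])
imputed, labels = kmeans_impute(data, k=2)
print(np.round(imputed, 2))
```

On real time-series data, `k`, the initialization, and the distance definition would all need tuning; the point here is only the assign, update, refill loop.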
5 votes
1 answer
109 views

I am trying to download PDF files from this website "https://register.awmf.org/de/start", but the code didn't find any PDF links, although there are links to PDF files, only indirect ones. I want to download all ...
Ward Khedr
3 votes
2 answers
114 views

I'm working on a project using an e-commerce dataset, and I'm facing an issue in the data cleaning stage. I have the customers dataset, which has approximately 1.6 million rows. One of the features, "...
Mohd Yasser
4 votes
1 answer
71 views

When dealing with missing data in categorical variables, common approaches include imputation by mode or predictive models. However, in some cases, certain categories have extremely low frequency or ...
Celine Yvone
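The question mentions mode imputation for categorical variables. A commonly used alternative when categories are rare is to keep missingness visible as its own level and to pool rare levels into an "Other" bucket. The sketch below shows both in pandas; the series values and the `min_count` threshold are assumptions chosen for illustration, not a recommendation.

```python
import pandas as pd

# Hypothetical categorical column with gaps and two rare categories.
s = pd.Series(["red", "blue", None, "red", "green", None, "red"], dtype="object")

# Option 1: mode imputation (mode() ignores missing values by default).
mode_value = s.mode().iloc[0]
mode_filled = s.fillna(mode_value)

# Option 2: keep missingness as an explicit category instead of guessing a value.
explicit_filled = s.fillna("Missing")

# Option 3: additionally fold categories seen fewer than min_count times into "Other".
min_count = 2  # assumed threshold; tune for the data at hand
counts = explicit_filled.value_counts()
rare = counts[counts < min_count].index
grouped = explicit_filled.where(~explicit_filled.isin(rare), "Other")

print(mode_filled.tolist())
print(grouped.tolist())
```

Mode imputation inflates the most frequent class, while the explicit "Missing" level lets a downstream model learn whether the absence itself is informative.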
0 votes
0 answers
36 views

I have a complete dataset and introduce missingness using one of these mechanisms (MCAR, MAR, MNAR). I then split the data into training and test sets, and after that I impute the missing values using different ML ...
zhyan
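The workflow described above (inject missingness into a complete dataset, split, impute, evaluate) can be sketched with NumPy. Only the MCAR case is shown: every cell is dropped with the same probability, independent of any value. Under MAR the drop probability would depend on other observed columns, and under MNAR on the hidden value itself. A column-mean imputer stands in for the ML imputers in the question, and it is fit on the training split only to avoid leakage. The dataset, split sizes, and `p_missing` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical complete dataset: 200 rows, 3 numeric features.
X_full = rng.normal(loc=[10.0, 50.0, 0.0], scale=[2.0, 5.0, 1.0], size=(200, 3))

# MCAR: each cell is removed with the same probability, regardless of its value.
p_missing = 0.2
mcar_mask = rng.random(X_full.shape) < p_missing
X_missing = X_full.copy()
X_missing[mcar_mask] = np.nan

# Split rows into train/test after the missingness has been introduced.
perm = rng.permutation(len(X_missing))
train_idx, test_idx = perm[:150], perm[150:]
X_train, X_test = X_missing[train_idx], X_missing[test_idx]

# Fit the imputer on the training split only (here: per-column means).
train_means = np.nanmean(X_train, axis=0)
X_test_imputed = np.where(np.isnan(X_test), train_means, X_test)

# Score only the test cells that were actually masked.
test_mask = mcar_mask[test_idx]
errors = X_test_imputed[test_mask] - X_full[test_idx][test_mask]
rmse = float(np.sqrt(np.mean(errors ** 2)))
print(f"mean-imputation RMSE on masked test cells: {rmse:.3f}")
```

Because the true values are known, any imputer can be swapped in at the "fit on train" step and compared on the same masked cells.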
2 votes
1 answer
183 views

I've been trying to learn data science for a while now. In fact, I actually finished the "Data Scientist Associate" career path in DataCamp. However, as you might expect, the courses don't ...
astrobiologist
0 votes
0 answers
30 views

I have almost 20 features; some are categorical and some are numerical. I have already converted the categorical features using binary encoding. The problem is that among the 20 features, two features ...
Bidur Tiwari
2 votes
0 answers
51 views

I am currently working on a predictive modeling project using the gbm package in R and have encountered a challenge regarding missing values in one of my predictor variables. I would appreciate your ...
Anso
-1 votes
2 answers
44 views

I am using Random Forest as my machine learning algorithm. Before using the model, I cleaned up my data and removed missing values, but when I try to use my model it keeps saying: y ...
Akingba Gladys
0 votes
1 answer
63 views

I have a dataset of, say, 1 million observations. As a silly example, say we want to predict whether a person can become a data scientist (0/1). I have variables that have a lot of missing values but ...
Kilkik
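One common way to handle variables with many missing values is to record the missingness as a separate binary feature before filling the gap, so a model can use "was missing" as a signal. The sketch below shows this in pandas on a tiny hypothetical table; the column name `years_coding`, the values, and the choice of median filling are assumptions, and any other imputer could be swapped in.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a binary target and a feature with many missing values.
df = pd.DataFrame({
    "years_coding": [3.0, np.nan, 7.0, np.nan, 1.0, np.nan],
    "target": [1, 0, 1, 0, 0, 1],
})

# Keep the fact that a value was missing as its own 0/1 feature.
df["years_coding_missing"] = df["years_coding"].isna().astype(int)

# Fill the original column (median of the observed values here).
df["years_coding"] = df["years_coding"].fillna(df["years_coding"].median())

# Quick check: does the missingness itself relate to the target?
print(df.groupby("years_coding_missing")["target"].mean())
```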
1 vote
1 answer
64 views

I have a set of entity types, say colors (red, green, blue, etc.) and a set of groups of entities. E.g. one group may be 3 blue, one may be 2 red and 1 blue, and so on. I have the assumption that ...
Steve
0 votes
1 answer
301 views

As the title suggests, I'm working on a dataset in which about 60% of the values in a certain column are missing. Should I simply drop the column instead of imputing it? The reason is that I am working ...
Sofia Malik
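Whether a column that is about 60% missing should be dropped is a judgment call that depends on how informative the remaining values (and the missingness itself) are. The mechanical part can be sketched as below, assuming a hypothetical 50% cutoff and a hypothetical helper `drop_sparse_columns`. pandas' built-in `DataFrame.dropna(axis=1, thresh=...)` does something similar, but its `thresh` is a minimum count of non-missing values rather than a fraction.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose missing fraction exceeds max_missing."""
    missing_fraction = df.isna().mean()
    keep = missing_fraction[missing_fraction <= max_missing].index
    return df[keep]


df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [np.nan, np.nan, np.nan, 4, 5],   # 60% missing
    "c": [1, np.nan, 3, 4, 5],             # 20% missing
})
reduced = drop_sparse_columns(df, max_missing=0.5)
print(list(reduced.columns))  # ['a', 'c']
```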
2 votes
0 answers
38 views

I have a project where I am predicting the best schools based on a series of test scores, teacher attendance rates, etc. I would like to predict the best school to go to. Some of the data is of ...
Englishman Bob
2 votes
2 answers
158 views

Imagine we have fitted a sklearn.tree.DecisionTreeClassifier object like this one: If we wanted to predict the class of this observation: ...
Tendero
