Questions tagged [missing-data]
Missing data is a problem that arises in data science when values in some rows or columns of a dataset are missing or unavailable for some samples.
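This is a minimal pandas sketch of how this kind of missingness is usually measured, column by column and row by row. The toy DataFrame and its column names (`age`, `city`, `income`) are hypothetical. Note that `isna()` treats both `NaN` and `None` as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; NaN and None both mark missing values
df = pd.DataFrame({
    "age": [34, np.nan, 29, 41],
    "city": ["Paris", None, "Lyon", None],
    "income": [52000, 61000, np.nan, 58000],
})

missing_count = df.isna().sum()                       # missing values per column
missing_fraction = df.isna().mean()                   # share of missing values per column
rows_with_missing = int(df.isna().any(axis=1).sum())  # rows with at least one gap

print(missing_count)
print(missing_fraction)
print(rows_with_missing)
```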
215 questions
2 votes
0 answers
64 views
Clustering from multi-sources with missing data
Problem description: I have a dataset that combines multiple sources gathering the same kind of data. I have loaded the data into several columns of a pandas DataFrame. All ...
0 votes
0 answers
58 views
What is the best practice for imputing missing data with patterns over time? (Potential of K-means clustering for imputation of missing values?)
Years ago, I read a paper that proposed a K-means-based approach to impute missing values in energy time-series data. At the time, since I did not have access to that data, I tried to ...
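The excerpt does not describe the paper's method. What follows is therefore only a generic sketch of one common K-means imputation scheme, not the approach from that paper. It starts from column means, clusters the rows, overwrites only the missing cells with their cluster centroid, and repeats until the imputed values settle. It uses plain NumPy (no scikit-learn) on a toy two-cluster array, and the function names are hypothetical. It also ignores time ordering, which real energy time series would likely need.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a complete array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Recompute centroids; keep the old one if a cluster ends up empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return centroids, assign(X, centroids)

def kmeans_impute(X, k=2, n_rounds=10, seed=0):
    """Fill NaNs by alternating k-means clustering and centroid substitution."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_rounds):
        centroids, labels = kmeans(filled, k, seed=seed)
        # Overwrite only the originally missing cells with their cluster's centroid
        updated = np.where(missing, centroids[labels], filled)
        if np.allclose(updated, filled):
            break
        filled = updated
    return filled

X = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.1, np.nan],
    [10.0, 10.2], [9.8, 10.0], [np.nan, 9.9],
])
imputed = kmeans_impute(X, k=2)
print(imputed.round(2))
```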
5 votes
1 answer
109 views
I wrote R code to download PDF files from a website automatically, but the code doesn't find the PDF file links, although the links exist
I want to download PDF files from this website "https://register.awmf.org/de/start", but the code didn't find any PDF links, although there are links to PDF files (indirectly). I want to download all ...
3 votes
2 answers
114 views
How do I fill the null values of a categorical column?
I'm working on a project using an e-commerce dataset, and I'm facing an issue in the data cleaning stage. I have the customers dataset, which has approximately 1.6 million rows. One of the features, "...
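Two common options are sketched below with pandas, assuming the column holds strings with `None`/`NaN` for missing entries. The first fills with the most frequent category. The second treats missingness as its own category. The column name `segment` is hypothetical, since the excerpt cuts off before naming the feature.

```python
import pandas as pd

# Hypothetical customer table; None marks a missing category
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "segment": ["retail", None, "retail", "wholesale", None],
})

# Option 1: mode imputation (most frequent observed category)
mode_value = customers["segment"].mode(dropna=True).iloc[0]
customers["segment_mode"] = customers["segment"].fillna(mode_value)

# Option 2: keep missingness visible as its own category
customers["segment_unknown"] = customers["segment"].fillna("Unknown")

# Optional flag so a model can still learn from the fact that a value was missing
customers["segment_was_missing"] = customers["segment"].isna().astype(int)

print(customers)
```

When many rows are missing, mode imputation inflates the majority class. An explicit "Unknown" level, optionally paired with the flag, avoids that.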
4 votes
1 answer
71 views
How do outliers affect the process of imputing missing data in categorical variables?
When dealing with missing data in categorical variables, common approaches include imputation by the mode or by predictive models. However, in some cases, certain categories have extremely low frequency or ...
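The mode itself barely reacts to rare categories, but predictive imputers can overfit them. One common preprocessing step is to pool categories below a frequency threshold into an "Other" level before imputing. The sketch below does this in plain Python, assuming missing values are `None`. The `min_count` threshold and the helper name are hypothetical.

```python
from collections import Counter

def collapse_rare_then_impute(values, min_count=2, other="Other"):
    """Pool categories seen fewer than min_count times, then fill None with the mode."""
    counts = Counter(v for v in values if v is not None)
    collapsed = [
        v if v is None or counts[v] >= min_count else other
        for v in values
    ]
    # Mode over the observed (non-missing) values after pooling
    mode = Counter(v for v in collapsed if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in collapsed]

colors = ["red", "red", "blue", "blue", "blue", "teal", None, "mauve", None]
result = collapse_rare_then_impute(colors)
print(result)
```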
0 votes
0 answers
36 views
How do I compare different ML models for imputation if I split the data into train and test sets?
I have a complete dataset and introduce missingness using one of these mechanisms (MCAR, MAR, MNAR), then split the data into train and test sets. After that, I impute the missing values using different ML ...
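One common protocol is sketched below with NumPy on synthetic data. Keep the complete dataset as ground truth, mask values (MCAR here), and then split. Fit each imputer on the training rows only, and score it only on the masked test cells. The two imputers (training mean versus a linear regression on another feature) are hypothetical stand-ins for whatever ML models are being compared.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=n)  # x2 depends strongly on x1
X_full = np.column_stack([x1, x2])             # ground truth, never modified

# 1. Introduce MCAR missingness in x2 (about 20% of rows)
X_obs = X_full.copy()
missing = rng.random(n) < 0.2
X_obs[missing, 1] = np.nan

# 2. Split row indices into train and test
idx = rng.permutation(n)
train_idx, test_idx = idx[:150], idx[150:]

# 3. Fit both imputers on training rows where x2 is observed
train_obs = train_idx[~missing[train_idx]]
mean_x2 = X_obs[train_obs, 1].mean()
slope, intercept = np.polyfit(X_obs[train_obs, 0], X_obs[train_obs, 1], deg=1)

# 4. Score only the masked test cells against their true values
test_miss = test_idx[missing[test_idx]]
truth = X_full[test_miss, 1]
preds = {
    "mean": np.full(len(test_miss), mean_x2),
    "regression": slope * X_obs[test_miss, 0] + intercept,
}
rmse = {name: float(np.sqrt(np.mean((p - truth) ** 2))) for name, p in preds.items()}
print(rmse)
```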
2 votes
1 answer
183 views
Data Science/Analysis Book that Covers Missing Data
I've been trying to learn data science for a while now. In fact, I actually finished the "Data Scientist Associate" career path in DataCamp. However, as you might expect, the courses don't ...
0 votes
0 answers
30 views
Handling column data that is 100% missing when those values need to be retrieved
I have almost 20 features; some are categorical and some are numerical. I have already converted the categorical features using binary encoding. The problem is that among the 20 features, two feature ...
2 votes
0 answers
51 views
Handling Missing Values in Predictor Variables for Gradient Boosting Models ( gbm() ) in R
I am currently working on a predictive modeling project using the gbm package in R and have encountered a challenge regarding missing values in one of my predictor variables. I would appreciate your ...
-1 votes
2 answers
44 views
Missing Data keeps popping up
I am using Random Forest as my machine learning algorithm. Before using the model, I had to clean up my data, and I already removed the missing values, but when I try to use my model it keeps saying: y ...
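The error message is cut off, so the cause here is an assumption. A frequent one is that missing values were removed from the features but not from the target (or vice versa). That leaves `y` with `NaN`s or misaligned with `X`. The NumPy sketch below drops rows jointly so both arrays stay aligned; the arrays are toy data.

```python
import numpy as np

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, 7.0]])
y = np.array([0.0, 1.0, np.nan, 1.0])

# Keep only rows where every feature AND the target are present,
# so X and y stay aligned row-for-row
keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
X_clean, y_clean = X[keep], y[keep]

print(X_clean.shape, y_clean)
```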
0 votes
1 answer
63 views
Filling a lot of missing values with arbitrary value
I have a dataset of, say, 1 million observations. As a silly example, say we want to predict whether a person can become a data scientist or not (0/1). I have variables that have a lot of missing values but ...
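The sketch below shows the arbitrary-value approach in NumPy, with one commonly recommended refinement. It adds a 0/1 indicator column per feature so a model can tell a filled value from a real one. The sentinel `-999.0` and the helper name are hypothetical. Tree-based models can usually split the sentinel off, whereas linear models may be distorted by it.

```python
import numpy as np

def fill_with_indicator(X, fill_value=-999.0):
    """Replace NaNs with a sentinel and append one 0/1 'was missing' column per feature."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, fill_value, X)
    return np.hstack([filled, missing.astype(float)])

X = np.array([[1.0, np.nan], [np.nan, 5.0], [3.0, 4.0]])
out = fill_with_indicator(X)
print(out)
```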
1 vote
1 answer
64 views
What type of problem is this?
I have a set of entity types, say colors (red, green, blue, etc.) and a set of groups of entities. E.g. one group may be 3 blue, one may be 2 red and 1 blue, and so on. I have the assumption that ...
0 votes
1 answer
301 views
Dropping a column with more than 60% missing values
As the title suggests, I'm working on a dataset in which about 60% of the values in a certain column are missing. Should I simply drop the column instead of imputing it? The reason behind it is, I am working ...
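This is a minimal pandas sketch of a threshold rule: drop every column whose fraction of missing values exceeds a cut-off (0.6 here, to match the question). The cut-off is a judgment call rather than a standard, and the helper name and toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_missing=0.6):
    """Keep only columns whose share of missing values is at most max_missing."""
    missing_fraction = df.isna().mean()
    return df.loc[:, missing_fraction <= max_missing]

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, 5.0],        # 40% missing -> kept
    "b": [np.nan, np.nan, np.nan, np.nan, 2.0],  # 80% missing -> dropped
    "c": [1.0, 2.0, 3.0, 4.0, 5.0],              # complete -> kept
})
reduced = drop_sparse_columns(df)
print(list(reduced.columns))
```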
2 votes
0 answers
38 views
Should Imputation Models Be Cross-Validated?
I have a project where I am predicting the best schools based on a series of test scores, teacher attendance rates, etc. I would like to predict the best school to go to. Some of the data is of ...
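A common answer is that the imputation step should be treated as part of the model and refit inside each fold. That way, the validation fold never influences the filled-in values. The NumPy sketch below uses mean imputation and a least-squares line as hypothetical stand-ins for the real imputer and model; the data are synthetic.

```python
import numpy as np

def cv_rmse_with_fold_imputation(x, y, n_splits=5, seed=0):
    """K-fold CV where the imputation value is learned from each training fold only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_splits)
    scores = []
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        fill = np.nanmean(x[train_idx])  # never looks at the validation fold
        x_train = np.where(np.isnan(x[train_idx]), fill, x[train_idx])
        x_val = np.where(np.isnan(x[val_idx]), fill, x[val_idx])
        # Simple downstream model: least-squares line y ~ x
        slope, intercept = np.polyfit(x_train, y[train_idx], deg=1)
        pred = slope * x_val + intercept
        scores.append(float(np.sqrt(np.mean((pred - y[val_idx]) ** 2))))
    return scores

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.5, size=100)
x[rng.random(100) < 0.15] = np.nan  # hypothetical MCAR gaps in the feature
scores = cv_rmse_with_fold_imputation(x, y)
print([round(s, 3) for s in scores])
```

In scikit-learn, the same effect is usually obtained by placing `SimpleImputer` inside a `Pipeline` and passing the pipeline to `cross_val_score`.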
2 votes
2 answers
158 views
How do sklearn's trees evaluate NaNs on inference?
Imagine we have fitted a sklearn.tree.DecisionTreeClassifier object like this one: If we wanted to predict the class of this observation: ...