Questions tagged [missing-data]

Question 1

Problem description: I have a dataset that combines multiple sources gathering the same kind of data. I have retrieved those data to fit them into several columns of a pandas dataframe. All ...

Question 2

Years ago, I read a paper that proposed a K-means-based approach to impute missing values over energy time data. At that point in time, since I did not have access to that data, I tried to ...

Question 3

Download PDF files from this website "https://register.awmf.org/de/start", but the code didn't find any PDF link, although there are links to PDF files, only indirectly. I want to download all ...

Question 4

I'm working on a project using an E-commerce dataset. I'm facing an issue in the data cleaning stage. I have the customers dataset, which has approximately 1.6 million rows. One of the features, "...

Question 5

When dealing with missing data in categorical variables, common approaches include imputation by mode or predictive models. However, in some cases, certain categories have extremely low frequency or ...
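For reference, mode imputation as mentioned in this excerpt can be sketched in a few lines. This is a minimal illustration only: the `colors` list is made-up data, `impute_mode` is a hypothetical helper name, and `None` is assumed to mark a missing entry. Printing the category counts first makes any very low-frequency categories visible before they are affected by the fill.

```python
from collections import Counter

def impute_mode(values):
    """Fill None entries with the most frequent observed category.

    Hypothetical helper for illustration; returns the filled list,
    the mode that was used, and the observed category counts.
    """
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    mode, _ = counts.most_common(1)[0]
    filled = [mode if v is None else v for v in values]
    return filled, mode, counts

# Made-up categorical column; None marks a missing value.
colors = ["red", "blue", None, "red", "green", None, "red"]
filled, mode, counts = impute_mode(colors)

print(counts)   # observed frequencies, useful for spotting rare categories
print(filled)
```

Note that mode imputation inflates the share of the most common category, so rare categories become relatively even rarer after the fill.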

Question 6

I have a full dataset and introduce some missingness of one of these types (MCAR, MAR, MNAR), then split the data into train and test datasets. After that, I impute missing values by using different ML ...
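The workflow described in this excerpt (mask a complete dataset, split it, impute, compare) might look roughly like the sketch below. It is an assumption-laden toy: the data are synthetic, only the MCAR case is shown, and simple mean and median fills stand in for the ML imputers the question refers to. The helper names `make_mcar` and `rmse_on_masked` are hypothetical. The point it illustrates is that each imputer is fitted on observed training values only and scored against the known true values at the masked test positions.

```python
import math
import random
import statistics

def make_mcar(values, rate, rng):
    """Mask values completely at random: every entry has the same
    probability `rate` of becoming None, independent of its value."""
    return [None if rng.random() < rate else v for v in values]

def rmse_on_masked(truth, masked, fill):
    """RMSE between the true values and a constant `fill`,
    computed only at positions that were masked."""
    errors = [(t - fill) ** 2 for t, m in zip(truth, masked) if m is None]
    return math.sqrt(sum(errors) / len(errors)) if errors else float("nan")

rng = random.Random(0)
full = [rng.gauss(50.0, 10.0) for _ in range(1000)]  # synthetic complete column
masked = make_mcar(full, 0.2, rng)                   # introduce MCAR missingness

# Split after masking; the true test values are kept for scoring.
split = 800
train_masked = masked[:split]
test_truth, test_masked = full[split:], masked[split:]

# "Fit" each candidate imputer on observed training values only.
observed_train = [v for v in train_masked if v is not None]
candidates = {
    "mean": statistics.mean(observed_train),
    "median": statistics.median(observed_train),
}

# Score each imputer on the masked test cells, where the truth is known.
scores = {name: rmse_on_masked(test_truth, test_masked, fill)
          for name, fill in candidates.items()}
print(scores)
```

The same scoring loop can be reused for any imputer, as long as it is fitted without access to the test rows.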

Question 7

I've been trying to learn data science for a while now. In fact, I actually finished the "Data Scientist Associate" career path in DataCamp. However, as you might expect, the courses don't ...

Question 8

I have almost 20 features. Some of them are categorical and some are numerical. I have already converted the categorical features into binary encoding. The problem is that among 20 features, two feature ...

Question 9

I am currently working on a predictive modeling project using the gbm package in R and have encountered a challenge regarding missing values in one of my predictor variables. I would appreciate your ...

Question 10

I am using Random Forest as my machine learning algorithm. Before using the model, I had to clean up my data and I already removed missing values, but when I try to use my model it keeps saying: y ...

Question 11

I have a dataset of say 1 million observations. As a silly example, say we want to predict if a person can become a data scientist or not (0/1). I have variables that have a lot of missing values but ...

Question 12

I have a set of entity types, say colors (red, green, blue, etc.) and a set of groups of entities. E.g. one group may be 3 blue, one may be 2 red and 1 blue, and so on. I have the assumption that ...

Question 13

As the title suggests, I'm working on a dataset where about 60% of the values in a certain column are missing. Should I simply drop the column instead of imputing it? The reason behind it is, I am working ...

Question 14

I have a project where I am predicting the best schools based on a series of test scores, teacher attendance rates, etc. I would like to predict the best school to go to. Some of the data is of ...

Question 15

Imagine we have fitted a sklearn.tree.DecisionTreeClassifier object like this one: If we wanted to predict the class of this observation: ...

Questions tagged [missing-data]

Clustering from multi-sources with missing data

What is the best practice to impute missing data with patterns over time? (potential of K-means clustering for imputation of missing values!?)

I wrote code in R to download PDF files from a website automatically, but the code didn't find the PDF file links, although there are links

How do I fill the null values of a categorical column?

How do outliers affect the process of imputing missing data in categorical variables?

How to compare different ML models for imputation if I split the data into train and test datasets?

Data Science/Analysis Book that Covers Missing Data

Handling columns data when 100% missing but need to retrieve those values

Handling Missing Values in Predictor Variables for Gradient Boosting Models ( gbm() ) in R

Missing Data keeps popping up

Filling a lot of missing values with arbitrary value

What type of problem is this?

dropping a column with more than 60% missing values

Should Imputation Models be Cross Validated?

How do sklearn's trees evaluate NaNs on inference?
