I am using Random Forest as my machine learning algorithm. Before training the model I cleaned my data and removed the missing values. However, when I try to fit the model it keeps raising an error saying that y contains missing values. When I check for missing values, the count is 0, meaning I don't have any. What can be done?
2 Answers
Check whether y contains non-standard representations of missing values, such as empty strings or placeholder values like "?" or "unknown". Functions such as isnull() do not count these as missing. Use unique() to list every distinct value. If y is a pandas Series, run:

```python
print(y.unique())
```

If y is a column named "y" in a DataFrame called df, run:

```python
print(df["y"].unique())
```
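Here is a minimal sketch of how placeholders slip past isnull() and how to turn them into real NaN so they can be dropped. The DataFrame, its values, and the placeholder list are hypothetical; substitute whatever unique() reveals in your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical target column: "" and "?" stand in for missing values,
# so isnull() does not count them.
df = pd.DataFrame({"y": ["yes", "no", "", "?", "yes"]})

print(df["y"].isnull().sum())   # pandas sees no NaN here
print(df["y"].unique())         # but the placeholders show up here

# Treat the placeholders as real missing values, then drop those rows.
placeholders = ["", "?"]        # hypothetical; use what unique() shows you
df["y"] = df["y"].replace(placeholders, np.nan)
df = df.dropna(subset=["y"])

print(df["y"].unique())         # only the genuine labels remain
```

If X lives in the same DataFrame, dropping the rows there keeps features and target aligned.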
It's time to make the code confess by testing it.
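The following code checks y for NaN:

```python
import numpy as np

# astype(float) raises ValueError if y holds strings that are not numbers,
# such as "" or "?", which is itself a useful clue.
y = y.astype(float)
if np.isnan(y).any():
    print("NaN values found in y.")
```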
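Because astype(float) stops at the first bad value, a swapped-in alternative is pd.to_numeric with errors="coerce", which turns every unparseable entry into NaN so you can see all of them at once. This is a sketch assuming y is a pandas Series of text; the sample values are hypothetical. It also checks for infinities, since scikit-learn rejects those in y as well.

```python
import numpy as np
import pandas as pd

# Hypothetical target stored as text; " " cannot be parsed as a number.
y = pd.Series(["1.5", "2.0", " ", "3.25"])

# y.astype(float) would raise ValueError here. pd.to_numeric with
# errors="coerce" instead turns every unparseable entry into NaN.
y_num = pd.to_numeric(y, errors="coerce")

# Show the original values that could not be converted, with their index.
bad_rows = y[y_num.isna()]
print(bad_rows)

# Infinite values are rejected by scikit-learn too, so check for them.
print(np.isinf(y_num).any())
```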
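Look for "invisible" missing values, such as empty strings, that isnull() alone does not catch:

```python
# y is assumed to be a pandas Series here.
if y.isnull().any() or (y == "").any():
    print("Missing or empty values found in y.")
```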
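The comparison y == "" misses strings that contain only whitespace, such as "  " or a tab. A small extension, sketched here on hypothetical data, strips whitespace before comparing:

```python
import pandas as pd

# Hypothetical target: whitespace-only strings look blank but are not "".
y = pd.Series(["cat", "  ", "dog", "\t", None])

# Strip whitespace so "  " and "\t" compare equal to "".
# None becomes NaN after .str.strip(), and NaN == "" is False,
# so that row is caught by isnull() instead.
blank = y.str.strip().eq("")
missing = y.isnull() | blank

print(missing.sum())   # rows that are effectively missing
y_clean = y[~missing]
```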
Make sure y does not contain strings; if it does, convert them to numeric labels:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(y)  # maps each distinct label to an integer 0..n_classes-1
```
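A short sketch of LabelEncoder on hypothetical string labels, including how to map predictions back with inverse_transform. Note that scikit-learn classifiers, RandomForestClassifier included, can also take string labels directly, so this step matters most when y must be numeric.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels for a classification target.
y = ["spam", "ham", "spam", "ham", "eggs"]

le = LabelEncoder()
y_encoded = le.fit_transform(y)   # classes are sorted: eggs=0, ham=1, spam=2

print(le.classes_)                # original label for each integer code
print(y_encoded)

# Map encoded values (e.g. model predictions) back to the original labels.
original = le.inverse_transform(y_encoded)
```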
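Finally, if none of the above helps, try resetting the indexes. Pandas aligns objects by index label, so if X and y were filtered or reindexed separately, combining them can introduce NaN where the labels no longer match. This fix assumes X and y already contain the same rows in the same order:

```python
# drop=True discards the old index instead of adding it as a column.
X = X.reset_index(drop=True)
y = y.reset_index(drop=True)
```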
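To see why resetting both indexes can matter, here is a minimal reproduction on hypothetical data: resetting only X leaves it out of step with y, and pandas' label alignment silently creates a NaN in the target. This is a sketch of one plausible cause, not necessarily what happened in your pipeline.

```python
import pandas as pd

# Hypothetical data: one feature value is missing in row 1.
X = pd.DataFrame({"f1": [1.0, None, 3.0, 4.0]})
y = pd.Series([10, 20, 30, 40])

X = X.dropna()        # index is now [0, 2, 3]
y = y.loc[X.index]    # keep the matching target rows, index [0, 2, 3]

# Resetting only X leaves the two objects on different indexes.
X_only_reset = X.reset_index(drop=True)          # index [0, 1, 2]
broken = X_only_reset.assign(target=y)           # pandas aligns by label
print(broken["target"].isnull().sum())           # label 1 has no target

# Resetting both restores a matching 0..n-1 index.
X = X.reset_index(drop=True)
y = y.reset_index(drop=True)
fixed = X.assign(target=y)
print(fixed["target"].isnull().sum())
```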
The second option always works for me. I hope this helps.