Multivariate Regression Error “AttributeError: 'numpy.ndarray' object has no attribute 'columns'”

Question

I'm trying to run a multivariate linear regression but I'm getting an error when trying to get the coefficients of the regression model.

The error I'm getting is this: AttributeError: 'numpy.ndarray' object has no attribute 'columns'

Here's the code I'm using:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as seabornInstance
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
%matplotlib inline
# Main files
dataset = pd.read_csv('namaste_econ_model.csv')
dataset.shape
dataset.describe()
dataset.isnull().any()
# Dividing data into "attributes" and "labels". X variable contains all the attributes and y variable contains labels.
X = dataset[['Read?', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10',
             'x11', 'x12', 'x13', 'x14', 'x15', 'x16', 'x17', 'x18', 'x19', 'x20',
             'x21', 'x22', 'x23', 'x24', 'x25', 'x26', 'x27', 'x28', 'x29', 'x30',
             'x31', 'x32', 'x33', 'x34', 'x35', 'x36', 'x37', 'x38', 'x39', 'x40',
             'x41', 'x42', 'x43', 'x44', 'x45', 'x46', 'x47']].values
y = dataset['Change in Profit (BP)'].values
plt.figure(figsize=(15,10))
plt.tight_layout()
seabornInstance.distplot(dataset['Change in Profit (BP)'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
coeff_df

Full error:

Traceback (most recent call last):
  File "", line 14, in
    coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
AttributeError: 'numpy.ndarray' object has no attribute 'columns'

Any help on this will be highly appreciated!

Simon Larsson · Accepted Answer · 2019-09-25 19:22:45Z

Calling .values on a pandas DataFrame returns a NumPy array, which carries no column names. You do this when setting X:

X = dataset[['Read?', 'x1', .. ,'x47']].values

But then you try to get the column names from X (which it does not have) by writing X.columns here:

coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
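The difference can be checked directly. Here is a minimal sketch, assuming a tiny made-up DataFrame (the column names and values are illustrative and do not come from the question's CSV):

```python
import numpy as np
import pandas as pd

# Tiny made-up frame; column names are illustrative only.
df = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})

as_frame = df[["x1", "x2"]]          # still a DataFrame, keeps its labels
as_array = df[["x1", "x2"]].values   # plain NumPy array, labels are dropped

# Print the type name and whether a `columns` attribute exists.
print(type(as_frame).__name__, hasattr(as_frame, "columns"))
print(type(as_array).__name__, hasattr(as_array, "columns"))
```

Accessing as_array.columns would raise the same AttributeError as in the question.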

So store your column names in a variable or input them again, like this:

coeff_df = pd.DataFrame(regressor.coef_, ['Read?', 'x1', .. ,'x47'], columns=['Coefficient'])
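Storing the names in a variable could look roughly like the sketch below. The data is synthetic and noise-free, and the names feature_names and target are assumptions made for illustration, since the original CSV is not available:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data; not the question's dataset.
rng = np.random.default_rng(0)
feature_names = ["x1", "x2", "x3"]
dataset = pd.DataFrame(rng.normal(size=(50, 3)), columns=feature_names)
dataset["target"] = 2.0 * dataset["x1"] - 1.0 * dataset["x2"] + 0.5 * dataset["x3"]

X = dataset[feature_names].values   # NumPy array: the column labels are gone
y = dataset["target"].values

regressor = LinearRegression()
regressor.fit(X, y)

# Reuse the stored names as the index of the coefficient table.
coeff_df = pd.DataFrame(regressor.coef_, index=feature_names, columns=["Coefficient"])
print(coeff_df)
```

Keeping the list in one variable also means the same names select the columns and label the coefficients, so the two cannot drift apart.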

Hi Simon, thank you so much for the support. You're right, that was exactly the issue. I do have another question, sorry if I'm being opportunistic here: x1, x2, etc. are actually dummy variables (ref. prnt.sc/payj7q) derived from a categorical variable, and my coefficient output now looks like this: prnt.sc/payio0. Am I doing something wrong, or where should I look to improve the model? Again, thank you so much for your support.
– Eduardo Martinez, Commented Sep 25, 2019 at 19:29

Peter · Accepted Answer · 2019-10-15 16:50:33Z

Remove the .values call so that X stays a DataFrame and keeps its columns attribute:

X = dataset[['Read?', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10',
             'x11', 'x12', 'x13', 'x14', 'x15', 'x16', 'x17', 'x18', 'x19', 'x20',
             'x21', 'x22', 'x23', 'x24', 'x25', 'x26', 'x27', 'x28', 'x29', 'x30',
             'x31', 'x32', 'x33', 'x34', 'x35', 'x36', 'x37', 'x38', 'x39', 'x40',
             'x41', 'x42', 'x43', 'x44', 'x45', 'x46', 'x47']]
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
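As a self-contained illustration of this approach, here is a sketch on synthetic data (the frame, its column names, and the variable names are assumptions, not taken from the question). LinearRegression accepts a DataFrame as input, so the labels remain available afterwards:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data, for illustration only.
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(40, 2)), columns=["x1", "x2"])
data["y"] = 3.0 * data["x1"] + 4.0 * data["x2"]

X_df = data[["x1", "x2"]]   # no .values, so X_df remains a DataFrame
y_arr = data["y"].values

model = LinearRegression().fit(X_df, y_arr)

# X_df.columns exists here and labels each coefficient.
coef_table = pd.DataFrame(model.coef_, X_df.columns, columns=["Coefficient"])
print(coef_table)
```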
