Extract separated values from columns

Question

I am working on a CSV file which is a collection of movie details from IMDB. The dataframe has a genres column holding all the genres of each movie, separated by a pipe (|).
What I need is to extract the first two genres from the genres column and store them in two new columns: genre_1 and genre_2.
For rows where there is only one genre, that single genre should go into both columns, i.e. for such movies genre_2 will be the same as genre_1.

I am sharing screenshots of the code and the results that I have got.

Now, I can create a new data frame from the split genres, remove the unwanted columns, and concatenate the remaining ones with the original data frame. But that looks pretty clumsy.
How can I split the column within my original data frame and drop the unwanted expanded columns?
Any help is appreciated.

Catalina Chircu · Accepted Answer · 2020-05-16 08:45:45Z

This is a programming question rather than a data science question.

You need to use apply with a lambda function. In apply you must pass axis=1, which means that the function is applied to each row and not to each column. So if your DataFrame is called movies:

    def get_genre(row, genre_index):
        array_genres = row['genres'].split('|')
        # str.split always returns at least one element, so this is only a safeguard
        if len(array_genres) == 0:
            return ''
        # Only one genre: reuse it for the second column
        elif len(array_genres) == 1 and genre_index == 1:
            return array_genres[0]
        else:
            return array_genres[genre_index]

    movies['genre_1'] = movies.apply(lambda row: get_genre(row, 0), axis=1)
    movies['genre_2'] = movies.apply(lambda row: get_genre(row, 1), axis=1)
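The same idea can also be applied to the genres column directly, which avoids the need for axis=1. This is only a minimal sketch: the three-row frame is made-up sample data standing in for the real CSV, and pick_genre is a hypothetical helper name, not something from the answer above. It falls back to the first genre whenever the requested position does not exist.

```python
import pandas as pd

# Made-up sample data (assumption); the real data comes from the IMDB CSV.
movies = pd.DataFrame(
    {"genres": ["Action|Adventure|Fantasy", "Thriller", "Fantasy|Sci-Fi"]}
)

def pick_genre(genres, genre_index):
    """Return the genre at genre_index, or the first genre if it is missing."""
    parts = genres.split("|")
    return parts[genre_index] if genre_index < len(parts) else parts[0]

# Apply to the single column, so each call receives a plain string.
movies["genre_1"] = movies["genres"].apply(lambda g: pick_genre(g, 0))
movies["genre_2"] = movies["genres"].apply(lambda g: pick_genre(g, 1))

print(movies)
```

Note that this sketch assumes every row has a non-missing string in genres; rows with NaN would need to be filled or dropped first.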

score 0 · Accepted Answer · 2020-10-14 15:09:29Z

Try:

    import pandas as pd

    # Create an example dataframe
    df = pd.DataFrame({"genres": ["Fantasy|Sci-Fi", "Action|Adventure|Fantasy", "Thriller",
                                  "Action|Adventure|Thriller|bbv", "Action", "Action|Adventure|thriller"]})

    # Get a dataframe with as many columns as there are genres
    df = df.genres.str.get_dummies(sep="|")

    # Get the genres as values
    df = df.multiply(df.columns)

    # Rename the columns to have the genre id
    df.columns = ["genre_" + str(x) for x in range(len(df.columns))]

EDIT:

You can simply use the pandas assign method:

    df.assign(
        genre1=df.genres.str.split("|", expand=True).iloc[:, :1],
        genre2=df.genres.str.split("|", expand=True).iloc[:, 1:2],
    )
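As written, the assign call leaves the second genre empty for movies with a single genre, whereas the question asks for genre_2 to repeat genre_1 in that case. A possible variation is sketched below, under a few assumptions: the example frame is invented, the column names genre_1 and genre_2 follow the question rather than this answer, and the reindex call is added only so that column 1 exists even if no row has a second genre.

```python
import pandas as pd

# Invented example data (assumption), in the same shape as the genres column.
df = pd.DataFrame(
    {"genres": ["Fantasy|Sci-Fi", "Action|Adventure|Fantasy", "Thriller"]}
)

# Split once, then keep only the first two positions.
# reindex guarantees columns 0 and 1 exist even when every row has one genre.
split = df["genres"].str.split("|", expand=True).reindex(columns=[0, 1])

# Where the second genre is missing, reuse the first one.
df = df.assign(genre_1=split[0], genre_2=split[1].fillna(split[0]))

print(df)
```

Splitting once and reusing the result also avoids calling str.split twice on the same column.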
