Multiple filtering pandas columns based on values in another column

Question

I have a pandas dataframe df1:

Now I want to filter the rows in df1 based on the unique combinations of (Campaign, Merchant) from another dataframe, df2, which looks like this:

What I tried is .isin, with code similar to the one below:

df1.loc[df1['Campaign'].isin(df2['Campaign']) & df1['Merchant'].isin(df2['Merchant'])]

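The problem is that the two conditions are evaluated independently. For example, I want to check whether the pair (A, 1) from df2 occurs in df1, but the condition above tests each column against the whole df2 column rather than row by row. It therefore returns every row of df1 whose Campaign appears anywhere in df2['Campaign'] and whose Merchant appears anywhere in df2['Merchant'], even when that particular combination never occurs in df2.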
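To make the failure concrete, here is a small sketch with made-up data (these frames are illustrative stand-ins, not the ones from the question). The pairs (A, 1) and (B, 2) never occur in df2, yet the independent checks keep both rows:

```python
import pandas as pd

# Illustrative data only: df2 contains the pairs (A, 2) and (B, 1).
df1 = pd.DataFrame({"Campaign": ["A", "B", "B"], "Merchant": [1, 2, 1]})
df2 = pd.DataFrame({"Campaign": ["A", "B"], "Merchant": [2, 1]})

# Each column is tested against the whole df2 column, not row by row.
mask = df1["Campaign"].isin(df2["Campaign"]) & df1["Merchant"].isin(df2["Merchant"])
false_hits = df1.loc[mask]

print(false_hits)
```

Only the last row, (B, 1), is a genuine match, but all three rows pass the mask.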

Do you have any suggestions for filtering on several columns at once like this?

user91338 · Accepted Answer · 2020-03-06 11:12:31Z

A bit late, but my preferred solution to this is:

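# setup taken from @tuomastik (pd.np swapped for numpy, since pd.np is not available in recent pandas)
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Random numbers 1": np.random.randn(6),
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Random numbers 2": np.random.randn(6),
                    "Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

# modification
def pair_columns(df, col1, col2):
    # Cast both columns to str so a text column and an integer column can be
    # concatenated; the separator keeps ("A1", 2) distinct from ("A", 12).
    return df[col1].astype(str) + "_" + df[col2].astype(str)

def paired_mask(df1, df2, col1, col2):
    # True for each row of df1 whose (col1, col2) pair also occurs in df2.
    return pair_columns(df1, col1, col2).isin(pair_columns(df2, col1, col2))

identical = df1.loc[paired_mask(df1, df2, "Campaign", "Merchant")]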

tuomastik · Accepted Answer · 2019-03-19 07:21:23Z

import numpy as np  # pd.np is not available in recent pandas
import pandas as pd

df1 = pd.DataFrame({"Random numbers 1": np.random.randn(6),
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Random numbers 2": np.random.randn(6),
                    "Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

columns_consider = ["Campaign", "Merchant"]

# Stack the unique pairs of both frames; a pair present in both shows up twice.
combined = pd.concat((df1[columns_consider].drop_duplicates(),
                      df2[columns_consider].drop_duplicates()),
                     ignore_index=True)
identical = combined[combined.duplicated()]
print(identical)

Output:

  Campaign  Merchant
4        A         1
5        A         2
6        B         1
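Note that this yields the shared (Campaign, Merchant) pairs rather than the matching rows of df1. If the rows themselves are wanted, one possible follow-up is an inner merge against the de-duplicated key columns of df2. A minimal sketch, assuming small made-up frames in place of the random ones above (the "Value" column is hypothetical):

```python
import pandas as pd

# Illustrative data with the same key columns as in the answer above.
df1 = pd.DataFrame({"Value": [10, 20, 30, 40, 50, 60],
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

keys = ["Campaign", "Merchant"]

# De-duplicate df2's key columns so each df1 row can match at most once,
# then keep only the df1 rows whose pair also exists in df2.
filtered = df1.merge(df2[keys].drop_duplicates(), on=keys, how="inner")

print(filtered)
```

The merge returns a fresh integer index, so the original row labels of df1 are not preserved; that may or may not matter for your use case.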

Jason · Accepted Answer · 2020-11-03 06:49:58Z

The way I always go about it is by creating a lookup column:

df1['lookup'] = df1['Campaign'] + "_" + df1['Merchant'].astype(str)
df2['lookup'] = df2['Campaign'] + "_" + df2['Merchant'].astype(str)

Then use loc to filter and drop the lookup columns:

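# Keep the filtered rows (without the helper column) before dropping it from the inputs.
filtered = df1.loc[df1['lookup'].isin(df2['lookup'])].drop(columns='lookup')
df1.drop(columns='lookup', inplace=True)
df2.drop(columns='lookup', inplace=True)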

I'm still looking for a better solution.
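One alternative that avoids the helper column is to compare the pairs as tuples through a MultiIndex. This is only a sketch on made-up data (the "Value" column is hypothetical), not something taken from the answers above:

```python
import pandas as pd

# Illustrative data with the same key columns as in the question.
df1 = pd.DataFrame({"Value": [10, 20, 30, 40, 50, 60],
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

keys = ["Campaign", "Merchant"]

# Each MultiIndex entry is a (Campaign, Merchant) tuple, so isin compares
# whole pairs instead of testing each column independently.
pairs1 = pd.MultiIndex.from_frame(df1[keys])
pairs2 = pd.MultiIndex.from_frame(df2[keys])
mask = pairs1.isin(pairs2)

filtered = df1[mask]
print(filtered)
```

Because the pairs are compared as tuples, no string conversion or separator is needed, and df1 keeps its original index and columns.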
