3
$\begingroup$

I have a pandas dataframe df1:

df1

Now I want to filter the rows of df1 based on the unique combinations of (Campaign, Merchant) in another dataframe, df2, which looks like this:

df2

I tried using .isin, with code similar to the following:

df1.loc[df1['Campaign'].isin(df2['Campaign']) & df1['Merchant'].isin(df2['Merchant'])] 

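The problem is that the two conditions are evaluated independently. For example, I want to check whether the pair (A, 1) from df2 is in df1, but because each .isin tests against the whole column rather than row by row, the expression returns every row of df1 whose Campaign appears anywhere in df2 and whose Merchant appears anywhere in df2, even when that particular (Campaign, Merchant) pair never occurs in df2.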
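To make the over-matching concrete, here is a minimal sketch. The frames below are hypothetical toy data chosen for illustration, not the frames from the question.

```python
import pandas as pd

# Hypothetical toy data (an assumption, not the asker's actual frames).
df1 = pd.DataFrame({"Campaign": ["A", "A", "B", "B"],
                    "Merchant": [1, 2, 1, 2]})
df2 = pd.DataFrame({"Campaign": ["A", "B"],
                    "Merchant": [1, 2]})

# Each isin is tested against a whole column of df2, so neither test
# knows which Campaign belongs with which Merchant.
mask = df1["Campaign"].isin(df2["Campaign"]) & df1["Merchant"].isin(df2["Merchant"])
over_matched = df1.loc[mask]

# Every row of df1 passes, although only (A, 1) and (B, 2) occur as pairs in df2.
print(over_matched)
```

In this toy case the pairs (A, 2) and (B, 1) slip through because A and B both appear in df2's Campaign column and 1 and 2 both appear in its Merchant column.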

Do you have any suggestions for filtering on a combination of columns like this?

$\endgroup$

3 Answers

2
$\begingroup$

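A bit late, but my preferred solution to this is:

# setup adapted from @tuomastik (pd.np no longer exists in recent pandas, so numpy is imported directly)
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Random numbers 1": np.random.randn(6),
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Random numbers 2": np.random.randn(6),
                    "Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

# modification
def pair_columns(df, col1, col2):
    # Cast both columns to str so a string column and an integer column can be joined;
    # the separator keeps pairs such as ("A1", 2) and ("A", 12) distinct.
    return df[col1].astype(str) + "_" + df[col2].astype(str)

def paired_mask(df1, df2, col1, col2):
    # True for each row of df1 whose (col1, col2) pair also occurs in df2.
    return pair_columns(df1, col1, col2).isin(pair_columns(df2, col1, col2))

identical = df1.loc[paired_mask(df1, df2, "Campaign", "Merchant")]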
$\endgroup$
1
$\begingroup$
import numpy as np
import pandas as pd

# np.random.randn replaces pd.np.random.randn, which is unavailable in recent pandas
df1 = pd.DataFrame({"Random numbers 1": np.random.randn(6),
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Random numbers 2": np.random.randn(6),
                    "Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

columns_consider = ["Campaign", "Merchant"]

# Stack the unique pairs of both frames; a pair that shows up twice is present in both.
combined = pd.concat((df1[columns_consider].drop_duplicates(),
                      df2[columns_consider].drop_duplicates()),
                     ignore_index=True)
identical = combined[combined.duplicated()]
print(identical)

Output:

  Campaign  Merchant
4        A         1
5        A         2
6        B         1
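The output above lists the shared pairs rather than the rows of df1. One way to get the matching rows themselves is an inner merge against the unique pairs of df2. This is a sketch under assumptions: the `Value` column is a hypothetical stand-in for the random numbers so the result is reproducible, and `keys`, `pairs` and `filtered` are illustrative names.

```python
import pandas as pd

# "Value" is a hypothetical, deterministic stand-in for the random column.
df1 = pd.DataFrame({"Value": [10, 20, 30, 40, 50, 60],
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

keys = ["Campaign", "Merchant"]

# De-duplicate the pairs first so the merge cannot multiply rows of df1.
pairs = df2[keys].drop_duplicates()

# An inner merge keeps only the rows of df1 whose pair also exists in df2.
filtered = df1.merge(pairs, on=keys, how="inner")
print(filtered)
```

Note that the merge returns a new frame with a fresh index, so the original row labels of df1 are not preserved; the (A, 3) row is the only one dropped here.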
$\endgroup$
1
$\begingroup$

The way I always go about it is by creating a lookup column:

df1['lookup'] = df1['Campaign'] + "_" + df1['Merchant'].astype(str)
df2['lookup'] = df2['Campaign'] + "_" + df2['Merchant'].astype(str)

Then use loc to filter and drop the lookup columns:

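# Keep the filtered result (the original expression discarded it), then clean up both frames.
filtered = df1.loc[df1['lookup'].isin(df2['lookup'])].drop(columns='lookup')
df1.drop(columns='lookup', inplace=True)
df2.drop(columns='lookup', inplace=True)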

I'm still looking for a better solution.
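One possible alternative that avoids the temporary string column is to compare the pairs as tuples through a `MultiIndex`. This is only a sketch, not a definitive answer: the `Value` column is a hypothetical stand-in for the random numbers, and `keys`, `mask` and `filtered` are illustrative names.

```python
import pandas as pd

# "Value" is a hypothetical, deterministic stand-in for the random column.
df1 = pd.DataFrame({"Value": [10, 20, 30, 40, 50, 60],
                    "Campaign": ["A"] * 5 + ["B"],
                    "Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
                    "Merchant": [1, 2, 1, 2, 1, 2]})

keys = ["Campaign", "Merchant"]

# Build one MultiIndex per frame from the key columns; isin then compares
# whole (Campaign, Merchant) tuples instead of each column on its own.
mask = pd.MultiIndex.from_frame(df1[keys]).isin(pd.MultiIndex.from_frame(df2[keys]))

# mask is a boolean array aligned with the rows of df1.
filtered = df1[mask]
print(filtered)
```

Unlike a merge, this keeps the original index of df1 and needs no type casting, since the values are compared as tuples rather than as concatenated strings.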

$\endgroup$
