If the values are rather small (which is especially likely if we consider multisets), you can optimize this by bit counting.
Let's define $$ f(X,Y) = [ f_0, f_1, \ldots, f_k ] $$ where $f_i = \# \{ x \in X : x \& 2^i \neq 0 \}$ (the number of elements of $X$ that have the $i$'th bit set to $1$), and $k=\lfloor\log_2(\min(\max(X),\max(Y)))\rfloor$.
This operation takes $O(nk)$ time, where $n=\#X$.
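A sketch of this counting step in Python (the helper name `bit_counts` is mine, not from the question):

```python
def bit_counts(xs, k):
    """f(X, .): for each bit position 0..k, count how many numbers in xs have that bit set."""
    counts = [0] * (k + 1)
    for x in xs:                   # n elements
        for i in range(k + 1):     # k+1 bits each, so O(nk) overall
            if x & (1 << i):
                counts[i] += 1
    return counts
```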
Now it is easy to see that the answer to your question is given by
$$ f(A,B) \cdot f(B,A), $$
where $\cdot$ is the standard inner product.
So what we are actually doing is counting, for each bit, how many times it appears among the numbers in $A$ and among the numbers in $B$; multiplying these two counts gives how many pairs produce a $1$ at that particular bit under $\&$, and then we sum over all bits.
In your example $$1 = 1\cdot2^0, 2=0\cdot2^0+1\cdot2^1,3=1\cdot2^0+1\cdot2^1$$ $$8=0\cdot2^0+0\cdot2^1+0\cdot2^2+1\cdot2^3, 9=1\cdot2^0+0\cdot2^1+0\cdot2^2+1\cdot2^3$$ $$f(A,B) = [2,2,0,0]$$ $$f(B,A) = [1,0,0,2]$$
$$f(A,B) \cdot f(B,A) = 2 = 1 \& 9 + 3 \& 9$$
as expected.
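Putting it together, here is a minimal self-contained sketch (function names are my own), under my reading that the target quantity is the total number of set bits across all pairwise $a \& b$; it also checks the result against the brute-force $O(mn)$ computation:

```python
def popcount_pair_sum(A, B):
    # Bits above min(max(A), max(B)) can never survive an & between the sets,
    # so k bits suffice.
    k = min(max(A), max(B)).bit_length()
    f_ab = [sum(1 for a in A if a & (1 << i)) for i in range(k)]
    f_ba = [sum(1 for b in B if b & (1 << i)) for i in range(k)]
    # Inner product: for each bit, (#set in A) * (#set in B) pairs yield a 1 there.
    return sum(x * y for x, y in zip(f_ab, f_ba))

A, B = [1, 2, 3], [8, 9]
brute = sum(bin(a & b).count("1") for a in A for b in B)
assert popcount_pair_sum(A, B) == brute == 2
```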
The whole process takes $O(k(n+m))$ time, where $m=\#B$, which is smaller than $O(mn)$ iff $k$ is small enough (if the bit length of the largest number in the sets is smaller than the number of elements in the sets).