4
$\begingroup$

I'm searching for good hash-function for N-dimensional vector of M-bit integer numbers with a property that any permutation of the coordinates gives the same result.

e.g. $ h(x,y,z) = h(x,z,y) = h(y,x,z) = h(y,z,x) = h(z,x,y) = h(z,y,x) $

by good hash-function I mean following:

  1. good avalanche properties - There is minimum correlation between values of arguments and values of resulting hash. To avoid mapping of similar arguments to the same hash.
  2. good statistical properties - results are uniformly distributed. ( to minimize statistical probability of collision = mapping different arguments to the same hash )
  3. fast to compute on computer - i.e. use just operation like bitwise and, or, xor, not, bit-shift, aritmetical addition, subtraction, multiplication, maybe modulo

NOTE 1

I certainly do not require perfect hash function (which is bijective; without any collisions ) since I want to use it for dimensionality reduction. The resulting hash should has less bits than the original vector.


NOTE 2

hash function like

$h(x,y,z)= x + y + z$

$h(x,y,z)= x\ \mathrm{XOR} \ y\ \mathrm{XOR}\ z$

has permutation symmetry, but does not have very good statistical properties and avalanche properties. They have too much correlation between arguments and results, therefore for some sets of arguments gives too much collisions.

$\endgroup$

3 Answers 3

3
$\begingroup$

Just hash each element independently using the same hash function, and take the sum.

$\endgroup$
1
  • $\begingroup$ heh, on first look I would think that it would not be very good hash (with respect to 1. avalanche 2. statistical properties ), but actually why I don't see any specific problem why not. Perhaps you are right. $\endgroup$ Commented Apr 18, 2016 at 11:22
2
$\begingroup$

The obvious solution is to sort the components and then use any of-the-shelf hashing function. By sorting them you effectively map the vector space into itself in a way that is invariant under permutations of the components.

Actually applying any mapping $\phi$ that is invariant under permutation combined with any hash function $\chi$ (or any other function) would result in a (hash) function that is invariant under permutations.

$\endgroup$
1
  • 1
    $\begingroup$ That is certainly a solution. But I was thinking that by design of hash function It would be possible to avoid this, I consider sorting is slower and more complex algorithm than computation of hash. $\endgroup$ Commented Mar 11, 2016 at 10:26
0
$\begingroup$

Taking the sum, as the answer by user1502040 suggests, is a bad idea:

  1. It will mean that the hash value grows with the size of the list, but usually a fixed size hash is desirable
  2. It will lead to more collision due to the Central Limit Theorem.

Here's a discussion on how python's frozenset solves the problem: https://stackoverflow.com/questions/20832279/python-frozenset-hashing-algorithm-implementation

Essentially, keep xoring with the hashes of the items with some multiply-shifts thrown in.

Other discussions:

$\endgroup$
3
  • $\begingroup$ I meant take the sum mod 2^n. Xor has the problem that x ^ x = 0. I'm assuming the base hash acts like a random function. If not, results may of course vary. If you care about security you should re-hash the result. $\endgroup$ Commented Feb 24, 2023 at 14:01
  • $\begingroup$ @user1502040 I don't see how modular arithmetic solves the concentration problem posed by the CLT. As with XOR, you don't actually do x ^ x, if you bothered to check the link you could see that for example python does total ^= (x ^ (x << 16) ^ 89869747) * 3644798167 $\endgroup$ Commented Feb 25, 2023 at 21:34
  • $\begingroup$ I meant in the case that you allow repeating elements and are e.g. hashing the sequence [x, x]. And the CLT is not a problem, because if x ~ U(0, 2^n - 1), then x + y ~ U(0, 2^n - 1) regardless of the distribution of y. Also, the CLT says that the variance of the mean decreases with the sample count, the variance of the sum goes up. $\endgroup$ Commented Feb 25, 2023 at 21:40

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.