5

How would you convert a nesting of Python frozenset objects into a unique integer that was the same across Python sessions and platforms?

e.g. I get different values from hash() on different platforms

32-bit

Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> a=frozenset([frozenset([1,2,3]),frozenset(['a','b','c'])]); >>> hash(a) 1555175235 

64-bit

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> a=frozenset([frozenset([1,2,3]),frozenset(['a','b','c'])]); >>> hash(a) -6076998737938213053 
1

2 Answers 2

8

How would you convert a nesting of Python frozenset objects into a unique integer that was the same across Python sessions and platforms?

AFAIK hashes are not guaranteed to be unique. In fact where they are used for lookup tables (like in dictionaries) hashes conflicts are quite common.

That said. If you want a consistent, non-unique "hash" across platform, I would try to use the standard library hashlib. I don't have the possibility to try it on different platform, but I believe that most of the algorithms implemented there (as for example the MD5 one) are platform-independent.

I would feed the hashing algorithms with the pickled version of the sorted set, so as to make sure the string used for hashing is always the same.


EDIT: Thought to add a basic example:

>>> import cPickle as pkl >>> import hashlib as hl >>> s = frozenset([1,2,3]) >>> p = pkl.dumps(sorted(s)) #make sure you use the same pickle protocol on all platform! '(lp1\nI1\naI2\naI3\na.' >>> h = hl.md5(p) <md5 HASH object @ 0xb76fb110> >>> h.digest() "\x89\xaeG\x1d'\x83\xa5\xbd\xac\xa7\x1c\xd9\x1d/2t" #this should be consistent 
Sign up to request clarification or add additional context in comments.

5 Comments

For what is worth, I've checked that the md5 sum of the pickled frozenset is the same in both 32 and 64-bit platforms.
Is a pickling of an unordered type guaranteed to pickle to the same thing on all platforms? I'm reluctant to trust that it will be. I'd probably pull the elements out of the sets into lists, and sort them prior to pickling.
Thanks. I'd have to recurse through my nested frozensets, and convert them to sorted lists, but otherwise this should work.
@dstromberg, I don't believe it is. I think that's why he sorts the frozenset first.
There is no guarantee that the same object pickled in two different sessions / platforms will result in identical files - even if the objects are ordered. Therefore, you shouldn't rely on this unless you enjoy solving very hard debugging puzzles in the future. Perhaps recursive sorted and repr might work, since repr on sorted objects is probably uniquely defined, but I have neither tried it, nor thought about it enough to have any confidence it would work.
0

You can also create your own hash function:

def hash(fs): res = 1 for v in fs: res = (res*31 + v) % 2**30 return res 

This won't necessarily be unique, but it is as good as the build in set hash, and you have full control over its result across platforms.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.