9

I come from Java where even mutable objects can be "hashable".
And I am playing with Python 3.x these days just for fun.

Here is the definition of hashable in Python (from the Python glossary).

hashable

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().

I read it and I am thinking... Still... Why didn't they make in Python even mutable objects hashable? E.g. using the same default hashing mechanism as for user-defined objects i.e. as described by the last 2 sentences above.

Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().

This feels somewhat weird... so user-defined mutable objects are hashable (via this default hashing mechanism) but built-in mutable objects are not hashable. Doesn't this just complicate things? I don't see what benefits it brings, could someone explain?

7
  • 1
    To hash a list, you would have to first hash everything in it; because it is mutable, any change to it should change the hash. To hash a tuple, it is sufficient to hash its id() because it can't be modified. Users would generally expect hashing a user-defined class to behave more like an immutable object (even though it is mutable) so that's what it does, although of course you can make it behave any way you like. Commented May 6, 2019 at 22:00
  • 2
    @kindall: Tuples are not hashed by id. Commented May 6, 2019 at 22:09
  • 3
    @peter.petrov: In Java, the language you're coming from, collection hashCode values are based on container contents. For example, here are the List docs. Anything else would be broken and/or useless, since equals is based on collection contents. Commented May 6, 2019 at 22:30
  • 1
    @peter.petrov Doesn't that mean you could have two different lists with the exact same contents and they hash to two different things? This makes semantics very confusing if you want to check if something like a set has a list in it. E.g. if you did x = {[1]}, then [1] in x would return False. It might be hard to say what is or is not confusing to a beginner, but broken semantics can't be good... Commented May 6, 2019 at 22:31
  • 2
    "by letting you put mutable objects that hash based on mutable state into HashMaps" Wow... I was assuming in Java they don't hash Lists based on mutable state but just based on id or something (as Python does for its mutable user-defined objects). This assumption was wrong... That's really weird (the Java way, I mean)... So I asked a question about Python here... but learnt (or refreshed) something about Java too. That is strange behavior in Java indeed. I don't like it either. Now the Python way indeed makes some more sense to me. Commented May 6, 2019 at 22:50

3 Answers 3

3

In Python, mutable objects can be hashable, but it is generally not a good idea, because generally speaking, the equality is defined in terms of these mutable attributes, and this can lead to all sorts of crazy behavhior.

If built-in mutable objects are hashed based on identity, like the default hashing mechanism for user-defined objects, then their hash would be inconsistent with their equality. And that is absolutely a problem. However, user-defined objects by default compare and hash based on identity, so it isn't as bad of a situation, although, this set of affairs isn't very useful.

Note, if you implement __eq__ in a user-defined class, the __hash__ is set to None, making the class unhashable.

So, from the Python 3 data model documentation:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When the __hash__() method of a class is None, instances of the class will raise an appropriate TypeError when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking isinstance(obj, collections.abc.Hashable).

Sign up to request clarification or add additional context in comments.

4 Comments

"If built-in mutable objects are hashed based on identity, like the default hashing mechanism for user-defined objects, then their hash would be inconsistent with their equality. And that is absolutely a problem." Hm... That is a good point... that is a problem you say... but when viewed purely from computer science standpoint... that's a problem because of how equality is implemented.
E.g. to me (since I come from Java) lst1 == lst2 returning true (for 2 different lists with same values) seems strange too. So... OK... I guess it is just a very basic language design decision to question it (as I do here) but still... seems like it makes things a bit too complicated.
@peter.petrov in Python, == is equivalent to Java's .equals, whereas == in Java is equivalent to is in Python.
Yeah... that I know already... OK, well, thanks, I will think some more on all this.
3

From reading other comments/answers, it seems like what you're not buying is that you have to change a hash of a mutable entity when it mutates, and that you can just hash by id, so I'll try to elaborate on this point.

To quote you:

@kindall Hm... Who says that the hash value has to come from the values in the list? And that if you e.g. add a new value you have to rehash the list, get a new hash value, etc.. In other languages that's not how it is... this is my point. In other languages the hash value just comes from the id (or is the id itself, just like for user-defined mutable Python objects)... And OK... I just feel it makes things a bit too complicated in Python (especially for beginners... not for me).

This isn't exactly false (although I do not know what "other" languages you are referencing), you could do that, but there are some pretty dire consequences:

class HashableList(list): def __hash__(self): return id(self) x = HashableList([1,2,3]) y = HashableList([1,2,3]) our_set = {x} print("Is x in our_set? ", x in our_set) print("Is y in our_set? ", y in our_set) print("Are x and y equal? ", x == y) 

This (unexpectedly) outputs:

Is x in our_set? True Is y in our_set? False <-- potentially confusing Are x and y equal? True 

This means that the hash is not consistent with equality, which is just downright confusing.

You might counter with "well, just hash by the contents then", but I think you already understand that if the contents change then you get other undesirable behavior (for example):

class HashableListByContents(list): def __hash__(self): return sum(hash(x) for x in self) a = HashableListByContents([1,2,3]) b = HashableListByContents([1,2,3]) our_set = {a} print('Is a in our_set? ', a in our_set) print('Is b in our_set? ', b in our_set) print('Are a and b equal? ', a == b) 

This outputs:

Is a in our_set? True Is b in our_set? True Are a and b equal? True 

So far so good! But...

a.append(2) print('Is a still in our set? ', a in our_set) 

this outputs:

Is a still in our set? False <-- potentially confusing 

I am not a Python beginner, so I would not presume to know what would or would not confuse a Python beginner, but either way this seems confusing to me (at best). My two cents is that it's simply incorrect to hash mutable objects. I mean we have functional purists that claim mutable objects are just incorrect, period! Python won't stop you from doing any of what you described, because it would never force a paradigm like that, but it's really asking for trouble no matter what route you go down.

HTH!

2 Comments

> Is a still in our set? False <-- potentially confusing Hmm, I don't see how this is confusing. You've mutated the list to add a new value, which changes the hash (which makes perfect sense) and so it is reported as no longer in the set. What am I missing?
@BenjaminCrawfordCtrl-Alt-Tut It violates referential transparency semantics. This is actually a common violation in non-functional languages, but it's somewhat "unintuitive" to folks who have a math background (like myself). I'll caveat this by claiming "unintuitive" is a subjective term. I agree with you in a sense. It's not that confusing because there is an explanation for why it happens (hence the word potentially). But the explanation seems to defy the basic semantics of symbols...
1

Calculating a hash value is like giving an identity to an object which simplify the comparison of objects. The comparison by hash value is generally faster than the comparison by value: for an object, you compare its attributes, for a collection, you compare its items, recursively…

If an object is mutable you need to calculate its hash value again after each changes. If this object was compared equal with another one, after a change it becomes unequal. So, mutable objects must be compared by value, not by hash. It’s a non-send to compare by hash values for mutable objects.

Edit: Java HashCode

Typically, hashCode() just returns the object's address in memory if you don't override it.

See the reference about the hashCode function.

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

So, the Java hashCode function works the same as the default Python __hash__ function.

In Java, if you use a mutable object in a HashSet, for instance, the HashSet isn’t working properly. Because the hashCode depends of the state of the object it can no longer be retrieved properly, so the check for containment fails.

6 Comments

Well... this is again taking a one-sided (Pythonic) view on the question. Quote: "If an object is mutable you need to calculate its hash value again after each changes" This statement is not true in general, it is true in Python.
@peter.petrov well, yes, but you are asking about Python.
@juanpa.arrivillaga Well... OK... yes and no... Think a bit more deeply, I think you see what I mean.
"So, the Java hashCode function works the same as the default Python hash function." Yes, for user-defined objects - yes. That's the whole point of my question. Why didn't they make built-in immutable and user-defined immutable objects behave uniformly as far as hashing goes? Sounds somewhat unbalanced, asymmetrical this behavior. Never mind... I know... I have to ask whoever made the decision :) It's a deep/basic language design thing. I was just curious what the community will say. Thanks for your perspective anyway.
@peter.petrov: Built-in classes and properly-implemented user-defined classes behave identically in this regard. There is no asymmetry.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.