Leo's Technical Blog

Python Comparison Weirdness


Posted by Leo Soto on .

While tracking down a Jython bug related to some __cmp__ methods (on dict and unicode, at least), I had to check how __cmp__ behaves on CPython. And I got a few surprises:

  
>>> {} == ''
False  

It sounds right, but...

  
>>> {}.__eq__('')
NotImplemented  

Oh. So == isn't relying on __eq__ alone to check for equality. When __eq__ gives up with NotImplemented, it falls back to the old three-way comparison function:

  
>>> cmp({}, '')
-1

So, as '' is greater than {}, they are not equal. But...

  
>>> {}.__cmp__('1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dict.__cmp__(x,y) requires y to be a 'dict', not a 'str'

Oops. Isn't cmp(foo, bar) the same as foo.__cmp__(bar), at least when hasattr(foo, '__cmp__')? Well, obviously, not always.

For some reason, CPython does a bit of "type checking" when you use dict.__cmp__ indirectly, through cmp() or the comparison operators. If you compare a dict with an instance of an incompatible type, it does a "default comparison" by class name instead of raising TypeError. Judging by the CPython sources, this seems to be the case for every type where tp_compare is implemented in C.

So, we get a -1 from cmp({}, '') because 'dict' < 'str'. Weird. But that isn't all. If it were, I probably wouldn't have bothered to write this.
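The "default comparison" can be approximated in a few lines. This is only a sketch: default_cmp is a hypothetical helper, not a CPython function, and it assumes the fallback orders the operands by their type names alone. The real fallback has extra cases (for None and for numbers, for instance) that are left out here.

```python
def default_cmp(a, b):
    """Three-way comparison by type name: returns -1, 0 or 1.

    Hypothetical helper that only approximates CPython 2's fallback.
    """
    name_a = type(a).__name__
    name_b = type(b).__name__
    # (x > y) - (x < y) is the usual way to spell a three-way result.
    return (name_a > name_b) - (name_a < name_b)


print(default_cmp({}, ''))   # -1, because 'dict' < 'str'
print(default_cmp('', {}))   # 1
```

Under that assumption, the result depends only on how the two type names sort, not on the values being compared.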

Let's derive dict and check what happens:

  
>>> class dict_derived(dict): pass
...
>>> cmp(dict_derived(), '')
-1
>>> dict_derived().__cmp__('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dict_derived.__cmp__(x,y) requires y to be a 'dict_derived', not a 'str'

No surprises: It inherits the behavior from dict. So, remembering what I said above:

If you compare a dict with an instance of an incompatible type, Python does a "default comparison" by class name, instead of raising TypeError.
Now we can extend it to:

If you compare a dict or a dict-derived instance with an instance of an incompatible type, Python does a "default comparison" by class name, instead of raising TypeError.

But why am I saying that it applies only to dicts? [Or, AFAICS, special types where the comparison function is written in C] Why not to every type? Answer:

  
>>> class Foo(object):
...   def __cmp__(self, other):
...     raise TypeError("Foos are not comparable")
... 
>>> Foo() == ''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __cmp__
TypeError: Foos are not comparable
>>> cmp(Foo(), '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __cmp__
TypeError: Foos are not comparable

So, on one hand we have dict (and maybe other builtin types), where cmp() and the comparison operators don't raise TypeError even if __cmp__ does. On the other, we have user-defined classes, where the raised TypeError does "leak". In the middle, our dict_derived class inherited the behavior from dict. But look at this:

  
>>> class dict_derived2(dict):
...   def __cmp__(self, other):
...     return super(dict_derived2, self).__cmp__(other)
... 
>>> cmp(dict_derived2(), '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __cmp__
TypeError: dict_derived2.__cmp__(x,y) requires y to be a 'dict_derived2', not a 'str'

Dict-derived types inherit the behaviour of dict, unless they override __cmp__. CPython doesn't care that the new __cmp__ just calls the original dict.__cmp__. The only thing that matters is that there is a __cmp__ implemented in Python code. Once you write a "custom" __cmp__, cmp(), == and all the other comparison operators will raise the exception.

To summarize, here is the final rule for dict.__cmp__:

If you compare a dict or a dict-derived instance with an instance of an incompatible type, and __cmp__ is not overridden, Python does a "default comparison" by class name, instead of raising TypeError.
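The rule can be written down as a toy model. To be clear, this is a sketch of the rule as stated here, not of CPython's actual dispatch code: model_cmp, python_level_cmp, Plain and Overriding are all hypothetical names, and the model only covers the case of a dict (or dict-derived) left operand compared against a non-dict.

```python
import types


def python_level_cmp(cls):
    """Return the first __cmp__ written in Python found on cls's MRO, if any."""
    for klass in cls.__mro__:
        candidate = vars(klass).get('__cmp__')
        if isinstance(candidate, types.FunctionType):
            return candidate
    return None


def model_cmp(a, b):
    """Toy model of cmp(a, b) for a dict-like `a` and a non-dict `b`."""
    override = python_level_cmp(type(a))
    if override is not None:
        # A __cmp__ written in Python is always called, so its exceptions leak.
        return override(a, b)
    # No override: order the operands by type name (simplified fallback).
    name_a, name_b = type(a).__name__, type(b).__name__
    return (name_a > name_b) - (name_a < name_b)


class Plain(dict):
    pass


class Overriding(dict):
    def __cmp__(self, other):
        raise TypeError("not comparable")


print(model_cmp(Plain(), ''))   # -1, because 'Plain' < 'str'
try:
    model_cmp(Overriding(), '')
except TypeError as exc:
    print("TypeError: %s" % exc)
```

The only thing the model looks at is whether some class on the MRO defines __cmp__ as a plain Python function, which is the distinction the rule hinges on.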

Note that this rule is not directly applicable to other builtin types that implement __cmp__:

  
>>> set().__cmp__('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: set.__cmp__(x,y) requires y to be a 'set', not a 'str'
>>> cmp(set(), '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only compare to a set
>>> set() == ''
False  
>>> set().__eq__('')
False  

With set, TypeError is raised by __cmp__ and by cmp(), but not by ==. That's because set.__eq__ takes care of returning False if the argument type is not compatible. The end result sounds quite reasonable, because you can still check for equality against instances of other types (like set() != ''), but can't compare for ordering against them (set() > 1 raises an error instead of doing a weird class name comparison).
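The same design is easy to reproduce in a user-defined class. Here is a minimal sketch, assuming a hypothetical Bag container (not a real builtin) that uses rich comparison methods: equality against a foreign type quietly returns False, while ordering against a foreign type raises TypeError.

```python
class Bag(object):
    """Hypothetical container: equality never raises, ordering is strict."""

    def __init__(self, items=()):
        self.items = sorted(items)

    def __eq__(self, other):
        if not isinstance(other, Bag):
            return False   # foreign types are simply "not equal"
        return self.items == other.items

    def __ne__(self, other):
        return not self == other

    def __lt__(self, other):
        if not isinstance(other, Bag):
            raise TypeError("can only compare to a Bag")
        return self.items < other.items


print(Bag() == '')   # False
print(Bag() != '')   # True
try:
    Bag() < 1
except TypeError as exc:
    print("TypeError: %s" % exc)
```

Only __lt__ is shown for ordering; a complete class would treat the other ordering operators the same way.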

I suppose that the roots of this inconsistency are historical accidents. I'm curious to see if all this has changed in Python 3.0.