Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Why is pandas '==' different than '.eq()'

Consider the series s

import pandas as pd
s = pd.Series([(1, 2), (3, 4), (5, 6)])

This is as expected

s == (3, 4)

0    False
1     True
2    False
dtype: bool

This is not

s.eq((3, 4))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

ValueError: Lengths must be equal

I was under the assumption they were the same. What is the difference between them?


What does the documentation say?

Equivalent to series == other, but with support to substitute a fill_value for missing data in one of the inputs.

This seems to imply that they should work the same, hence the confusion.



1 Reply


What you've encountered is a special case that makes it easier to compare pandas.Series or numpy.ndarray with normal Python constructs. The source code reads:

def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
    # validate axis
    if axis is not None:
        self._get_axis_number(axis)
    if isinstance(other, ABCSeries):
        return self._binop(other, op, level=level, fill_value=fill_value)
    elif isinstance(other, (np.ndarray, list, tuple)):
        if len(other) != len(self):
            # ---------------------------------------
            # you never reach the `==` path because you get into this.
            # ---------------------------------------
            raise ValueError('Lengths must be equal')  
        return self._binop(self._constructor(other, self.index), op,
                           level=level, fill_value=fill_value)
    else:
        if fill_value is not None:
            self = self.fillna(fill_value)

        return self._constructor(op(self, other),
                                 self.index).__finalize__(self)

You hit the ValueError because, for .eq, pandas assumes that an array, list or tuple argument should be converted to a numpy.ndarray or pandas.Series and compared element by element, instead of being compared as a single value against each element. For example, if you have:

s = pd.Series([1,2,3])
s.eq([1,2,3])

you wouldn't want it to compare each element to [1,2,3].
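To make the branching concrete, here is a simplified pure-Python model of the list-like branch in the source quoted above. It is only a sketch: flex_eq_sketch is a hypothetical helper, it works on plain lists rather than a Series, and it ignores index alignment, level and fill_value.

```python
def flex_eq_sketch(values, other):
    """Toy model of the list-like branch in the quoted flex wrapper (hypothetical helper)."""
    if isinstance(other, (list, tuple)):
        # List-likes are treated as a column of values, never as one scalar.
        if len(other) != len(values):
            raise ValueError('Lengths must be equal')
        return [v == o for v, o in zip(values, other)]
    # Anything else is compared against every element.
    return [v == other for v in values]


three = [(1, 2), (3, 4), (5, 6)]
try:
    flex_eq_sketch(three, (3, 4))
    outcome = 'no error'
except ValueError as exc:
    outcome = str(exc)  # 'Lengths must be equal'

# With two elements the lengths happen to match, so the tuple is
# unpacked: (1, 2) is compared to 3 and (3, 4) is compared to 4.
two = [(1, 2), (3, 4)]
pairwise = flex_eq_sketch(two, (3, 4))  # [False, False]
```

In this model, a tuple whose length happens to match the data does not raise at all. It is compared position by position, which is probably not what someone looking for the value (3, 4) intended.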

The problem is that object arrays (as with dtype=object) often slip through the cracks or are neglected on purpose. A simple if self.dtype != 'object' branch inside that method could resolve this issue. But maybe the developers had strong reasons to make this case different. I would advise asking for clarification by posting on their bug tracker.


You haven't asked how to make it work correctly, but for completeness I'll include one possibility (judging from the source code, you likely need to wrap the value in a pandas.Series yourself):

>>> s.eq(pd.Series([(1, 2)]))
0     True
1    False
2    False
dtype: bool
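If the goal is to test every row against a single tuple, two possible approaches are sketched below. They assume the series from the question. The names target, via_apply, other and via_eq are only illustrative, and neither approach is presented as the officially recommended one.

```python
import pandas as pd

s = pd.Series([(1, 2), (3, 4), (5, 6)])
target = (3, 4)

# Option 1: compare each element to the tuple in plain Python.
via_apply = s.apply(lambda item: item == target)

# Option 2: repeat the tuple once per row so .eq receives a Series
# of the same length and index, and compares row by row.
other = pd.Series([target] * len(s), index=s.index)
via_eq = s.eq(other)

print(via_apply.tolist())  # [False, True, False]
print(via_eq.tolist())     # [False, True, False]
```

Option 1 avoids the list-like special case entirely. Option 2 stays within .eq, at the cost of building a helper Series.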
