Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
848 views
in Technique[技术] by (71.8m points)

list - Find empty or NaN entry in Pandas Dataframe

I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.

Here is a dataframe that I am working with:

cl_id       a           c         d         e        A1              A2             A3
    0       1   -0.419279  0.843832 -0.530827    text76        1.537177      -0.271042
    1       2    0.581566  2.257544  0.440485    dafN_6        0.144228       2.362259
    2       3   -1.259333  1.074986  1.834653    system                       1.100353
    3       4   -1.279785  0.272977  0.197011     Fifty       -0.031721       1.434273
    4       5    0.578348  0.595515  0.553483   channel        0.640708       0.649132
    5       6   -1.549588 -0.198588  0.373476     audio       -0.508501               
    6       7    0.172863  1.874987  1.405923    Twenty             NaN            NaN
    7       8   -0.149630 -0.502117  0.315323  file_max             NaN            NaN

NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.

If I have this dataframe, how can I find a list with the indexes where the NaN or blank entry occurs?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

In [155]: df.iloc[2,7]
Out[155]: nan

In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]

Finding values which are empty strings could be done with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...