Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Matplotlib cumulative histogram - vertical line placement bug or misinterpretation?

I am not sure whether this is a bug or whether I am simply misinterpreting the output of matplotlib's cumulative histogram. What I expect is that, at a given x value, the corresponding y value tells me how many samples are <= x.

import matplotlib.pyplot as plt

X = [1.1, 3.1, 2.1, 3.9]
n, bins, patches = plt.hist(X, normed=False, histtype='step', cumulative=True)
plt.ylim([0, 5])
plt.grid()
plt.show()

[image: cumulative histogram produced by the code above]

See the second vertical line at x=1.9? Shouldn't it be at 2.1, given the data in X? For example, at x=3 I would read "3 samples have a value x <= 3.1" ...

So, basically, what I expect is something similar to this step plot:

plt.step(sorted(X), range(1, len(X)+1), where='post')
plt.ylim([0, 5])
plt.grid()

[image: step plot produced by the code above]

Edit:

I am using Python 3.4.3 and matplotlib 1.4.3.


1 Reply


If you do not set the bins parameter yourself, plt.hist chooses the bins for you (by default, 10 equal-width bins spanning the range of the data):

In [58]: n, bins, patches = plt.hist(X, normed=False, histtype='step', cumulative=True)

In [59]: bins
Out[59]: 
array([ 1.1 ,  1.38,  1.66,  1.94,  2.22,  2.5 ,  2.78,  3.06,  3.34,
        3.62,  3.9 ])

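The return value bins holds the edges of the bins that matplotlib chose. The sample 2.1 falls into the bin [1.94, 2.22), so the cumulative count steps up at that bin's left edge, 1.94. That is the vertical line you see near x=1.9.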
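To see where the steps come from without plotting, the same binning can be reproduced with np.histogram. This is only a sketch: it assumes the default of 10 equal-width bins, and the names counts, edges, cumulative and idx are introduced here for illustration.

```python
import numpy as np

X = [1.1, 3.1, 2.1, 3.9]

# Assumption: 10 equal-width bins between min(X) and max(X),
# matching the plt.hist default described above.
counts, edges = np.histogram(X, bins=10)
cumulative = np.cumsum(counts)

# Index of the bin that contains the sample 2.1.
idx = int(np.searchsorted(edges, 2.1, side="right")) - 1

print(np.round(edges, 2))          # the 11 bin edges
print(cumulative)                  # running total per bin
print(idx, edges[idx], edges[idx + 1])
```

The cumulative count rises at the left edge of whichever bin a sample lands in, not at the sample itself, which is why the step for 2.1 appears at about 1.94.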

It sounds like you want the values in X to serve as bin edges. Using bins=sorted(X)+[np.inf]:

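import numpy as np
import matplotlib.pyplot as plt

X = [1.1, 3.1, 2.1, 3.9]
# Use the data values themselves as bin edges; np.inf closes the last bin.
bins = sorted(X) + [np.inf]
# Note: later matplotlib releases replaced `normed` with `density`;
# there, drop normed=False (it is the default) or pass density=False.
n, bins, patches = plt.hist(X, normed=False, histtype='step', cumulative=True,
                            bins=bins)
plt.ylim([0, 5])
plt.grid()
plt.show()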

yields

[image: cumulative step histogram produced by the code above]

The [np.inf] entry makes the right edge of the final bin extend to infinity. Matplotlib does not try to draw non-finite values, so all you see is the left edge of the last bin.
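As a rough numerical check of this approach, the sketch below computes the cumulative counts for the data-driven bin edges with np.histogram and compares them with a direct count of samples <= x. The helper count_le is hypothetical, introduced only for this illustration.

```python
import numpy as np

X = [1.1, 3.1, 2.1, 3.9]

# Bin edges taken from the data, with an open-ended last bin.
edges = sorted(X) + [np.inf]
counts, _ = np.histogram(X, bins=edges)
cumulative = np.cumsum(counts)

xs = np.sort(X)

def count_le(x):
    """Hypothetical helper: number of samples in X that are <= x."""
    return int(np.searchsorted(xs, x, side="right"))

print(cumulative)
print([count_le(x) for x in xs])
```

With these edges every sample opens its own bin, so the cumulative count at each data value agrees with the "how many samples are <= x" reading from the question.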

