Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
405 views
in Technique[技术] by (71.8m points)

python - How to choose bins in matplotlib histogram

Can someone explain to me what "bins" in histogram are (the matplotlib hist function)? And assuming I need to plot the probability density function of some data, how do the bins I choose influence that? and how do I choose them? (I already read about them in the matplotlib.pyplot.hist and the numpy.histogram libraries but I did not get the idea)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The bins parameter tells you the number of bins that your data will be divided into. You can specify it as an integer or as a list of bin edges.

For example, here we ask for 20 bins:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(1000)
plt.hist(x, bins=20)

enter image description here

And here we ask for bin edges at the locations [-4, -3, -2... 3, 4].

plt.hist(x, bins=range(-4, 5))

enter image description here

Your question about how to choose the "best" number of bins is an interesting one, and there's actually a fairly vast literature on the subject. There are some commonly-used rules-of-thumb that have been proposed (e.g. the Freedman-Diaconis Rule, Sturges' Rule, Scott's Rule, the Square-root rule, etc.) each of which has its own strengths and weaknesses.

If you want a nice Python implementation of a variety of these auto-tuning histogram rules, you might check out the histogram functionality in the latest version of the AstroPy package, described here. This works just like plt.hist, but lets you use syntax like, e.g. hist(x, bins='freedman') for choosing bins via the Freedman-Diaconis rule mentioned above.

My personal favorite is "Bayesian Blocks" (bins="blocks"), which solves for optimal binning with unequal bin widths. You can read a bit more on that here.


Edit, April 2017: with matplotlib version 2.0 or later and numpy version 1.11 or later, you can now specify automatically-determined bins directly in matplotlib, by specifying, e.g. bins='auto'. This uses the maximum of the Sturges and Freedman-Diaconis bin choice. You can read more about the options in the numpy.histogram docs.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.9k users

...