Code covered by the BSD License  


Rating: 4.0 (1 rating) | 38 Downloads (last 30 days) | File Size: 5.02 KB | File ID: #21033

Calculate number of bins for histogram

by Richie Cotton

11 Aug 2008 (Updated )

Automatically calculates the 'best' number of bins for a histogram.


File Information
Description

Two files are included:
CALCNBINS, which calculates the "ideal" number of bins to use in a histogram, using one of three methods: Freedman-Diaconis, Scott, or Sturges.

HISTX, a wrapper for MATLAB's own histogram function HIST, which uses CALCNBINS to choose the number of bins if none is provided.
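
The wrapper idea can be sketched as follows. This is only an illustration of the pattern, not the HISTX source: the name histx_sketch and its argument handling are hypothetical, and it assumes CALCNBINS is on the path and is called with its default method as in the examples.

```matlab
function varargout = histx_sketch(y, nbins)
% HISTX_SKETCH  Hypothetical wrapper around HIST (not the shipped HISTX).
% If the caller gives no bin count, ask CALCNBINS for one.
if nargin < 2 || isempty(nbins)
    nbins = calcnbins(y);   % default method, as in calcnbins(y)
end
% Forward however many outputs the caller requested. With no outputs
% requested, HIST draws the plot, exactly as a direct call would.
[varargout{1:nargout}] = hist(y, nbins);
end
```

Called as histx_sketch(y) it picks the bin count automatically; called as histx_sketch(y, 20) it behaves like hist(y, 20).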

Examples:
y = randn(10000,1);
nb = calcnbins(y, 'all')
% nb =
% fd: 57
% scott: 44
% sturges: 15
calcnbins(y) %Uses the middle value from the above
% ans =
% 44
calcnbins(y, 'fd') % Choose your method
% ans =
% 57
histx(y) %Plots a histogram using middle method
histx(y, 'all') %Plots 3 histograms, using each method

MATLAB release: MATLAB 7.6 (R2008a)
Comments and Ratings (4)
18 Aug 2008 Richie Cotton

John,

Thanks for the feedback, it is much appreciated.

The Sturges method returns the value 18 in each of your examples because it is based solely on the length of the input data. I've updated the documentation to explain a little about the methods, and included some references for those wishing to find out more.
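
As a quick check of that point, here is a minimal sketch of Sturges' rule in its usual form, ceil(log2(n) + 1). That calcnbins uses exactly this expression is an assumption, although it reproduces every Sturges value quoted in this thread.

```matlab
% Sturges' rule: the bin count depends only on the sample size n,
% never on the data values themselves.
n = [1000 10000 100000 1000000];
nbins = ceil(log2(n) + 1)
% nbins =
%     11    15    18    21
```

Under this form, every sample size from 65537 to 131072 gives 18 bins, which is why all the 100000-point examples agree.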

Also newly included is the histx function which acts as a wrapper for hist, calling calcnbins when no breaks are specified.

13 Aug 2008 John D'Errico

Getting better, but there is still an apparently interesting feature.

n = calcnbins(randn(100000,1),'all')
n =
fd: 152
scott: 117
sturges: 18

n = calcnbins(randn(100000,1),'all')
n =
fd: 149
scott: 115
sturges: 18

n = calcnbins(rand(100000,1),'all')
n =
fd: 47
scott: 46
sturges: 18

n = calcnbins(rand(100000,1).^8,'all')
n =
fd: 233
scott: 62
sturges: 18

n = calcnbins(rand(100000,1).^.5,'all')
n =
fd: 64
scott: 57
sturges: 18

n = calcnbins(rand(100000,1),'all')
n =
fd: 47
scott: 46
sturges: 18

Note that all cases seem to generate exactly 18 bins for the last method, although for smaller samples this is not true. Is 18 the largest number of bins that the Sturges method will return? (No.)

n = calcnbins(randn(1000,1),'all')
n =
fd: 26
scott: 21
sturges: 11

n = calcnbins(rand(1000000,1),'all')
n =
fd: 100
scott: 99
sturges: 21

It might be useful to provide additional information about the methods, i.e., why use one over another, and when is one better than the others? For example, if a set of data tends to have outliers, is one method more intelligent in its choice? At the very least, provide a reference that would help a user to understand the methods.

Overall, this is a nice little tool. Were I the author, I might even be tempted to add a new version of hist that would call this code first when the number of bins is not specified; otherwise, it would just call the default hist.
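
To illustrate how the other two methods differ, here is a rough sketch using the textbook bin-width formulas: Freedman-Diaconis uses h = 2*IQR*n^(-1/3) and Scott uses h = 3.5*std*n^(-1/3), with the bin count taken as ceil(range/h). The variable names, the quartile convention, and the rounding are assumptions, not taken from calcnbins. Because the IQR ignores the tails, the Freedman-Diaconis width is less affected by outliers than Scott's, which relies on the standard deviation.

```matlab
% Textbook Freedman-Diaconis and Scott bin counts (a sketch, not calcnbins).
x = rand(100000, 1) .^ 8;          % heavily skewed sample
x = sort(x(~isnan(x)));            % drop NaNs, then sort
n = numel(x);

% Quartiles by linear interpolation on the sorted data (one of several
% common conventions; avoids needing the Statistics Toolbox).
q = interp1((1:n)', x, [0.25; 0.75] * (n - 1) + 1);
iqrX = q(2) - q(1);

hFD    = 2   * iqrX   * n^(-1/3);  % Freedman-Diaconis bin width
hScott = 3.5 * std(x) * n^(-1/3);  % Scott bin width

dataRange = x(end) - x(1);
nbFD    = ceil(dataRange / hFD)
nbScott = ceil(dataRange / hScott)
```

For skewed data like this, the IQR is small relative to the range, so the Freedman-Diaconis count comes out much larger than Scott's. If the IQR is zero (for example, heavily repeated values), hFD is zero and this sketch would need a guard.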

12 Aug 2008 Richie Cotton

The help is now fixed, as is the bug with the Scott method (simply a case of missing brackets), and the dependency on the stats toolbox has been removed.

11 Aug 2008 John D'Errico

Pretty good, with good help in general. Error checks, with a default for the method. The method is tolerant of lower case, etc. All well done.

I'd suggest a couple of minor changes. There is no complete H1 line, as the first comment line. The first two lines of the help were:

% NBINS = CALCNBINS(X, METHOD) calculates the "ideal" number of bins to use
% in a histogram, using a choice of methods.

Since the utility of this code is to work with a histogram, you should have that in the H1 line. So a simple re-wording of those first two lines might be:

% Calculate the "ideal" number of bins to use in a histogram, using a choice of methods.
% NBINS = CALCNBINS(X, METHOD)

I did find one interesting result, that seemed less than sterling.

x = rand(1,100000);
n = calcnbins(x,'sc')
n =
1

n = calcnbins(sqrt(x),'sc')
n =
1

n = calcnbins(x.^10,'sc')
n =
1

Surely all of the above examples would not be best served by only one bin?

Updates
12 Aug 2008

Bug fix; dependency on stats toolbox removed.

18 Aug 2008

New function; documentation updates.

24 Oct 2008

NaN values are now consistently ignored in calcnbins.
