Exploratory Analysis of Data

This example shows how to explore the distribution of data using descriptive statistics.

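Generate sample data: 100 values from a normal distribution with mean 4 and standard deviation 1, followed by 200 values from a normal distribution with mean 6 and standard deviation 0.5.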

rng('default') % for reproducibility
x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)];

Create a histogram of the data with a fitted normal density.

histfit(x)

The distribution of the data appears left-skewed, and a normal distribution does not look like a good fit.

Obtain a normal probability plot.

probplot('normal',x)

The probability plot also clearly shows the deviation of the data from normality: the points do not follow the straight reference line.
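To see what a normal probability plot compares, the sorted data can be plotted against theoretical standard-normal quantiles by hand. This is a minimal sketch, not how probplot is implemented: the midpoint plotting positions (i-0.5)/n are an assumption, and the names xs, pp, and zq are illustrative. For normally distributed data the points would fall close to a straight line.

```matlab
rng('default') % same sample as above
x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)];

xs = sort(x);                 % ordered sample values
n  = numel(xs);
pp = ((1:n) - 0.5) / n;       % assumed plotting positions in (0,1)
zq = norminv(pp);             % matching standard-normal quantiles

plot(xs, zq, '+')             % curvature indicates non-normality
xlabel('Data')
ylabel('Standard normal quantile')
```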

Compute quantiles of data.

p = 0:0.25:1;
y = quantile(x,p);
z = [p;y]
z =

         0    0.2500    0.5000    0.7500    1.0000
    1.0557    4.7375    5.6872    6.1526    7.5784

Plot a box plot.

A box plot helps to visualize the statistics.

boxplot(x)

You can also see the 0.25, 0.5, and 0.75 quantiles in the box plot. The long lower tail and plus signs also show the lack of symmetry in the sample values.
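The plus signs mark points that lie beyond the whiskers. The sketch below reproduces that rule by hand, assuming the default whisker length of 1.5 times the interquartile range and that boxplot uses the same quartile estimates as quantile; the names w, iqrx, lowerFence, upperFence, and flagged are illustrative.

```matlab
rng('default') % same sample as above
x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)];

q    = quantile(x,[0.25 0.75]);   % lower and upper quartiles
w    = 1.5;                       % assumed default whisker length
iqrx = q(2) - q(1);               % interquartile range

lowerFence = q(1) - w*iqrx;       % points below this are flagged
upperFence = q(2) + w*iqrx;       % points above this are flagged

flagged = x(x < lowerFence | x > upperFence)
```

With the quartiles displayed earlier (4.7375 and 6.1526), the fences work out to roughly 2.61 and 8.28. Because the maximum of the data is 7.5784, only values in the lower tail can be flagged, which matches the asymmetry seen in the box plot.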

Compute the mean and median of data.

y = [mean(x) median(x)]
y =

    5.3438    5.6872

The mean and median are close to each other, but a mean smaller than the median usually indicates that the data is left-skewed.

Compute the skewness and kurtosis of data.

y = [skewness(x) kurtosis(x)]
y =

   -1.0417    3.5895

A negative skewness value means the data is left-skewed. The kurtosis value is greater than 3, the kurtosis of a normal distribution, so the data is more outlier-prone (heavier-tailed) than a normal distribution.
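Both statistics are standardized central moments, which can be checked by hand. This is a minimal sketch, assuming the default (not bias-corrected) definitions of skewness and kurtosis, in which the standard deviation is normalized by n rather than n-1; the names mu, s, sk, and ku are illustrative.

```matlab
rng('default') % same sample as above
x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)];

mu = mean(x);
s  = std(x,1);                   % second argument 1: normalize by n

sk = mean((x - mu).^3) / s^3;    % third standardized moment
ku = mean((x - mu).^4) / s^4;    % fourth standardized moment (normal = 3)

[sk ku]                          % compare with [skewness(x) kurtosis(x)]
```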

Identify possible outliers.

Compute z-scores, then find the indices of the observations whose z-scores are greater than 3 or less than –3.

Z = zscore(x);
find(abs(Z)>3)
ans =

     3    35

The 3rd and 35th observations might be outliers.
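A z-score is the distance of an observation from the sample mean, measured in sample standard deviations. The following sketch computes the scores by hand and lists the flagged values next to their scores. It assumes the default behavior of zscore, which uses the sample standard deviation (normalized by n-1); the names Zmanual and idx are illustrative.

```matlab
rng('default') % same sample as above
x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)];

Zmanual = (x - mean(x)) ./ std(x);   % std(x) normalizes by n-1
idx = find(abs(Zmanual) > 3);        % indices of candidate outliers

% Rows: index, observed value, z-score
[idx; x(idx); Zmanual(idx)]
```

Inspecting the values themselves, not only their indices, helps in judging whether a flagged observation is a genuine outlier or simply part of a long tail.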
