Skew and How Skewness Is Calculated In Statistical Software

Skewness In Graph Can Be Calculated - M Bell
The skew of a distribution is often calculated in statistics software such as JMP or Excel. This article shows how skew is calculated and when it is significant.

In statistics, skew describes the degree to which a distribution is asymmetrical, that is, how much the two sides of the curve (either side of the mean) differ from each other. Mathematically, it is the third standardized moment of the distribution. Statistical software packages such as SPSS, Minitab, and JMP all provide functions to calculate skew, and Microsoft Excel provides the SKEW function. Almost every data set will produce a nonzero number for skew, so it is useful to know when the degree of skewness is significant.

Why Skewness Calculations Are Useful

Most distributions show some level of skew, with a longer tail on one side. This is often due to the physical limitations of the variable being measured. For example, a plot of days of illness against frequency will be skewed toward the higher values (positive skew), because the lower limit must be zero while there is no comparable upper limit. In inferential statistics, the choice of test often depends on the type of distribution the data follow. Approximately Gaussian data allow parametric tests (the t-test, etc.) to be used, whereas clearly non-Gaussian data generally call for non-parametric tests (the median test, etc.). It is therefore important to know when a skew value is significant.

How Statistics Software Packages Calculate Skewness: Skewness Equations

As with many statistics, the formula used to calculate the value for a population differs from the one used to calculate it from a sample:

For the population,

skew = Σ ( (x - µ ) / σ )³ / N

Where

  • x = Each individual number in the population
  • µ = Average of the numbers in the population
  • N = Number in the population
  • σ = Standard deviation of the population

For a sample,

skew = Σ ( (x - xbar ) / s )³ × n / ( (n - 1) × (n - 2) )

Where

  • x = Each individual number in the sample
  • xbar = Average of the numbers in the sample
  • n = Number in the sample
  • s = Standard deviation of the sample

(See Figure 1 also).

JMP and Excel both use the sample formula. There must be at least three numbers in the sample, or an error will be returned, since the denominator contains (n - 1) × (n - 2), which is zero when n is 1 or 2.
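The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by any of the packages mentioned; the function names `population_skew` and `sample_skew` and the two small data sets are assumptions made here for demonstration.

```python
import math


def population_skew(values):
    """Skew of a complete population: the mean of the cubed z-scores."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n
    # Population standard deviation divides by N.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        raise ValueError("standard deviation is zero; skew is undefined")
    return sum(((x - mu) / sigma) ** 3 for x in values) / n


def sample_skew(values):
    """Skew of a sample, using the n / ((n - 1)(n - 2)) factor."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    xbar = sum(values) / n
    # Sample standard deviation divides by n - 1.
    s = math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("standard deviation is zero; skew is undefined")
    return sum(((x - xbar) / s) ** 3 for x in values) * n / ((n - 1) * (n - 2))


# Illustrative data only (hypothetical, not taken from the article).
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 10]
print(sample_skew(symmetric))     # symmetric data: the cubed deviations cancel
print(sample_skew(right_tailed))  # positive: the long tail is on the right
```

The explicit check for fewer than three values mirrors the error described above. The check for a zero standard deviation is an extra guard added in this sketch, since the formula divides by s.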

When Is Skew Significant?

The calculated skew value will rarely be exactly zero, and for real data sets it can be positive or negative. Even when a population distribution has zero skew, samples taken from it will produce skewness values that are not zero. The question then arises of when to consider the skew value significant. One commonly used test is to check whether the absolute value of the skewness (as calculated by hand or by statistics software) is more than twice the standard error of skew. Tabachnick & Fidell state that the standard error of skew can be estimated as:

√(6 / N)

where N is the sample size.

So if the absolute value of the skew produced by statistics software is more than 2 × √(6 / N), the distribution is considered significantly skewed.
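The point that samples from a zero-skew population still show nonzero skew can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the sample size of 100, the 2000 trials, the seed, and the helper names `sample_skew` and `simulate_skews` are all chosen here for demonstration and do not come from the article or from any statistics package.

```python
import math
import random
import statistics


def sample_skew(values):
    """Sample skew using the n / ((n - 1)(n - 2)) formula."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))
    return sum(((x - xbar) / s) ** 3 for x in values) * n / ((n - 1) * (n - 2))


def simulate_skews(sample_size, trials, seed=0):
    """Sample skews of repeated draws from a Gaussian (zero-skew) population."""
    rng = random.Random(seed)
    return [
        sample_skew([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(trials)
    ]


sample_size = 100  # assumed for illustration
trials = 2000      # assumed for illustration
skews = simulate_skews(sample_size, trials)

approx_se = math.sqrt(6 / sample_size)  # the estimate quoted above
spread = statistics.stdev(skews)        # observed spread of the sample skews
outside = sum(abs(g) > 2 * approx_se for g in skews) / trials

print(f"approximate standard error: {approx_se:.3f}")
print(f"spread of simulated skews:  {spread:.3f}")
print(f"fraction beyond 2 standard errors: {outside:.3f}")
```

None of the simulated skews is exactly zero, yet only a small fraction of them fall outside two standard errors, which is the behavior the test relies on. The √(6 / N) estimate is an approximation and tends to be rougher for small samples.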

Example: Ten students are measured, and their heights in centimeters are:

180, 182, 169, 175, 178, 189, 174, 174, 171, 168

The skewness (as calculated by Excel, JMP and Minitab) is 0.778.

The standard error of skew is √(6/10) = 0.775.

The value at which skewness would be considered significant is 2 × 0.775 = 1.549, but the skewness is only 0.778 for the students measured, so the skewness is not significant.
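The worked example can be checked with a short Python script. This is a sketch that applies the sample formula and the two-standard-error rule directly; the variable names are chosen here for illustration.

```python
import math

# Heights in centimeters from the example above.
heights_cm = [180, 182, 169, 175, 178, 189, 174, 174, 171, 168]

n = len(heights_cm)
xbar = sum(heights_cm) / n
# Sample standard deviation (divides by n - 1).
s = math.sqrt(sum((x - xbar) ** 2 for x in heights_cm) / (n - 1))
# Sample skew with the n / ((n - 1)(n - 2)) factor.
skew = sum(((x - xbar) / s) ** 3 for x in heights_cm) * n / ((n - 1) * (n - 2))

standard_error = math.sqrt(6 / n)
threshold = 2 * standard_error
significant = abs(skew) > threshold

print(f"skew = {skew:.3f}")                      # skew = 0.778
print(f"standard error = {standard_error:.3f}")  # standard error = 0.775
print(f"threshold = {threshold:.3f}")            # threshold = 1.549
print(f"significant: {significant}")             # significant: False
```

Comparing the absolute value of the skew with the threshold means the same check works for negatively skewed data.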

Skewness Calculation and Evaluation Summary

Most real data distributions are skewed to the right (positive skew) or to the left (negative skew). The degree of skewness can be quantified by the formulas shown here and compared with a "standard error of skew" to decide whether the skew is significant. Low skewness matters because many parametric statistical tests assume approximately Gaussian data. By the Central Limit Theorem, the means of large samples tend toward a Gaussian distribution even when the underlying data are skewed, but a check for normality is still usually needed before proceeding with a parametric test. Skewness is often quoted along with kurtosis when summarizing data.

Skewness Calculation References

Tabachnick, B.G., & Fidell, L.S. (1996). Using multivariate statistics (3rd ed.). New York: HarperCollins.

Microsoft [Computer Software]. (1996). Excel. Redmond, WA: Microsoft Corporation.


Martin Bell - Martin holds a B.Sc. degree in chemical engineering, and an M.Sc. degree in electronics and computing. He has spent more than 25 years ...
