Statistical Z-Score, P-Value and Gaussian Curve

Areas Under The Normal Distribution - Complementary Error Function

Figure 1 - Normal Distribution - M Bell
Z-score and p-value are needed to calculate statistical sample sizes. This article describes what these values are, and how to calculate them for a Gaussian distribution.

Sample size calculations are straightforward once the appropriate values are put into the correct formula. The correct formula depends on the quantity being estimated (mean, proportion, standard deviation, etc.). The z-score used in each formula, however, needs to be calculated first. That calculation is described here.

What Is The Normal Distribution Curve?

The Normal Distribution Curve is a continuous probability distribution. It is a good approximation to many real-world measurements, such as the distribution of heights, weights, lengths, etc. The formula for the standardized curve, which has a mean of 0 and a standard deviation of 1, is:

y = (2π)^(-1/2) · exp(-z²/2)

The curve has some very useful characteristics: no matter what the measured variable is (lengths, heights, etc.), the area under the curve from one value of z to another is always the same. For example, from z = -1 to z = +1, the area is about 68% of the total curve area. From z = -2 to z = +2, the area is always about 95%.
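The areas quoted above can be checked numerically. The following is a minimal sketch in Python (the article itself uses Excel, so the language and the helper name area_between are assumptions made for illustration). It writes the cumulative area of the standard normal curve in terms of the error function from Python's math module.

```python
import math

def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve from minus infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two z-values.

    Hypothetical helper name, introduced only for this illustration.
    """
    return standard_normal_cdf(z_high) - standard_normal_cdf(z_low)

print(round(area_between(-1, 1), 4))  # 0.6827, i.e. about 68%
print(round(area_between(-2, 2), 4))  # 0.9545, i.e. about 95%
```

The second figure shows that "about 95%" is a rounded value: the area between z = -2 and z = +2 is closer to 95.45%.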

What Is Z-Value?

The z-value is the value shown on the x-axis. It represents the number of standard deviations from the mean. So, for a distribution that has a mean of 100 and a standard deviation of 10, the value 90 is represented by a z-value of:

(90 - 100) / 10

= -1

For male heights in the USA, the average is 178 cm, and the standard deviation is about 7 cm, so around 68% of males in the USA stand between 171 and 185 cm. A person with a height of 6' 4" (193 cm) has a z-value of:

(193 - 178) / 7

≈ 2.1

In general, the z-score for a value x, drawn from a population with mean μ and standard deviation σ, is:

z = (x - μ) / σ
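The two worked examples above can be reproduced with a few lines of code. This is a small sketch in Python; the function name z_score is an assumption chosen for illustration, and the inputs are the figures quoted in the text.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Mean 100, standard deviation 10, value 90.
print(z_score(90, 100, 10))            # -1.0

# US male heights as quoted in the text: mean 178 cm, standard deviation 7 cm.
print(round(z_score(193, 178, 7), 2))  # 2.14
```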

What Are P-Value and α?

These are explained graphically in Figure 1. The dark area under the curve to the right of z = 2 represents a proportion of the overall population. Since the Gaussian curve is symmetrical, and since 95% of the curve area is captured from z = -2 to z = +2, the remaining 5% is split between z < -2 and z > +2. Therefore about 2.5% of the population lies above z = 2.

This area represents α and the p-value. The difference between them is that α is specified in advance, whereas the p-value is calculated from the data. For instance, a test may require a confidence level of 95%. This means that α, which is 1 - Confidence, is set to 5%. After a statistical test has been performed, a p-value is returned. This is the probability (i.e. the area in the dark region) of obtaining a result at least as extreme as the one observed if the null hypothesis is true. When the p-value is smaller than α, the null hypothesis is rejected.
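As a rough illustration of the dark area in Figure 1, the upper-tail area beyond a given z can be computed with the complementary error function. This is a sketch in Python, assuming a one-sided test with α = 0.05; the function name upper_tail_p is hypothetical.

```python
import math

def upper_tail_p(z: float) -> float:
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

alpha = 0.05             # assumed significance level (1 - 95% confidence)
p = upper_tail_p(2.0)    # area to the right of z = 2

print(round(p, 4))       # 0.0228
print(p < alpha)         # True: the tail area is smaller than alpha
```

The result of about 2.3% is slightly below the "about 2.5%" quoted above, because the 95% figure for z = -2 to z = +2 is itself rounded.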

How Is Z-Score Calculated?

In the context of sample size calculations, the Zα or Zα/2 values represent a number of standard deviations. The z-score is a function of α (for a one-sided test) or α/2 (for a two-sided test). The z-score is obtained from the inverse of the cumulative normal distribution, often called the "Inverse Normal Distribution". The cumulative distribution itself can be written in terms of the Error Function (ERF) or the "Complementary Error Function" (ERFC). Most look-up tables or software functions will return a value of z such that the area from minus infinity to z covers the proportion passed to the function. For example, to use the Excel function to find what z-score is needed to represent 84% of the population, the syntax is:

=NORMSINV(0.84)

The value returned is 0.994, which is close to 1 as expected. (The 0.84 is made up of half, or 0.5, of the population from minus infinity to z = 0, plus 0.34: it has already been stated that about 68% of the population falls between z = -1 and z = +1, so 0.34 lies between z = -1 and z = 0, and another 0.34 between z = 0 and z = +1.)
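The same lookup can be done outside Excel. The sketch below uses Python's standard library (statistics.NormalDist, available from Python 3.8) as an assumed stand-in for NORMSINV, and then confirms the result by passing it back through the complementary error function.

```python
import math
from statistics import NormalDist

# Inverse of the cumulative standard normal distribution at 0.84,
# the counterpart of =NORMSINV(0.84) in the text.
z = NormalDist().inv_cdf(0.84)
print(round(z, 3))  # 0.994

# Round trip: the area from minus infinity to z, written with erfc,
# recovers the proportion that was passed in.
area = 0.5 * math.erfc(-z / math.sqrt(2.0))
print(round(area, 2))  # 0.84
```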

For convenience, here are some other values of α and Zα:

α = 0.100, Zα = 1.28

α = 0.050, Zα = 1.64

α = 0.025, Zα = 1.96

α = 0.010, Zα = 2.33
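The values listed above are upper-tail points, so each one corresponds to the inverse normal distribution evaluated at 1 - α. A short Python sketch that regenerates them (the function name z_alpha is an assumption for illustration):

```python
from statistics import NormalDist

def z_alpha(alpha: float) -> float:
    """z-value that leaves an area of alpha in the upper tail."""
    return NormalDist().inv_cdf(1.0 - alpha)

for alpha in (0.100, 0.050, 0.025, 0.010):
    print(f"alpha = {alpha:.3f}, Z_alpha = {z_alpha(alpha):.3f}")
# alpha = 0.100, Z_alpha = 1.282
# alpha = 0.050, Z_alpha = 1.645
# alpha = 0.025, Z_alpha = 1.960
# alpha = 0.010, Z_alpha = 2.326
```

For a two-sided test, the same function would be called with α/2, so a two-sided α of 0.05 gives 1.960.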

Z-Score and P-Value Summary

Calculating sample sizes usually involves calculating a relevant z-score first. This is done quite easily with a standard software package like Microsoft Excel, using the function NORMSINV. Excel also provides a function NORMINV, which additionally takes the population mean and standard deviation as inputs, for the convenience of the user. Confidence and power also appear in the equations that return the sample size needed to test the null hypothesis.

References for Z-Score and P-Value

Julious, S.A., (2009), Sample Sizes for Clinical Trials. Boca Raton: CRC Press.


Martin Bell - Martin holds a B.Sc. degree in chemical engineering, and an M.Sc. degree in electronics and computing. He has spent more than 25 years ...
