Degrees of freedom (statistics)

Degrees of freedom refers to the maximum number of logically independent values in a data sample, that is, the number of values that are free to vary.

Understanding Degrees of Freedom

The easiest way to understand degrees of freedom conceptually is through an example:

Consider a data sample consisting of, for the sake of simplicity, five positive integers. The values could be any number with no known relationship between them. This data sample would, theoretically, have five degrees of freedom.
Four of the numbers in the sample are {3, 8, 5, and 4} and the average of the entire data sample is revealed to be 6.
The five values must therefore sum to 5 × 6 = 30, and the four known values sum to 20, so the fifth number has to be 10. It can be nothing else. It does not have the freedom to vary.
So the degrees of freedom for this data sample is 4.
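
The arithmetic in the example above can be checked with a short sketch. The function name `missing_value` is a hypothetical helper introduced here for illustration; it is not part of any library.

```python
def missing_value(known_values, sample_mean, sample_size):
    """Return the single value that is forced once the mean and all other values are fixed."""
    if len(known_values) != sample_size - 1:
        raise ValueError("exactly sample_size - 1 values must be known")
    # Fixing the mean fixes the total: total = mean * N.
    required_total = sample_mean * sample_size
    # Whatever is left after the known values is the only possible last value.
    return required_total - sum(known_values)


known = [3, 8, 5, 4]
fifth = missing_value(known, sample_mean=6, sample_size=5)
degrees_of_freedom = len(known)  # only N - 1 values were free to vary

print(fifth, degrees_of_freedom)  # 10 4
```

Once the mean is known, any four of the five values determine the fifth, which is why only four of them count as free.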

In this case, where a single quantity (the mean) is fixed, the degrees of freedom equal the size of the data sample minus one:

Df = N − 1

where:

Df = degrees of freedom

N = sample size

Degrees of freedom are commonly discussed in relation to various forms of hypothesis testing in statistics, such as the chi-square test. Calculating the degrees of freedom is essential when judging the significance of a chi-square statistic and deciding whether the null hypothesis can be rejected.

Chi-Square Tests

There are two different kinds of chi-square tests: the test of independence, which asks a question of relationship, such as, "Is there a relationship between gender and SAT scores?"; and the goodness-of-fit test, which asks something like "If a coin is tossed 100 times, will it come up heads 50 times and tails 50 times?"
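
As a rough illustration of the coin question, the sketch below runs a goodness-of-fit calculation by hand. The observed counts (60 heads, 40 tails) are made-up data, and the function names are hypothetical. With two categories the test has 2 − 1 = 1 degree of freedom, and for that special case the p-value can be computed exactly with `math.erfc`.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-square variable with exactly 1 degree of freedom."""
    # For 1 degree of freedom the survival function reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


observed = [60, 40]  # hypothetical result of 100 tosses: heads, tails
expected = [50, 50]  # what a fair coin predicts

df = len(observed) - 1  # categories minus one
stat = chi_square_statistic(observed, expected)
p = p_value_one_df(stat)

print(df, stat, round(p, 4))  # 1 4.0 0.0455
```

Under these assumed counts the p-value falls just below 0.05, so the hypothesis of a fair coin would be rejected at the 5% level but not at the 1% level.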

For these tests, the degrees of freedom, together with the chosen significance level, set the critical value that the chi-square statistic must exceed before the null hypothesis can be rejected. In a chi-square test the degrees of freedom depend on the number of categories being compared rather than on the number of observations. Sample size still matters, however. For example, when considering students and course choice, a sample of 30 or 40 students is likely too small to produce reliable results. The same or similar results from a study of 400 or 500 students carry more weight.
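
For a test of independence on a table with r rows and c columns, the degrees of freedom are (r − 1)(c − 1). The sketch below works through a made-up table of 200 students (two year groups by three course choices). All counts and names are hypothetical, and the closing p-value shortcut `exp(-x / 2)` is valid only because this particular table has exactly 2 degrees of freedom.

```python
import math


def independence_test(table):
    """Return (degrees of freedom, chi-square statistic) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return df, statistic


# Hypothetical counts: rows are year groups, columns are course choices.
students = [
    [20, 30, 50],
    [30, 30, 40],
]

df, stat = independence_test(students)
# With exactly 2 degrees of freedom the upper-tail probability is exp(-x / 2).
p = math.exp(-stat / 2)

print(df, round(stat, 4), round(p, 4))  # 2 3.1111 0.2111
```

With these assumed counts the p-value is well above 0.05, so the data would not justify rejecting the hypothesis that course choice is independent of year group.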

History

The earliest and most basic concept of degrees of freedom appeared in the early 1800s in the works of mathematician and astronomer Carl Friedrich Gauss. The modern understanding of the concept was first set out by William Sealy Gosset, an English statistician, in his article "The Probable Error of a Mean," published in Biometrika in 1908 under the pen name "Student" to preserve his anonymity.

In his writings, Gosset did not specifically use the term "degrees of freedom." He did, however, explain the concept in the course of developing what would eventually be known as Student’s t-distribution. The term itself did not become popular until 1922, when English biologist and statistician Ronald Fisher began using "degrees of freedom" in publishing his work on chi-square tests.