Covariance vs. Variance: Understanding the Differences

Covariance and variance are fundamental concepts in statistics and probability. Despite their similar names, they answer different questions: variance describes how spread out the values of a single variable are, while covariance describes how two variables vary together. This article defines each measure, works through an example of each, and then summarizes how they differ and how they relate to each other.

Variance: A Measure of Spread

Variance is a statistical measure of how far the values in a dataset deviate from their mean (average); specifically, it is the average of the squared deviations from the mean. In simpler terms, it measures the spread or dispersion of the data points. A high variance indicates that the data points are spread widely around the mean, whereas a low variance indicates that they are clustered closely around it.

Formula

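The formula for the population variance is:

Variance = (1/N) * sum of (each data point - mean)^2

Where:

"N" is the number of observations,
"each data point" represents an individual value in the dataset, and
"mean" is the average value of all the data points.

When the data are a sample drawn from a larger population, the sample variance is usually computed by dividing by N - 1 instead of N, which gives an unbiased estimate of the population variance.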
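The formula translates almost line for line into code. The sketch below is a minimal Python version; the function names `population_variance` and `sample_variance` are illustrative choices, not part of any library. For comparison it also includes the sample variance, which uses the same sum of squared deviations but divides by N - 1 instead of N.

```python
def population_variance(data):
    """Average squared deviation from the mean (divides by N)."""
    n = len(data)
    if n == 0:
        raise ValueError("population variance needs at least one value")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def sample_variance(data):
    """Same sum of squared deviations, divided by N - 1 instead of N."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


scores = [70, 75, 80, 85, 90]
print(population_variance(scores))  # 50.0
print(sample_variance(scores))      # 62.5
```

The two versions differ only in the divisor, so the sample variance is always slightly larger for the same data.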

Example

Consider a small dataset of exam scores: 70, 75, 80, 85, and 90. The mean of these scores is 80. To find the variance:

Calculate the differences from the mean: -10, -5, 0, 5, 10.
Square each difference: 100, 25, 0, 25, 100.
Find the average of these squared differences: Variance = (100 + 25 + 0 + 25 + 100) / 5 = 50.

Thus, the variance of the exam scores is 50.
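The three steps of this example can be reproduced directly. This sketch keeps each intermediate list so it can be compared with the numbers above, and cross-checks the result against `statistics.pvariance`, the Python standard library's population-variance function.

```python
import statistics

scores = [70, 75, 80, 85, 90]

mean = sum(scores) / len(scores)             # 80.0
deviations = [s - mean for s in scores]      # differences from the mean
squared = [d ** 2 for d in deviations]       # squared differences
variance = sum(squared) / len(scores)        # average of the squares

# The standard library's population variance should agree with the hand calculation.
assert variance == statistics.pvariance(scores)

print(deviations)  # [-10.0, -5.0, 0.0, 5.0, 10.0]
print(squared)     # [100.0, 25.0, 0.0, 25.0, 100.0]
print(variance)    # 50.0
```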

Covariance: A Measure of Relationship

Covariance is a measure that indicates the extent to which two variables change together. If the variables tend to increase and decrease together, the covariance is positive. If one variable tends to increase when the other decreases, the covariance is negative. A covariance close to zero suggests little or no linear relationship between the variables.

Formula

The formula for the population covariance between two variables X and Y is:

Covariance(X, Y) = (1/N) * sum of [(X_i - mean of X) * (Y_i - mean of Y)]

Where:

"N" is the number of paired observations,
"X_i" and "Y_i" represent the i-th pair of data points from X and Y, respectively, and
"mean of X" and "mean of Y" are the average values of X and Y, respectively.

When X and Y are samples from a larger population, the sample covariance divides by N - 1 instead of N.
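A minimal Python sketch of the 1/N covariance formula follows. The function name `population_covariance` is a hypothetical choice for illustration. Because covariance works on pairs (X_i, Y_i), the sketch rejects inputs of unequal length.

```python
def population_covariance(xs, ys):
    """Average product of paired deviations from the means (divides by N)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of observations")
    n = len(xs)
    if n == 0:
        raise ValueError("covariance needs at least one pair of observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply each pair's deviations, then average the products.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


print(population_covariance([2, 4, 6, 8], [50, 60, 70, 80]))  # 25.0
```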

Example

Let's consider two datasets: X representing the hours studied and Y representing the scores obtained, with X = [2, 4, 6, 8] and Y = [50, 60, 70, 80]. The means of X and Y are 5 and 65, respectively. To calculate the covariance:

Calculate the product of the differences from the mean for each pair: (2-5)(50-65) = 45, (4-5)(60-65) = 5, (6-5)(70-65) = 5, (8-5)(80-65) = 45.
Find the average of these products: Covariance(X, Y) = (45 + 5 + 5 + 45) / 4 = 25.

The positive covariance of 25 suggests that, on average, more hours studied are associated with higher scores.
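The hours-and-scores calculation can be reproduced step by step. This sketch keeps the list of per-pair products so each value can be compared with the ones computed above.

```python
hours = [2, 4, 6, 8]          # X: hours studied
scores = [50, 60, 70, 80]     # Y: scores obtained

mean_x = sum(hours) / len(hours)     # 5.0
mean_y = sum(scores) / len(scores)   # 65.0

# Product of the deviations from the mean for each (X_i, Y_i) pair.
products = [(x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)]
covariance = sum(products) / len(products)

print(products)    # [45.0, 5.0, 5.0, 45.0]
print(covariance)  # 25.0
```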

Key Differences

The primary difference between variance and covariance lies in what they measure:

Variance measures how much individual data points in a single variable deviate from the mean of that variable. It provides a measure of the spread or dispersion within a single dataset.
Covariance measures how two variables move together. It tells us whether an increase in one variable tends to accompany an increase or a decrease in the other.

Another key difference is in their units:

The unit of variance is the square of the unit of the original data (e.g., if the data is in meters, the variance is in square meters).
Covariance, however, is expressed in the product of the units of the two variables it compares (e.g., if one variable is in meters and the other is in kilograms, the covariance is in meter-kilograms).
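One practical consequence of these units is that rescaling a variable rescales both measures. The sketch below reuses the study-hours data and, purely as an illustration, re-expresses the hours in minutes: the covariance grows by a factor of 60 and the variance of the time variable by 60 squared, even though the underlying relationship is unchanged. The helper names are hypothetical, and both use the 1/N divisor from the formulas above.

```python
def population_covariance(xs, ys):
    """1/N covariance of two equal-length, non-empty sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def population_variance(data):
    """1/N variance of a non-empty sequence."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


hours = [2, 4, 6, 8]
minutes = [h * 60 for h in hours]   # the same study times, in a different unit
scores = [50, 60, 70, 80]

print(population_covariance(hours, scores))    # 25.0    (hour-points)
print(population_covariance(minutes, scores))  # 1500.0  (minute-points, 60x larger)
print(population_variance(hours))              # 5.0     (square hours)
print(population_variance(minutes))            # 18000.0 (square minutes, 3600x larger)
```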
