Descriptive Statistics in depth for Data Science


Variance measures how widely the data points in a data set are spread. It quantifies variability by looking at how far the data points lie from their mean. Let us first build intuition with an example and then move on to the mathematical part of variance.
Take two data sets, d1 = 8, 9, 10, 11, 12 and d2 = 25, 1, 2, 10, 12. Both have the same mean (10), but their ranges differ: 4 for d1 and 24 for d2. The data points in d2 lie much further from the mean, so the variability of d2 is greater than that of d1 (Fig 1.0).


Fig 1.0: Variance of a data set
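
The comparison of d1 and d2 can be checked with a few lines of Python. This is only a minimal sketch: the helper names `mean` and `value_range` are made up for this illustration and are not part of any library.

```python
# The two example data sets from the text.
d1 = [8, 9, 10, 11, 12]
d2 = [25, 1, 2, 10, 12]


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def value_range(values):
    """Range: difference between the largest and the smallest value."""
    return max(values) - min(values)


for name, data in (("d1", d1), ("d2", d2)):
    print(name, "mean =", mean(data), "range =", value_range(data))
# d1 mean = 10.0 range = 4
# d2 mean = 10.0 range = 24
```

Both data sets share the same mean, so the mean alone cannot tell them apart. The range already hints that d2 is more spread out, and variance makes that idea precise.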

Calculation of Variance:-

Now that you have some idea of variability, let us calculate the variance mathematically.

  1. First, compute the mean of the data set.
  2. Subtract the mean from each data point.
  3. Square each of these differences.
  4. Sum up all the squared differences.
  5. Divide the sum by the number of data points.
An example of this calculation is shown below (Fig 1.1).
Fig 1.1: Variance and standard deviation calculation
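
The calculation steps listed above can be sketched in Python as follows. The function name `population_variance` is chosen for this illustration. Because the last step divides by the number of data points, this is the population variance (the sample variance would divide by one less than that number).

```python
def population_variance(values):
    """Variance computed step by step, dividing by the number of data points."""
    n = len(values)
    mean = sum(values) / n                             # step 1: the mean
    differences = [x - mean for x in values]           # step 2: subtract the mean
    squared = [diff ** 2 for diff in differences]      # step 3: square each difference
    total = sum(squared)                               # step 4: sum the squares
    return total / n                                   # step 5: divide by n


d1 = [8, 9, 10, 11, 12]
d2 = [25, 1, 2, 10, 12]

print(population_variance(d1))  # 2.0
print(population_variance(d2))  # 74.8
```

For d1 the squared differences are 4, 1, 0, 1, 4, which sum to 10 and give a variance of 2. For d2 they are 225, 81, 64, 0, 4, which sum to 374 and give a variance of 74.8. This confirms that d2 is far more variable even though both means are 10.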

Standard Deviation:-

Standard deviation quantifies the amount of variation or dispersion in a set of data values. It is the square root of the variance, so it is expressed in the same units as the data. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
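
As a rough illustration, the standard deviation of the two example data sets can be computed as the square root of the variance. This sketch assumes the population form (dividing by the number of data points), and the function names are made up for this example.

```python
import math


def population_variance(values):
    """Mean of the squared differences from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_std_dev(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(population_variance(values))


d1 = [8, 9, 10, 11, 12]
d2 = [25, 1, 2, 10, 12]

print(round(population_std_dev(d1), 2))  # 1.41
print(round(population_std_dev(d2), 2))  # 8.65
```

The points in d1 sit about 1.41 away from the mean on average, while the points in d2 sit about 8.65 away. This matches the description of low and high standard deviation above.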

