An introduction to Descriptive Statistics

Sriram Sureshkumar
4 min read · Jul 5, 2021

In this blog, let’s talk about the essentials needed for you to get started on learning descriptive statistics.

As the name suggests, descriptive statistics describe data quantitatively and give us a summary of its features. Let’s break it down further: by describing data quantitatively, I mean working with the values in the data and how often each one occurs, i.e., their frequency.
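To make this concrete, here is a minimal sketch using Python’s standard library. The exam scores are invented for illustration, and the choice of summary measures (mean, median, mode, standard deviation) is an assumption about what a typical summary might include.

```python
from collections import Counter
import statistics

# Hypothetical exam scores, invented for this illustration.
scores = [72, 85, 85, 90, 64, 85, 72, 98]

# Frequency: how many times each value occurs in the data.
frequency = Counter(scores)

# A small quantitative summary of the data's features.
summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}

print(frequency.most_common())
print(summary)
```

The frequency table shows how often each score occurs, while the summary condenses the whole list into a handful of numbers.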

Now, not all data have similar characteristics. Apart from what they signify, datasets also differ in how they are distributed, so they cannot all be analyzed or described in the same way. To analyze data effectively, we should treat it according to its distribution. Before we go exploring the various types of distributions, let’s first see how we can define a distribution.

Random Variables

The most basic concept we need for defining distributions is the Random Variable: a variable whose possible values are numerical outcomes of a random event. For example, a Random Variable X can be defined as the percentage of marks scored by a student in an exam. Here, X can take on any value between 0 and 100.

Broadly, Random Variables are divided into 2 types:

  1. Discrete Random Variables
  2. Continuous Random Variables

The two types can be described as follows:

Discrete Random Variables: They take countable values, such as whole numbers, with nothing in between.

Continuous Random Variables: They’re NOT countable; they can take any value within an interval, so their probabilities are defined over intervals.

The best way to understand these would be through examples.

An example of a Discrete Random Variable (X) would be the number of students in a classroom. X is always a whole number that we can count. For instance, we cannot have 30.5 students in a classroom. It must be either 30 students or 31 students.

An example of a Continuous Random Variable (Y) would be the height of a student. Here, the height is not restricted to whole numbers. If the exact height is measured in decimals, it can take on any value: 170 cm, 170.1 cm, 170.11 cm, etc. Since there are infinitely many values between 170 cm and 171 cm, the probability of any one exact value is zero. So, P(Y = 170) = P(Y = 170.1) = P(Y = 170.11) = 0, but P(170 < Y < 171) is a non-zero value. That’s precisely the reason Continuous Random Variables are defined between intervals.
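The point-versus-interval idea can be sketched in a few lines of Python. This assumes, purely for illustration, that heights follow a normal distribution with a mean of 170 cm and a standard deviation of 10 cm; the helper `interval_probability` is a hypothetical name, not part of any library.

```python
from statistics import NormalDist

# Assumption for illustration only: heights ~ Normal(mean=170 cm, sd=10 cm).
heights = NormalDist(mu=170, sigma=10)

def interval_probability(dist, low, high):
    """P(low < Y < high), computed from the cumulative distribution function."""
    return dist.cdf(high) - dist.cdf(low)

# An "interval" of zero width is a single point, so its probability is zero.
p_point = interval_probability(heights, 170, 170)

# A real interval has non-zero probability.
p_interval = interval_probability(heights, 170, 171)

print(p_point)     # 0.0
print(p_interval)  # roughly 0.04 under the assumed parameters
```

Under these assumed parameters, about 4% of students would be between 170 cm and 171 cm tall, even though the probability of being exactly 170 cm is zero.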

Now that we’ve got a general idea about Random Variables, let’s dive deeper into the different types of discrete and continuous distributions.

Although all the distributions are important, the normal distribution is well known and is used extensively. Let’s dwell on it further.

This is what a normal distribution curve looks like:

Important properties of a normal distribution:

  • It has a symmetrical ‘bell-shaped’ density curve.
  • Mean = Median = Mode
  • If the Mean = 0 and the Standard deviation = 1, the distribution is called a standard normal distribution.
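These properties can be checked numerically with a small sketch. The parameters below (mean 170, standard deviation 10) are arbitrary values chosen for illustration.

```python
import math
from statistics import NormalDist

# Arbitrary illustrative parameters.
dist = NormalDist(mu=170, sigma=10)

# Symmetry: the density is the same at equal distances on either side of the mean.
left = dist.pdf(170 - 5)
right = dist.pdf(170 + 5)
is_symmetric = math.isclose(left, right)

# Mean, median, and mode coincide for a normal distribution.
centers_agree = dist.mean == dist.median == dist.mode

# NormalDist() with no arguments is the standard normal: mean 0, standard deviation 1.
standard = NormalDist()

print(is_symmetric, centers_agree)
print(standard.mean, standard.stdev)
```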

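Building on the last point, we can convert any normal distribution to a standard normal distribution using this formula:

z = (x − μ) / σ

Here x is a value from the original distribution, μ is its mean, σ is its standard deviation, and z is the standardized value (the z-score).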

Reasons to standardize a normal distribution:

  • Comparison of 2 distributions with different means and standard deviations.
  • Putting scores on a common scale, which makes calculations such as finding probabilities simpler.
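As a rough sketch of the comparison use case, standardizing a value means subtracting the mean and dividing by the standard deviation. The two exams and their means and standard deviations below are hypothetical, and `z_score` is a helper written for this example.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

# Hypothetical exams with different means and standard deviations.
math_z = z_score(80, mean=70, sd=5)      # 2.0
physics_z = z_score(85, mean=80, sd=10)  # 0.5

# On the standard normal scale, the two scores are directly comparable.
standard = NormalDist()  # mean 0, standard deviation 1
math_percentile = standard.cdf(math_z)
physics_percentile = standard.cdf(physics_z)

print(math_z, physics_z)
print(math_percentile, physics_percentile)
```

Although 85 is the higher raw score, the 80 in math is further above its own average, so it is the stronger result relative to the other students under these assumptions.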

The other types of distributions are important too, and their graphs aren’t always symmetrical like the normal distribution curve. Normally distributed data is convenient to work with, but real data doesn’t always follow a normal distribution, so knowing about the other distributions helps in understanding and interpreting the data.

Let’s have a look at the graphs for other types of distributions.

Graphs of different distributions

Since this blog is an introduction, let’s not discuss other distributions in detail, but knowing the intuition behind the above distributions will definitely be helpful for data science, ML, and statistics aspirants.
