Probability and Statistics

Somsak Chanaim

International College of Digital Innovation, CMU

August 17, 2025

Axiom of Probability

Axioms of Probability are three fundamental rules that define the properties of probability.

They were proposed by Andrey Kolmogorov in 1933 and form the foundation of modern probability theory.

Kolmogorov’s Axioms:


Andrey N. Kolmogorov

Let \(S\) be a sample space and let \(A\) be any event that is a subset of \(S\). The three axioms are as follows:

Example 1: Tossing a fair coin

Sample space \(S\):
All possible outcomes \[ S = \{\text{Heads}, \text{Tails}\} \] Event \(A\): Getting a Head \[ A = \{\text{Heads}\} \]

Example 2: Rolling a six-sided die

Sample space \(S\): \[ S = \{1, 2, 3, 4, 5, 6\} \] Event \(A\): Rolling an even number \[ A = \{2, 4, 6\} \]

Example 3: Drawing a card from a standard 52-card deck


Sample space \(S\): All 52 unique cards \[ S = \{\text{Ace of Hearts}, \text{2 of Hearts},\\ \ldots, \text{King of Spades}\} \] Event \(A\): Drawing a red card

\[A = \{\text{all Hearts and Diamonds}\\ \text{(26 cards)}\}\]

Axiom 1 Probability must be non-negative

\[P(A) \geq 0\]

For every event \(A\), the probability must be non-negative: zero or positive, never negative.

Axiom 2 The probability of the sample space is 1

\[P(S) = 1\]

This means the probability of the event that covers all possible outcomes must equal 1.

Axiom 3 Additivity for mutually exclusive events

If \(A\) and \(B\) are mutually exclusive events, meaning they have no outcomes in common (i.e., \(A \cap B = \emptyset\)), then
\[ P(A \cup B) = P(A) + P(B) \]
This means if two events cannot occur at the same time, the probability of either one occurring is the sum of their individual probabilities.

Results Derived from the Axioms of Probability

From the three axioms, we can infer other important properties. For example:

The probability of an impossible event is zero

\[P(\emptyset) = 0\]

An impossible event has a probability of zero because it cannot occur.

Complement Rule

\[P(A^c) = 1 - P(A)\]

This means that if the probability of event \(A\) is \(P(A)\), then the probability of the complement event (not \(A\)) is \(1 - P(A)\).

General Addition Rule of Probability

For any events \(A\) and \(B\):

\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]

This formula applies even when the events may overlap.

Example Applying the Axioms of Probability

Rolling a single die

Let \(S = \{1, 2, 3, 4, 5, 6\}\)

  • Event \(A\): rolling an odd number → \(A = \{1, 3, 5\}\)
  • Event \(B\): rolling a number greater than 4 → \(B = \{5, 6\}\)
  • \(P(A) = \frac{3}{6} = 0.5\), \(P(B) = \frac{2}{6} \approx 0.333\),
    \(P(A \cap B) = \frac{1}{6} \approx 0.167\)

Using the addition rule: \[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]

\[= \frac{3}{6} + \frac{2}{6} - \frac{1}{6} = \frac{4}{6} \approx 0.667\]
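The same computation can be cross-checked by counting outcomes with exact fractions; the short Python sketch below is illustrative only and not part of the course material:

```python
from fractions import Fraction

# Die-roll events from the example above:
# A = odd numbers, B = numbers greater than 4
S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {5, 6}

def prob(event, space):
    """Probability under equally likely outcomes: |event| / |space|."""
    return Fraction(len(event), len(space))

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_union = prob(A, S) + prob(B, S) - prob(A & B, S)
print(p_union)                    # 2/3
print(p_union == prob(A | B, S))  # True: matches direct counting of A ∪ B
```

Working in fractions avoids the small discrepancies that appear when each probability is rounded to three decimals before adding.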

Random Variable

A random variable is a variable that represents the outcome of a random experiment.
Its value is determined by chance or probability.

Random variables are commonly used in statistics and probability theory to describe probability distributions of data.

There are two main types of random variables:

  1. Discrete Random Variable

  2. Continuous Random Variable

1. Discrete Random Variable

  • Takes on a countable number of possible values

  • Commonly used in events where outcomes can be counted, such as the number rolled on a die or the number of correct answers on a test

Examples

  • Rolling a die: Let \(X\) be the value shown on the die → \(X \in \{1, 2, 3, 4, 5, 6\}\)

  • Flipping a coin: Let \(Y\) be the number of heads when flipping a coin 3 times → \(Y \in \{0, 1, 2, 3\}\)

  • Number of customers per day: Let \(X\) be the number of customers arriving each day → \(X \in \{0, 1, 2, 3, \ldots\}\)

2. Continuous Random Variable

  • A variable that can take on any value within a range of real numbers

  • Used for measurable quantities such as weight, height, or time

Examples

  • Customer service time: Variable \(T\) may take values between 0 and 10 minutes

  • Temperature in a city: Variable \(Z\) may range from 25°C to 35°C

  • Investment return rate: Variable \(r \in (-100\%, \infty)\)

When we can define a specific functional form for the distribution, it is called a Probability Distribution

Probability Distribution

A probability distribution describes how often each value of a random variable is expected to occur or its likelihood.

Properties of a Discrete Probability Distribution

Let \(X\) be a random variable and \(P(X)\) be the probability of each possible value of \(X\). It must satisfy the following conditions:

  1. \(0 \leq P(X) \leq 1\) for all values of \(X\)

  2. \(\sum P(X) = 1\) (The total probability must sum to 1)

Example

Rolling a Die

Let the random variable \(X\) represent the number shown on a single six-sided die (\(X = 1, 2, 3, 4, 5, 6\))

\[P(X) = \begin{cases} \frac{1}{6}, & X = 1, 2, 3, 4, 5, 6 \\ 0, & \text{otherwise} \end{cases}\]

Question 1:

What is the probability that the number rolled is less than 4?

\[P(X < 4) = P(1) + P(2) + P(3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = 0.5\]

Question 2:

What is the probability that the number rolled is an even number?

Even numbers on a die: 2, 4, 6

\[ P(\text{even}) = P(2) + P(4) + P(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = 0.5 \]

Question 3:

What is the probability that the number rolled is greater than or equal to 5?

Numbers: 5, 6

\[P(X \geq 5) = P(5) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \approx 0.333\]
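All three questions amount to summing the PMF over the outcomes of interest; a minimal Python sketch (illustrative only):

```python
from fractions import Fraction

# PMF of a fair six-sided die: P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def prob(predicate):
    """Sum the PMF over all outcomes satisfying the predicate."""
    return sum(p for x, p in pmf.items() if predicate(x))

print(prob(lambda x: x < 4))       # 1/2  (Question 1)
print(prob(lambda x: x % 2 == 0))  # 1/2  (Question 2)
print(prob(lambda x: x >= 5))      # 1/3  (Question 3)
```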

Important Discrete Probability Distributions

  1. Bernoulli Distribution:
    Used for events with only two possible outcomes, such as success/failure

  2. Binomial Distribution:
    Models multiple independent trials where each trial has two outcomes

  3. Poisson Distribution:
    Used to model the number of events occurring within a fixed interval of time or space

Bernoulli distribution

Definition:

The Bernoulli distribution is a discrete probability distribution for a random variable which has only two possible outcomes:

  • Success (usually coded as 1)

  • Failure (usually coded as 0)

It models the outcome of a single trial (or experiment) that can result in only one of two outcomes.

Mathematical Definition:

Let \(X \sim \text{Bernoulli}(p)\), where

  • \(X \in \{0, 1\}\)

  • \(p\) is the probability of success (i.e., \(P(X = 1) = p\))

  • \(1 - p\) is the probability of failure (i.e., \(P(X = 0) = 1 - p\))

  • \(0 \leq p \leq 1\)

Probability mass function (PMF):

\[ P(X = x) = p^x (1 - p)^{1 - x}, \quad \text{for } x \in \{0, 1\} \]

Properties:

  • Mean: \(\mathbb{E}[X] = p\)

  • Variance: \(\text{Var}(X) = p(1 - p)\)

Examples:

  • Tossing a coin (Heads = 1, Tails = 0)

  • Passing a test (Pass = 1, Fail = 0)

  • Clicking on an ad (Click = 1, No Click = 0)

  • Defective product in a factory (Defective = 1, Not defective = 0)

Why is it important?

  • It’s the building block for many other distributions, like the Binomial distribution, which models the number of successes in multiple independent Bernoulli trials.

  • It’s used in binary classification, machine learning, economics, quality control, and more.

Example

Let \(X \sim \text{Bernoulli}(p)\). We’ll use different values of \(p\) (probability of success) in each example.

Example 1: Tossing a fair coin

Head = 1 (success) and Tail = 0 (failure). So, \(p = 0.5\)

Questions: What is \(P(X = 1)\)?

Solution: \(P(X = 1) = p = 0.5\)


Questions: What is \(P(X = 0)\)?

Solution: \(P(X = 0) = 1 - p = 0.5\)


Questions: What is the expected value?

Solution: \(\mathbb{E}[X] = p = 0.5\)

Example 2: Quality control in a factory

A machine produces parts. Probability that a part is defective is 0.1.

Let \(X = 1\) if defective, \(X = 0\) if not.

Questions: What is the probability a part is defective?

Solution: \(P(X = 1) = p = 0.1\)

Questions: What is the variance of this distribution?

Solution: \(\text{Var}(X) = p(1 - p) = 0.1 \times 0.9 = 0.09\)

Example 3: Clicking on an online ad

Probability a user clicks on an ad is 0.25.

\(X = 1\) if clicked, \(X = 0\) if not

Questions: What is \(P(X = 1)\)?

Solution: \(P(X = 1) = p = 0.25\)

Questions: What is \(P(X = 0)\)?

Solution: \(P(X = 0) = 1 - p = 0.75\)

Questions: What is the standard deviation?

Solution: \(\text{SD}(X) = \sqrt{p(1 - p)} = \sqrt{0.25 \times 0.75} = \sqrt{0.1875} \approx 0.433\)
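The Bernoulli quantities used in these examples follow directly from the formulas above; below is a small Python sketch (the helper name `bernoulli_summary` is invented for illustration and is not part of the slides):

```python
import math

def bernoulli_summary(p):
    """PMF values, mean, variance, and SD of a Bernoulli(p) variable."""
    return {
        "P(X=1)": p,
        "P(X=0)": 1 - p,
        "mean": p,                        # E[X] = p
        "variance": p * (1 - p),          # Var(X) = p(1 - p)
        "sd": math.sqrt(p * (1 - p)),     # SD(X) = sqrt(p(1 - p))
    }

# Example 3: ad click with p = 0.25
s = bernoulli_summary(0.25)
print(s["P(X=0)"])        # 0.75
print(s["variance"])      # 0.1875
print(round(s["sd"], 3))  # 0.433
```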

Binomial distribution

The Binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials, where each trial has only two outcomes: success or failure.

Definition:

If a random variable \(X \sim \text{Binomial}(n, p)\), then:

  • \(n\): number of trials

  • \(p\): probability of success in each trial

  • \(X\): number of successes in \(n\) trials

  • \(X \in \{0, 1, 2, \ldots, n\}\)

Probability Mass Function (PMF):

\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \]

Where:

  • \(\binom{n}{k} = \dfrac{n!}{k!(n - k)!}\)

  • \(k\): number of successes

  • \(p\): probability of success

  • \((1 - p)\): probability of failure

Mean and Variance:

  • Mean (Expected value):

\[\mathbb{E}[X] = np\]

  • Variance:

\[\text{Var}(X) = np(1 - p)\]

Examples in Real Life:

Situation | Trial | Success
Tossing 10 coins | Each toss | Head
Surveying 20 people | Each person | Likes product
Quality control of 100 items | Each item | Not defective

When to Use Binomial Distribution:

  • The number of trials \(n\) is fixed

  • Each trial has two possible outcomes: success or failure

  • The probability of success \(p\) is the same in each trial

Example

Example 1: Tossing a fair coin 5 times

You toss a fair coin 5 times. What is the probability of getting exactly 3 heads?

Let \(X \sim \text{Binomial}(n = 5, p = 0.5)\)

Step-by-step:

  • \(n = 5\), \(k = 3\), \(p = 0.5\)

\[\begin{aligned} P(X = 3) &= \binom{5}{3}(0.5)^3(1 - 0.5)^{5 - 3}\\ &= \frac{5!}{3!2!}(0.5)^3(0.5)^2\\ &= 10 \times 0.125 \times 0.25 = 0.3125 \end{aligned} \]

Answer: \(P(X = 3) = 0.3125\)

Example 2: Defective products in a batch

A machine produces items with a 10% defect rate. If you check 8 items, what’s the probability exactly 2 are defective? Let \(X \sim \text{Binomial}(n = 8, p = 0.1)\)

Step-by-step:

  • \(n = 8\), \(k = 2\), \(p = 0.1\)

\[\begin{aligned} P(X = 2) &= \binom{8}{2}(0.1)^2(0.9)^6\\ &= \frac{8!}{2!6!}(0.01)(0.531441)\\ &= 28 \times 0.01 \times 0.531441 = 0.1488 \end{aligned}\]

Answer: \(P(X = 2) \approx 0.1488\)

Example 3: Online ad clicks

Each person who sees an ad has a 25% chance of clicking it. Out of 12 viewers, what’s the probability that exactly 4 click the ad?

Let \(X \sim \text{Binomial}(n = 12, p = 0.25)\)

Step-by-step:

  • \(n = 12\), \(k = 4\), \(p = 0.25\)

\[\begin{aligned} P(X = 4) &= \binom{12}{4}(0.25)^4(0.75)^8\\ &= 495 \times 0.00390625 \times 0.100113\\ &\approx 0.1936 \end{aligned}\]

Answer: \(P(X = 4) \approx 0.1936\)
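All three binomial examples reduce to evaluating the same PMF; a compact Python sketch using the standard library's `math.comb` (illustrative, not part of the slides):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The three worked examples above:
print(binom_pmf(3, 5, 0.5))              # 0.3125
print(round(binom_pmf(2, 8, 0.1), 4))    # 0.1488
print(round(binom_pmf(4, 12, 0.25), 4))  # 0.1936
```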

Calculating the Probability in a Binomial Distribution

Poisson distribution

The Poisson distribution is a discrete probability distribution that models the number of events occurring in a fixed interval of time or space, under the assumption that:

  1. Events occur independently

  2. The average rate of occurrence \(\lambda\) is constant

  3. Two events cannot occur at the exact same instant

Definition:

If a random variable \(X \sim \text{Poisson}(\lambda)\), then it describes the probability of observing exactly \(k\) events in a fixed interval.

  • \(\lambda\): average number of events per interval (e.g., per hour, per day, per km², etc.)

  • \(X\): number of observed events

  • \(X \in \{0, 1, 2, \ldots\}\)

Probability Mass Function (PMF): \[P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}\]

Where:

  • \(e \approx 2.71828\) (Euler’s number)

  • \(k\): number of events (0, 1, 2, …)

  • \(\lambda\): average event rate

Mean and Variance:

  • \(\mathbb{E}[X] = \lambda\)

  • \(\text{Var}(X) = \lambda\)

When to Use Poisson Distribution:

  • Counting rare events over time or space

  • Events are random and independent

  • The rate is stable over time

Real-life Examples:

Situation | Poisson variable
Calls arriving at a call center per hour | Number of calls
Typos per page in a book | Number of typos
Patients arriving at an ER per night | Number of patients
Emails received per day | Number of emails

Example

Example 1: Call center

A call center receives an average of 4 calls per hour. Find:

  1. \(P(X = 2)\): exactly 2 calls

\[P(X = 2) = \frac{e^{-4} \cdot 4^2}{2!} = \frac{e^{-4} \cdot 16}{2} = 8 \cdot e^{-4} \approx 8 \cdot 0.0183 = 0.1465\]

  2. \(P(X \leq 2)\): no more than 2 calls

\(P(X \leq 2) = P(0) + P(1) + P(2)\) \[\begin{aligned} P(0) &= \frac{e^{-4} \cdot 4^0}{0!} = e^{-4} = 0.0183 \\ P(1) &= \frac{e^{-4} \cdot 4^1}{1!} = 4 \cdot e^{-4} = 0.0733 \\ P(2) &= 0.1465 \ \text{(from part a)} \\ P(X \leq 2) &= 0.0183 + 0.0733 + 0.1465 = 0.2381 \end{aligned}\]

  3. \(P(X \geq 3)\): 3 or more calls

\(P(X \geq 3) = 1 - P(X \leq 2)\)

\[P(X \geq 3) = 1 - 0.2381 = 0.7619\]

Example 2: Hospital ER

An average of 3 patients arrive at the emergency room each night. Find:

  1. \(P(X = 5)\): exactly 5 patients

\[P(X = 5) = \frac{e^{-3} \cdot 3^5}{5!} = \frac{e^{-3} \cdot 243}{120} \approx 0.0498 \cdot 2.025 = 0.1008\]

  2. \(P(X \leq 5)\): at most 5 patients

\(P(X \leq 5) = \sum_{k=0}^{5} P(k)\) \[\begin{aligned} P(0) &= e^{-3} = 0.0498 \\ P(1) &= 3 \cdot e^{-3} = 0.1494 \\ P(2) &= \frac{9}{2} e^{-3} = 0.2240 \\ P(3) &= \frac{27}{6} e^{-3} = 0.2240 \\ P(4) &= \frac{81}{24} e^{-3} = 0.1680 \\ P(5) &= 0.1008 \\ P(X \leq 5) &= 0.0498 + 0.1494 + 0.2240\\ &~~~+ 0.2240 + 0.1680 + 0.1008 = 0.9160 \end{aligned}\]

  3. \(P(X \geq 2)\): at least 2 patients

\(P(X \geq 2) = 1 - P(0) - P(1)\) \[P(X \geq 2) = 1 - (0.0498 + 0.1494) = 1 - 0.1992 = 0.8008\]
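These Poisson calculations can be reproduced with a few lines of Python (an illustrative sketch, not part of the slides):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Example 1: call center with an average of 4 calls per hour
p_le_2 = sum(poisson_pmf(k, 4) for k in range(3))  # P(X <= 2)
print(round(poisson_pmf(2, 4), 4))  # 0.1465
print(round(p_le_2, 4))             # 0.2381
print(round(1 - p_le_2, 4))         # 0.7619  (complement rule)
```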

Calculating the Probability in a Poisson Distribution

Important Note

In Jamovi, you can install and use the external module called distrACTION to calculate probabilities for both Binomial and Poisson distributions.

distrACTION Module

Properties of a Continuous Probability Distribution

Let \(f(x)\) be a Probability Density Function (PDF). It must satisfy the following conditions:

  1. \(f(x) \geq 0\) for all \(x\)

  2. \(\int_{-\infty}^{\infty} f(x) \, dx = 1\)

The probability that \(X\) falls within the interval \(a \leq X \leq b\) is given by:

\[\begin{aligned} P(a < X < b) &= P(a \leq X < b) \\ &= P(a < X \leq b) \\ &= P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx \end{aligned}\]

Key Example

Normal Distribution

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad \mu \in \mathbb{R},\ \sigma^2 > 0,\ x \in \mathbb{R} \]

  • \(\mu\) is the mean

  • \(\sigma^2\) is the variance

  • The shape is a bell curve (symmetrical)
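The normal density has no closed-form antiderivative, but its CDF can be written with the error function \(\operatorname{erf}\), which Python's standard library provides; a sketch (the helper names `normal_cdf` and `normal_prob` are invented for illustration):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def normal_prob(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b): the integral of the density from a to b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# About 68.27% of the probability lies within one SD of the mean:
print(round(normal_prob(-1.0, 1.0), 4))  # 0.6827
```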

Calculating the Probability in a Normal Distribution

Important Continuous Probability Distributions

  1. Normal Distribution:
    Commonly used in statistics

  2. Uniform Distribution:
    All values within a given interval have equal probability

  3. Exponential Distribution:
    Often used for modeling waiting times

Statistics

Statistics is the science of collecting, analyzing, interpreting, and presenting data to support decision-making or to better understand phenomena.

There are two main branches of statistics

1. Descriptive Statistics

Used to summarize and describe data, such as:

  • Mean

  • Median

  • Standard Deviation

  • Variance

  • Pearson Correlation

  • Frequency Table

  • Various types of charts and graphs (Previous chapter)

Mean or Average

Definition:

The mean, also known as the average, is a measure of central tendency that represents the typical value in a set of numbers.

\[\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}\]

\[\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\]

Where:

  • \(\bar{x}\) = mean
  • \(x_i\) = each individual value
  • \(n\) = total number of values

How to Use the Mean

  1. Summarizing Data

    • It gives a single value that represents the entire dataset.

    • Example: The average height of students in a class.

  2. Comparing Groups

    • You can compare the mean scores of two different classes or products.

  3. Used in Other Statistical Methods

    • The mean is the basis for calculating:

      • Variance and standard deviation

      • Z-scores

      • Regression analysis

      • Hypothesis testing

Example

Let’s say you have the following exam scores: 70, 80, 90, 85, 75

\[ \text{Mean} = \frac{70 + 80 + 90 + 85 + 75}{5} = \frac{400}{5} = 80 \]

So, the average score is 80.
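The calculation matches Python's built-in `statistics` module; a quick check (illustrative, not part of the slides):

```python
import statistics

scores = [70, 80, 90, 85, 75]
mean = sum(scores) / len(scores)  # definition: sum of values / number of values
print(mean)                       # 80.0
print(statistics.mean(scores))    # 80
```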

When Not to Use the Mean

  • Skewed data or outliers can distort the mean. In those cases, use the median (middle value) instead.

Median

Definition:

The median is the middle value of a dataset when the values are arranged in order. It divides the dataset into two equal halves.

  • If the number of data points is odd, the median is the middle number.

  • If the number is even, the median is the average of the two middle numbers.

How to Calculate the Median

  1. Sort the data from smallest to largest.

  2. Find the middle value:

    • If \(n\) is odd:

\[\text{Median} = x_{(\frac{n+1}{2})}\]

    • If \(n\) is even:

\[\text{Median} = \frac{x_{(n/2)} + x_{(n/2 + 1)}}{2}\]

How to Use the Median

  1. Measure Central Tendency

    • It shows the “typical” value, especially for skewed data.
  2. When Data Has Outliers

    • Unlike the mean, the median is not affected by extreme values.
  3. Descriptive Statistics

    • Used in summarizing income, house prices, ages, etc.

Example 1: Odd number of values

  • Data: 5, 7, 9

  • Sorted: 5, 7, 9

  • Median = 7 (the middle value)

Example 2: Even number of values

  • Data: 3, 5, 7, 9

  • Sorted: 3, 5, 7, 9

  • Median = (5 + 7) / 2 = 6
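Both cases agree with the standard library's `statistics.median` (an illustrative check):

```python
import statistics

odd_data = [5, 7, 9]      # odd count: the middle value
even_data = [3, 5, 7, 9]  # even count: average of the two middle values
print(statistics.median(odd_data))   # 7
print(statistics.median(even_data))  # 6.0
```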

Variance

Definition:

Variance measures how much the values in a dataset differ from the mean. It tells us the spread or dispersion of the data.

  • A small variance means the data points are close to the mean.

  • A large variance means the data points are spread out over a wider range.

Formula

For a sample: \[s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2\] For a population: \[\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2\]

Where:

  • \(x_i\) = each data point

  • \(\bar{x}\) = sample mean

  • \(\mu\) = population mean

  • \(n\), \(N\) = number of values in the sample or population

  • \(s^2\), \(\sigma^2\) = variance

Example

Data: 4, 6, 8

Mean = (4 + 6 + 8) / 3 = 6

Deviations: -2, 0, +2

Squared deviations: 4, 0, 4

Variance (sample) = \(\frac{4 + 0 + 4}{3 - 1} = \frac{8}{2} = 4\)

How to Use Variance

  1. Understand data spread

    • Helps measure how consistent or variable the data is.
  2. Compare variability

    • Useful when comparing the performance or risk of different datasets or investments.
  3. In statistics and machine learning, variance is used in:

    • Standard deviation (√variance)

    • ANOVA

    • Regression analysis

    • Risk models in finance (Volatility)

Units of Variance

  • The unit of variance is the square of the original unit (e.g., if values are in meters, variance is in meters²).

  • That’s why standard deviation (the square root of variance) is often preferred for interpretation.

Standard Deviation

Definition:

The standard deviation is a measure of how spread out the values in a dataset are from the mean. It is the square root of the variance.

  • A low standard deviation means the data points are close to the mean.

  • A high standard deviation means the data points are more spread out.

Formula

For a sample: \[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \] For a population: \[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \]

Where:

  • \(x_i\) = each data value

  • \(\bar{x}\) = sample mean

  • \(\mu\) = population mean

  • \(s\), \(\sigma\) = standard deviation

  • \(n\), \(N\) = number of data points

Example

Data: 4, 6, 8

Mean = 6

Sample variance = \(\frac{(4-6)^2 + (6-6)^2 + (8-6)^2}{3 - 1} = 4\)

Standard deviation = \(\sqrt{4} = 2\)
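The variance and standard deviation examples above can be verified with the `statistics` module, whose `variance` and `stdev` use the sample divisor \(n - 1\) (illustrative check):

```python
import statistics

data = [4, 6, 8]
print(statistics.variance(data))   # 4    sample variance (divisor n - 1)
print(statistics.stdev(data))      # 2.0  sample standard deviation
print(statistics.pvariance(data))  # population variance (divisor N) = 8/3
```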

How to Use Standard Deviation

  1. Describe variability

    • Shows how consistently data points are clustered around the mean.
  2. Compare consistency

    • Smaller standard deviation = more consistent results (e.g., test scores, product quality).
  3. In statistical analysis, used in:

    • Confidence intervals

    • Hypothesis testing (e.g., z-tests, t-tests)

    • Control charts in quality control

    • Risk assessment in finance

Units of Standard Deviation

  • Same unit as the original data (e.g., if data is in cm, standard deviation is in cm).

  • This makes it more interpretable than variance.

Pearson Correlation

Definition:

The Pearson correlation coefficient (denoted as \(r\)) measures the strength and direction of the linear relationship between two numerical variables.

  • Values range from –1 to +1.

Formula:

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \cdot \sqrt{\sum (y_i - \bar{y})^2}} \]

Where:

  • \(x_i, y_i\) = values of the two variables

  • \(\bar{x}, \bar{y}\) = means of each variable

Example:

Imagine you collect data on students’ study time (hours) and exam scores:

Study Time (X) | Exam Score (Y)
1 | 50
2 | 60
3 | 70
4 | 80
5 | 90

The Pearson correlation would be +1, showing a perfect positive linear relationship.
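The coefficient for this data can be computed directly from the formula; a self-contained Python sketch (the function name `pearson_r` is invented for illustration):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # numerator
    sx = sqrt(sum((x - mx) ** 2 for x in xs))                # sqrt of sum of squares in x
    sy = sqrt(sum((y - my) ** 2 for y in ys))                # sqrt of sum of squares in y
    return cov / (sx * sy)

# Study time vs. exam score from the table above:
study_time = [1, 2, 3, 4, 5]
exam_score = [50, 60, 70, 80, 90]
print(round(pearson_r(study_time, exam_score), 6))  # 1.0
```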

Interpretation of \(r\):

\(r\) value | Interpretation
\(+1\) | Perfect positive linear correlation
\(0.7\) to \(0.9\) | Strong positive linear correlation
\(0.3\) to \(0.7\) | Moderate positive linear correlation
\(0\) | No linear correlation
\(-0.7\) to \(-0.3\) | Moderate negative linear correlation
\(-0.9\) to \(-0.7\) | Strong negative linear correlation
\(-1\) | Perfect negative linear correlation

How to Use Pearson Correlation

  1. Measure Relationships

    • To assess how strongly two variables are related (e.g., height and weight, income and spending).
  2. Feature Selection

    • In machine learning, to eliminate highly correlated predictors (multicollinearity).
  3. Hypothesis Testing

    • You can test whether the correlation is significantly different from zero using a t-test.

When Not to Use Pearson Correlation

  • When the relationship is nonlinear.

  • When the data is not normally distributed.

  • When variables are ordinal or categorical (use Spearman or Kendall’s correlation instead).

2. Inferential Statistics

Used to analyze data in order to draw conclusions or make predictions about a population, based on a sample.

  • Hypothesis Testing

  • Parameter Estimation

  • Regression Analysis

Applications of Statistics

1. Business and Marketing

  • Analyze market trends and customer behavior

  • Forecast product sales using Time Series Analysis

  • Use A/B Testing to compare the effectiveness of advertisements or marketing campaigns

2. Economics and Finance

  • Analyze economic conditions, such as calculating inflation and unemployment rates

  • Assess risk and return in investment portfolios (Portfolio Analysis)

  • Use econometric models to study factors influencing the economy

3. Science and Engineering

  • Design experiments (Design of Experiments) to develop new products

  • Analyze data from experiments in physics, chemistry, and biology

  • Perform quality control using Statistical Quality Control (SQC)

4. Medicine and Public Health

  • Analyze the effects of drugs or vaccines using Biostatistics

  • Study disease risks through Epidemiological data analysis

  • Use Machine Learning and AI to analyze medical records and assist in diagnosis

5. Data Science and Artificial Intelligence (AI)

  • Analyze Big Data to gain insights for data-driven decision making

  • Use Machine Learning techniques to develop predictive models

  • Perform Text Mining and analyze social media data

6. Education and Research

  • Analyze students’ academic performance and evaluate the effectiveness of curricula

  • Use statistics to design research studies that yield reliable conclusions

  • Analyze experimental data to test scientific hypotheses

Mean-Variance Criteria

This refers to a decision-making framework used in finance, economics, and statistics, especially in portfolio selection and investment analysis. It is based on the ideas introduced by Harry Markowitz in his Modern Portfolio Theory (MPT).

Definition:

The Mean-Variance Criteria evaluates and compares alternatives (such as portfolios, investment strategies, or decisions) based on two key factors:

  • Mean = expected return (reward)

  • Variance = risk (volatility of return)

We prefer a high average and a low variance.

Decision Rule (Mean-Variance Dominance):

Given two choices A and B:

  • A is preferred over B if:

    • \(\mu_A \geq \mu_B\) and
    • \(\sigma^2_A \leq \sigma^2_B\),
    • with at least one strict inequality.

This means that A has higher or equal return and lower or equal risk than B.

If neither option dominates the other, the mean-variance criterion alone cannot rank them.

Example

Option | Mean (Return) | Variance (Risk)
A | 8% | 4
B | 7% | 5
C | 9% | 6

  • A dominates B (higher return and lower risk) → eliminate B

  • A vs. C: A = safer, C = more profitable → Choice depends on risk tolerance
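The dominance rule is easy to encode; the sketch below (with a hypothetical helper `dominates` over (mean, variance) pairs) reproduces the comparison in the example:

```python
def dominates(a, b):
    """Mean-variance dominance: a = (mean, variance) dominates b if
    mean_a >= mean_b and var_a <= var_b, with at least one strict."""
    (ma, va), (mb, vb) = a, b
    return ma >= mb and va <= vb and (ma > mb or va < vb)

# Options from the example: (mean return, variance)
A, B, C = (0.08, 4), (0.07, 5), (0.09, 6)
print(dominates(A, B))  # True:  eliminate B
print(dominates(A, C))  # False
print(dominates(C, A))  # False: A vs. C is undecided by this criterion
```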

Mean Variance Criteria Example

Normalized and Standardized Data

What is Normalized Data?

Definition: Normalization is the process of rescaling data to fit within a specific range, often 0 to 1 (or sometimes -1 to 1).

It changes the scale of the data but does not change its shape.

Common Formula (Min–Max Scaling)

\[x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}\]

Where:

  • \(x\) = original value
  • \(x_{\min}\), \(x_{\max}\) = min and max of the dataset
  • \(x'\) = normalized value (between 0 and 1)

When to Use Normalization

  • When you want all features to have equal importance in models (e.g., k-NN, neural networks).

  • When data is not normally distributed.

  • When different features have different units/scales (e.g., height in cm, weight in kg).

What is Standardized Data?

Definition: Standardization transforms data to have:

  • Mean(\(\bar{x}\)) = 0

  • Standard deviation(SD or \(\sigma\)) = 1

This is also called Z-score normalization.

Formula

\[z = \frac{x - \mu}{\sigma}\]

Where:

  • \(\mu\) = mean of the data

  • \(\sigma\) = standard deviation of the data

  • \(z\) = standardized value

When to Use Standardization

  • When data is normally distributed (or close to it).

  • When algorithms assume data is centered (e.g., PCA, linear regression, logistic regression, SVM).
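Both transformations are one-liners over a list of values; a Python sketch (the function names are invented for illustration):

```python
import statistics

def min_max_normalize(xs):
    """Rescale to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Z-scores: z = (x - mean) / sd, using the population SD."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

data = [2, 4, 6, 8]
print(min_max_normalize(data))  # [0.0, 0.333..., 0.666..., 1.0]
z = standardize(data)
# The standardized data has mean 0 and SD 1 (up to rounding):
print(round(statistics.mean(z), 6), round(statistics.pstdev(z), 6))
```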

Calculating the Normalized and Standardized Data
