International College of Digital Innovation, CMU
August 17, 2025
Axioms of Probability are three fundamental rules that define the properties of probability.
They were proposed by Andrey Kolmogorov in 1933 and form the foundation of modern probability theory.
Kolmogorov’s Axioms:
Let \(S\) be a sample space and let \(A\) be any event that is a subset of \(S\). The three axioms are as follows:

Sample space \(S\): All 52 unique cards \[ S = \{\text{Ace of Hearts}, \text{2 of Hearts},\\ \ldots, \text{King of Spades}\} \] Event \(A\): Drawing a red card
\[A = \{\text{all Hearts and Diamonds}\\ \text{(26 cards)}\}\]
Axiom 1: Probability must be non-negative
\[P(A) \geq 0\]
This means that for every event \(A\), the probability is never negative: it is always zero or positive.
Axiom 2: The probability of the sample space is 1
\[P(S) = 1\]
This means the probability of the event that covers all possible outcomes must equal 1.
Axiom 3: Additivity for mutually exclusive events
If \(A\) and \(B\) are mutually exclusive events, meaning they have no outcomes in common (i.e., \(A \cap B = \emptyset\)), then
\[
P(A \cup B) = P(A) + P(B)
\]
This means if two events cannot occur at the same time, the probability of either one occurring is the sum of their individual probabilities.
From the three axioms, we can infer other important properties. For example:
The probability of an impossible event is zero
\[P(\emptyset) = 0\]
An impossible event has a probability of zero because it cannot occur.
Complement Rule
\[P(A^c) = 1 - P(A)\]
This means that if the probability of event \(A\) is \(P(A)\), then the probability of the complement event (\(A\) does not occur) is \(1 - P(A)\).
General Addition Rule of Probability
For any events \(A\) and \(B\):
\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]
This formula applies even when the events may overlap.
Rolling a single die
Let \(S = \{1, 2, 3, 4, 5, 6\}\). For illustration, let \(A\) be the event "an even number" (so \(P(A) = 3/6 = 0.5\)) and \(B\) the event "a number at most 2" (so \(P(B) = 2/6 \approx 0.333\)); then \(A \cap B = \{2\}\) and \(P(A \cap B) = 1/6 \approx 0.167\).
Using the addition rule: \[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]
\[= 0.5 + 0.333 - 0.167 \approx 0.667\]
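As a quick numeric check of the addition rule, one can enumerate the die outcomes in Python (a sketch only; the events \(A\) = "even number" and \(B\) = "at most 2" are chosen here for illustration):

```python
from fractions import Fraction

S = set(range(1, 7))                 # sample space of one die
A = {s for s in S if s % 2 == 0}     # illustrative event: even number
B = {s for s in S if s <= 2}         # illustrative event: at most 2

def prob(event):
    # classical probability: favorable outcomes / total outcomes
    return Fraction(len(event), len(S))

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)   # 2/3 2/3
```

Both sides agree, as the axioms require.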
A random variable is a variable that represents the outcome of a random experiment.
Its value is determined by chance or probability.
Random variables are commonly used in statistics and probability theory to describe probability distributions of data.
There are two main types of random variables:
Discrete Random Variable
Continuous Random Variable
A discrete random variable takes on a countable number of possible values
Commonly used in events where outcomes can be counted, such as the number rolled on a die or the number of correct answers on a test
Examples
Rolling a die: Let \(X\) be the value shown on the die → \(X = \{1, 2, 3, 4, 5, 6\}\)
Flipping a coin: Let \(Y\) be the number of heads when flipping a coin 3 times → \(Y = \{0, 1, 2, 3\}\)
Number of customers per day: Let \(X\) be the number of customers arriving each day → \(X \in \{0, 1, 2, 3, 4, \ldots\}\)
A continuous random variable can take on any value within a range of real numbers
Used for measurable quantities such as weight, height, or time
Examples
Customer service time: Variable \(T\) may take values between 0 and 10 minutes
Temperature in a city: Variable \(Z\) may range from 25°C to 35°C
Investment return rate: Variable \(r \in (-100\%, \infty)\)
When the likelihood of each value can be described by a specific functional form, we call it a Probability Distribution.
A probability distribution describes how often each value of a random variable is expected to occur or its likelihood.
Properties of a Discrete Probability Distribution
Let \(X\) be a random variable and \(P(X)\) be the probability of each possible value of \(X\). It must satisfy the following conditions:
\(0 \leq P(X) \leq 1\) for all values of \(X\)
\(\sum P(X) = 1\) (The total probability must sum to 1)
Rolling a Die
Let the random variable \(X\) represent the number shown on a single six-sided die (\(X = 1, 2, 3, 4, 5, 6\))
\[P(X) = \begin{cases} \frac{1}{6}, & X = 1, 2, 3, 4, 5, 6 \\ 0, & \text{otherwise} \end{cases}\]
Question 1:
What is the probability that the number rolled is less than 4?
\[P(X < 4) = P(1) + P(2) + P(3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = 0.5\]
Question 2:
What is the probability that the number rolled is an even number?
Even numbers on a die: 2, 4, 6
\[ P(\text{even}) = P(2) + P(4) + P(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = 0.5 \]
Question 3:
What is the probability that the number rolled is greater than or equal to 5?
Numbers: 5, 6
\[P(X \geq 5) = P(5) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \approx 0.333\]
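The three answers above can be reproduced by summing the PMF directly; a minimal Python sketch:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}        # fair six-sided die

p_lt4  = sum(p for x, p in pmf.items() if x < 4)      # P(X < 4)
p_even = sum(p for x, p in pmf.items() if x % 2 == 0) # P(even)
p_ge5  = sum(p for x, p in pmf.items() if x >= 5)     # P(X >= 5)
print(p_lt4, p_even, p_ge5)   # 1/2 1/2 1/3
```

Note that the PMF also satisfies the two required conditions: every \(P(X)\) lies in \([0, 1]\), and the probabilities sum to 1.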
Important Discrete Probability Distributions
Bernoulli Distribution:
Used for events with only two possible outcomes, such as success/failure
Binomial Distribution:
Models multiple independent trials where each trial has two outcomes
Poisson Distribution:
Used to model the number of events occurring within a fixed interval of time or space
Definition:
The Bernoulli distribution is a discrete probability distribution for a random variable which has only two possible outcomes:
Success (usually coded as 1)
Failure (usually coded as 0)
It models the outcome of a single trial (or experiment) that can result in only one of two outcomes.
Mathematical Definition:
Let \(X \sim \text{Bernoulli}(p)\), where
\(X \in \{0, 1\}\)
\(p\) is the probability of success (i.e., \(P(X = 1) = p\))
\(1 - p\) is the probability of failure (i.e., \(P(X = 0) = 1 - p\))
\(0 \leq p \leq 1\)
Probability mass function (PMF):
\[ P(X = x) = p^x (1 - p)^{1 - x}, \quad \text{for } x \in \{0, 1\} \]
Properties:
Mean: \(\mathbb{E}[X] = p\)
Variance: \(\text{Var}(X) = p(1 - p)\)
Examples:
Tossing a coin (Heads = 1, Tails = 0)
Passing a test (Pass = 1, Fail = 0)
Clicking on an ad (Click = 1, No Click = 0)
Defective product in a factory (Defective = 1, Not defective = 0)
Why is it important?
It’s the building block for many other distributions, like the Binomial distribution, which models the number of successes in multiple independent Bernoulli trials.
It’s used in binary classification, machine learning, economics, quality control, and more.
Let \(X \sim \text{Bernoulli}(p)\). We’ll use different values of \(p\) (probability of success) in each example.
Example 1: Tossing a fair coin
Head = 1 (success) and Tail = 0 (failure). So, \(p = 0.5\)
Question: What is \(P(X = 1)\)?
Solution: \(P(X = 1) = p = 0.5\)
Question: What is \(P(X = 0)\)?
Solution: \(P(X = 0) = 1 - p = 0.5\)
Question: What is the expected value?
Solution: \(\mathbb{E}[X] = p = 0.5\)
Example 2: Quality control in a factory
A machine produces parts. Probability that a part is defective is 0.1.
Let \(X = 1\) if defective, \(X = 0\) if not.
Question: What is the probability a part is defective?
Solution: \(P(X = 1) = p = 0.1\)
Question: What is the variance of this distribution?
Solution: \(\text{Var}(X) = p(1 - p) = 0.1 \times 0.9 = 0.09\)
Example 3: Clicking on an online ad
Probability a user clicks on an ad is 0.25.
\(X = 1\) if clicked, \(X = 0\) if not
Question: What is \(P(X = 1)\)?
Solution: \(P(X = 1) = p = 0.25\)
Question: What is \(P(X = 0)\)?
Solution: \(P(X = 0) = 1 - p = 0.75\)
Question: What is the standard deviation?
Solution: \(\text{SD}(X) = \sqrt{p(1 - p)} = \sqrt{0.25 \times 0.75} = \sqrt{0.1875} \approx 0.433\)
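The Bernoulli formulas above are one-liners; here is a small sketch that reproduces the variance from Example 2 and the standard deviation from Example 3:

```python
import math

def bernoulli_stats(p):
    """Mean, variance, and standard deviation of X ~ Bernoulli(p)."""
    var = p * (1 - p)
    return p, var, math.sqrt(var)

mean2, var2, _ = bernoulli_stats(0.1)    # Example 2: defective parts
_, _, sd3 = bernoulli_stats(0.25)        # Example 3: ad clicks
print(round(var2, 2), round(sd3, 3))     # 0.09 0.433
```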
The Binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials, where each trial has only two outcomes: success or failure.
Definition:
If a random variable \(X \sim \text{Binomial}(n, p)\), then:
\(n\): number of trials
\(p\): probability of success in each trial
\(X\): number of successes in \(n\) trials
\(X \in \{0, 1, 2, \ldots, n\}\)
Probability Mass Function (PMF):
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \]
Where:
\(\binom{n}{k} = \dfrac{n!}{k!(n - k)!}\)
\(k\): number of successes
\(p\): probability of success
\((1 - p)\): probability of failure
Mean and Variance:
\[\mathbb{E}[X] = np\]
\[\text{Var}(X) = np(1 - p)\]
Examples in Real Life:
| Situation | Trial | Success |
|---|---|---|
| Tossing 10 coins | Each toss | Head |
| Surveying 20 people | Each person | Likes product |
| Quality control of 100 items | Each item | Not defective |
When to Use Binomial Distribution:
The number of trials \(n\) is fixed
Each trial has two possible outcomes: success or failure
The probability of success \(p\) is the same in each trial
Example 1: Tossing a fair coin 5 times
You toss a fair coin 5 times. What is the probability of getting exactly 3 heads?
Let \(X \sim \text{Binomial}(n = 5, p = 0.5)\)
Step-by-step:
\[\begin{aligned} P(X = 3) &= \binom{5}{3}(0.5)^3(1 - 0.5)^{5 - 3}\\ &= \frac{5!}{3!2!}(0.5)^3(0.5)^2\\ &= 10 \times 0.125 \times 0.25 = 0.3125 \end{aligned} \]
Answer: \(P(X = 3) = 0.3125\)
Example 2: Defective products in a batch
A machine produces items with a 10% defect rate. If you check 8 items, what’s the probability exactly 2 are defective? Let \(X \sim \text{Binomial}(n = 8, p = 0.1)\)
Step-by-step:
\[\begin{aligned} P(X = 2) &= \binom{8}{2}(0.1)^2(0.9)^6\\ &= \frac{8!}{2!6!}(0.01)(0.531441)\\ &= 28 \times 0.01 \times 0.531441 = 0.1488 \end{aligned}\]
Answer: \(P(X = 2) \approx 0.1488\)
Example 3: Online ad clicks
Each person who sees an ad has a 25% chance of clicking it. Out of 12 viewers, what’s the probability that exactly 4 click the ad?
Let \(X \sim \text{Binomial}(n = 12, p = 0.25)\)
Step-by-step:
\[\begin{aligned} P(X = 4) &= \binom{12}{4}(0.25)^4(0.75)^8\\ &= 495 \times 0.00390625 \times 0.100113\\ &\approx 0.1936 \end{aligned}\]
Answer: \(P(X = 4) \approx 0.1936\)
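All three worked examples follow from the same PMF; a short Python sketch using `math.comb`:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(3, 5, 0.5), 4))    # 0.3125 (Example 1)
print(round(binom_pmf(2, 8, 0.1), 4))    # 0.1488 (Example 2)
print(round(binom_pmf(4, 12, 0.25), 4))  # 0.1936 (Example 3)
```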
viewof binom_n = Inputs.range([1, 100], { step: 1, value: 20, label: "Number of Trials (n)" })
// Probability of success (p)
viewof binom_p = Inputs.range([0.01, 1], { step: 0.01, value: 0.5, label: "Probability of Success (p)" })
// a1 and a2 for the x values
viewof binom_a1 = Inputs.number({ label: "a₁ (lower bound)", value: 5 })
viewof binom_a2 = Inputs.number({ label: "a₂ (upper bound, a₂ ≥ a₁)", value: 10 })
// Select probability type
viewof binom_probType = Inputs.select(
["P(x = a₁)", "P(x < a₁)", "P(x ≤ a₁)", "P(x > a₁)", "P(x ≥ a₁)", "P(a₁ ≤ x ≤ a₂)"],
{ label: "Choose Probability Type" }
)
The Poisson distribution is a discrete probability distribution that models the number of events occurring in a fixed interval of time or space, under the following assumptions:
Events occur independently
The average rate of occurrence \(\lambda\) is constant
Two events cannot occur at the exact same instant
Definition:
If a random variable \(X \sim \text{Poisson}(\lambda)\), then it describes the probability of observing exactly \(k\) events in a fixed interval.
\(\lambda\): average number of events per interval (e.g., per hour, per day, per km², etc.)
\(X\): number of observed events
\(X \in \{0, 1, 2, \ldots\}\)
Probability Mass Function (PMF): \[P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}\]
Where:
\(e \approx 2.71828\) (Euler’s number)
\(k\): number of events (0, 1, 2, …)
\(\lambda\): average event rate
Mean and Variance:
\(\mathbb{E}[X] = \lambda\)
\(\text{Var}(X) = \lambda\)
When to Use Poisson Distribution:
Counting rare events over time or space
Events are random and independent
The rate is stable over time
Real-life Examples:
| Situation | Poisson variable |
|---|---|
| Calls arriving at a call center per hour | Number of calls |
| Typos per page in a book | Number of typos |
| Patients arriving at an ER per night | Number of patients |
| Emails received per day | Number of emails |
Example 1: Call center
A call center receives an average of \(\lambda = 4\) calls per hour. Find:
(a) \(P(X = 2)\), (b) \(P(X \leq 2)\), (c) \(P(X \geq 3)\)
\[P(X = 2) = \frac{e^{-4} \cdot 4^2}{2!} = \frac{e^{-4} \cdot 16}{2} = 8 \cdot e^{-4} \approx 8 \cdot 0.0183 = 0.1465\]
\(P(X \leq 2) = P(0) + P(1) + P(2)\) \[\begin{aligned} P(0) &= \frac{e^{-4} \cdot 4^0}{0!} = e^{-4} = 0.0183 \\ P(1) &= \frac{e^{-4} \cdot 4^1}{1!} = 4 \cdot e^{-4} = 0.0733 \\ P(2) &= 0.1465 \ \text{(from part a)} \\ P(X \leq 2) &= 0.0183 + 0.0733 + 0.1465 = 0.2381 \end{aligned}\]
\(P(X \geq 3) = 1 - P(X \leq 2)\)
\[P(X \geq 3) = 1 - 0.2381 = 0.7619\]
Example 2: Hospital ER
An average of \(\lambda = 3\) patients arrive at the emergency room each night. Find:
(a) \(P(X = 5)\), (b) \(P(X \leq 5)\), (c) \(P(X \geq 2)\)
\[P(X = 5) = \frac{e^{-3} \cdot 3^5}{5!} = \frac{e^{-3} \cdot 243}{120} \approx 0.0498 \cdot 2.025 = 0.1008\]
\(P(X \leq 5) = \sum_{k=0}^{5} P(k)\) \[\begin{aligned} P(0) &= e^{-3} = 0.0498 \\ P(1) &= 3 \cdot e^{-3} = 0.1494 \\ P(2) &= \frac{9}{2} e^{-3} = 0.2240 \\ P(3) &= \frac{27}{6} e^{-3} = 0.2240 \\ P(4) &= \frac{81}{24} e^{-3} = 0.1680 \\ P(5) &= 0.1008 \\ P(X \leq 5) &= 0.0498 + 0.1494 + 0.2240\\ &~~~+ 0.2240 + 0.1680 + 0.1008 = 0.9160 \end{aligned}\]
\(P(X \geq 2) = 1 - P(0) - P(1)\) \[P(X \geq 2) = 1 - (0.0498 + 0.1494) = 1 - 0.1992 = 0.8008\]
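Both examples use only the PMF and the complement rule; a minimal Python sketch for the call-center case (\(\lambda = 4\)):

```python
from math import exp, factorial

def pois_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam)
    return exp(-lam) * lam**k / factorial(k)

p2 = pois_pmf(2, 4)
p_le2 = sum(pois_pmf(k, 4) for k in range(3))  # P(X <= 2) = P(0)+P(1)+P(2)
print(round(p2, 4), round(p_le2, 4), round(1 - p_le2, 4))
# 0.1465 0.2381 0.7619
```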
viewof pois_lambda = Inputs.range([1, 20], { step: 1, value: 5, label: "Rate (λ)" })
// Input for a1 and a2
viewof pois_a1 = Inputs.number({ label: "a₁ (lower bound)", value: 3 })
viewof pois_a2 = Inputs.number({ label: "a₂ (upper bound, a₂ > a₁)", value: 7 })
// Selection of probability type
viewof pois_probType = Inputs.select(
["P(x = a₁)", "P(x < a₁)", "P(x ≤ a₁)", "P(x > a₁)", "P(x ≥ a₁)", "P(a₁ ≤ x ≤ a₂)"],
{ label: "Choose Probability Type" }
)
In Jamovi, you can install and use the external module called distrACTION to calculate probabilities for both Binomial and Poisson distributions.
Properties of a Continuous Probability Distribution
Let \(f(x)\) be a Probability Density Function (PDF). It must satisfy the following conditions:
\(f(x) \geq 0\) for all \(x\)
\(\int_{-\infty}^{\infty} f(x) \, dx = 1\)
The probability that \(X\) falls within the interval \(a \leq X \leq b\) is given by:
\[\begin{aligned} P(a < X < b) &= P(a \leq X < b) \\ &= P(a < X \leq b) \\ &= P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx \end{aligned}\]
Normal Distribution
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad \mu \in \mathbb{R},\ \sigma^2 > 0,\ x \in \mathbb{R} \]
\(\mu\) is the mean
\(\sigma^2\) is the variance
The shape is a bell curve (symmetrical)
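The density formula can be evaluated directly. As a sanity check (a sketch only, using a crude Riemann sum), the standard normal density peaks at \(1/\sqrt{2\pi} \approx 0.3989\) and integrates to approximately 1:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Gaussian density f(x) for the given mean and standard deviation
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(round(normal_pdf(0), 4))   # 0.3989 (peak of the standard normal)

# crude Riemann sum over [-6, 6]: should be very close to 1
total = sum(normal_pdf(-6 + i * 0.01) * 0.01 for i in range(1200))
print(round(total, 4))           # 1.0
```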
// Inputs and Plot are provided by the OJS runtime, so no import is required
// Input for mean (mu)
viewof mu = Inputs.range([-10, 10], { step: 0.1, value: 0, label: "Mean (μ)" })
// Input for standard deviation (sigma)
viewof sigma = Inputs.range([0.1, 10], { step: 0.1, value: 1, label: "Standard Deviation (σ)" })
// Input for a1 and a2
viewof a1 = Inputs.number({ label: "a₁", value: 0 })
viewof a2 = Inputs.number({ label: "a₂: (a₂ > a₁)", value: 1 })
// Selection of probability type
viewof probType = Inputs.select(["P(x < a₁)", "P(x > a₁)", "P(a₁ < x < a₂)"], { label: "Choose Probability Type" })
Normal Distribution:
Commonly used in statistics
Uniform Distribution:
All values within a given interval have equal probability
Exponential Distribution:
Often used for modeling waiting times
Statistics is the science of collecting, analyzing, interpreting, and presenting data to support decision-making or to better understand phenomena.
There are two main branches of statistics
1. Descriptive Statistics
Used to summarize and describe data, such as:
Mean
Median
Standard Deviation
Variance
Pearson Correlation
Frequency Table
Various types of charts and graphs (Previous chapter)
Definition:
The mean, also known as the average, is a measure of central tendency that represents the typical value in a set of numbers.
\[\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}\]
\[\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\]
Where:
\(\bar{x}\) = the sample mean
\(x_i\) = each data value
\(n\) = number of values
How to Use the Mean
Summarizing Data
It gives a single value that represents the entire dataset.
Example: The average height of students in a class.
Comparing Groups
Mean is the basis for calculating:
Variance and standard deviation
Z-scores
Regression analysis
Hypothesis testing
Example
Let’s say you have the following exam scores: 70, 80, 90, 85, 75
\[ \text{Mean} = \frac{70 + 80 + 90 + 85 + 75}{5} = \frac{400}{5} = 80 \]
So, the average score is 80.
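The same calculation with Python's `statistics` module:

```python
from statistics import mean

scores = [70, 80, 90, 85, 75]
print(mean(scores))   # 80
```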
When Not to Use the Mean
The mean is sensitive to extreme values, so when the data contain outliers or are strongly skewed, the median is usually a more representative summary.
Definition:
The median is the middle value of a dataset when the values are arranged in order. It divides the dataset into two equal halves.
If the number of data points is odd, the median is the middle number.
If the number is even, the median is the average of the two middle numbers.
How to Calculate the Median
Sort the data from smallest to largest.
Find the middle value:
If \(n\) is odd:
\[\text{Median} = x_{\left(\frac{n+1}{2}\right)}\]
If \(n\) is even:
\[\text{Median} = \frac{x_{(n/2)} + x_{(n/2 + 1)}}{2}\]
How to Use the Median
Measure Central Tendency
When Data Has Outliers
Descriptive Statistics
Example 1: Odd number of values
Data: 5, 7, 9
Sorted: 5, 7, 9
Median = 7 (the middle value)
Example 2: Even number of values
Data: 3, 5, 7, 9
Sorted: 3, 5, 7, 9
Median = (5 + 7) / 2 = 6
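Both examples, checked with the `statistics` module:

```python
from statistics import median

print(median([5, 7, 9]))      # 7   (odd count: the middle value)
print(median([3, 5, 7, 9]))   # 6.0 (even count: average of 5 and 7)
```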
Definition:
Variance measures how much the values in a dataset differ from the mean. It tells us the spread or dispersion of the data.
A small variance means the data points are close to the mean.
A large variance means the data points are spread out over a wider range.
Formula
For a sample: \[s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2\] For a population: \[\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2\]
Where:
\(x_i\) = each data point
\(\bar{x}\) = sample mean
\(\mu\) = population mean
\(n\), \(N\) = number of values in the sample or population
\(s^2\), \(\sigma^2\) = variance
Example
Data: 4, 6, 8
Mean = (4 + 6 + 8) / 3 = 6
Deviations: -2, 0, +2
Squared deviations: 4, 0, 4
Variance (sample) = \(\frac{4 + 0 + 4}{3 - 1} = \frac{8}{2} = 4\)
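The worked example, checked with the `statistics` module (note the different denominators for the sample and population versions):

```python
from statistics import variance, pvariance

data = [4, 6, 8]
print(variance(data))    # 4 (sample: divide by n - 1)
print(pvariance(data))   # population: divide by n, giving 8/3 ≈ 2.67
```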
How to Use Variance
Understand data spread
Compare variability
In statistics and machine learning, variance is used in:
Standard deviation (√variance)
ANOVA
Regression analysis
Risk models in finance (Volatility)
Units of Variance
The unit of variance is the square of the original unit (e.g., if values are in meters, variance is in meters²).
That’s why standard deviation (the square root of variance) is often preferred for interpretation.
Definition:
The standard deviation is a measure of how spread out the values in a dataset are from the mean. It is the square root of the variance.
A low standard deviation means the data points are close to the mean.
A high standard deviation means the data points are more spread out.
Formula
For a sample: \[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \] For a population: \[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \]
Where:
\(x_i\) = each data value
\(\bar{x}\) = sample mean
\(\mu\) = population mean
\(s\), \(\sigma\) = standard deviation
\(n\), \(N\) = number of data points
Example
Data: 4, 6, 8
Mean = 6
Sample variance = \(\frac{(4-6)^2 + (6-6)^2 + (8-6)^2}{3 - 1} = 4\)
Standard deviation = \(\sqrt{4} = 2\)
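The same result from the `statistics` module:

```python
from statistics import stdev, pstdev

data = [4, 6, 8]
print(stdev(data))             # 2.0 (square root of the sample variance 4)
print(round(pstdev(data), 3))  # 1.633 (population version)
```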
How to Use Standard Deviation
Describe variability
Compare consistency
In statistical analysis, it is used in:
Confidence intervals
Hypothesis testing (e.g., z-tests, t-tests)
Control charts in quality control
Risk assessment in finance
Units of Standard Deviation
Same unit as the original data (e.g., if data is in cm, standard deviation is in cm).
This makes it more interpretable than variance.
Definition:
The Pearson correlation coefficient (denoted as \(r\)) measures the strength and direction of the linear relationship between two numerical variables.
Formula:
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \cdot \sqrt{\sum (y_i - \bar{y})^2}} \]
Where:
\(x_i, y_i\) = values of the two variables
\(\bar{x}, \bar{y}\) = means of each variable
Example:
Imagine you collect data on students’ study time (hours) and exam scores:
| Study Time (X) | Exam Score (Y) |
|---|---|
| 1 | 50 |
| 2 | 60 |
| 3 | 70 |
| 4 | 80 |
| 5 | 90 |
The Pearson correlation would be +1, showing a perfect positive linear relationship.
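The study-time example can be verified by coding the formula directly (a minimal sketch; the data are the table values above):

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation: covariance over the product of spreads
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]        # study time (hours)
y = [50, 60, 70, 80, 90]   # exam score
print(pearson_r(x, y))     # 1.0 (perfect positive linear relationship)
```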
Interpretation of \(r\):
| \(r\) value | Interpretation |
|---|---|
| \(+1\) | Perfect positive linear correlation |
| \(0.7\) to \(0.9\) | Strong positive linear correlation |
| \(0.3\) to \(0.7\) | Moderate positive linear correlation |
| \(0\) | No linear correlation |
| \(-0.7\) to \(-0.3\) | Moderate negative linear correlation |
| \(-0.9\) to \(-0.7\) | Strong negative linear correlation |
| \(-1\) | Perfect negative linear correlation |
How to Use Pearson Correlation
Measure Relationships
Feature Selection
Hypothesis Testing
When Not to Use Pearson Correlation
When the relationship is nonlinear.
When the data is not normally distributed.
When variables are ordinal or categorical (use Spearman or Kendall’s correlation instead).
2. Inferential Statistics
Used to analyze data in order to draw conclusions or make predictions about a population, based on a sample.
Hypothesis Testing
Parameter Estimation
Regression Analysis
1. Business and Marketing
Analyze market trends and customer behavior
Forecast product sales using Time Series Analysis
Use A/B Testing to compare the effectiveness of advertisements or marketing campaigns
2. Economics and Finance
Analyze economic conditions, such as calculating inflation and unemployment rates
Assess risk and return in investment portfolios (Portfolio Analysis)
Use econometric models to study factors influencing the economy
3. Science and Engineering
Design experiments (Design of Experiments) to develop new products
Analyze data from experiments in physics, chemistry, and biology
Perform quality control using Statistical Quality Control (SQC)
4. Medicine and Public Health
Analyze the effects of drugs or vaccines using Biostatistics
Study disease risks through Epidemiological data analysis
Use Machine Learning and AI to analyze medical records and assist in diagnosis
5. Data Science and Artificial Intelligence (AI)
Analyze Big Data to gain insights for data-driven decision making
Use Machine Learning techniques to develop predictive models
Perform Text Mining and analyze social media data
6. Education and Research
Analyze students’ academic performance and evaluate the effectiveness of curricula
Use statistics to design research studies that yield reliable conclusions
Analyze experimental data to test scientific hypotheses
This refers to a decision-making framework used in finance, economics, and statistics, especially in portfolio selection and investment analysis. It is based on the ideas introduced by Harry Markowitz in his Modern Portfolio Theory (MPT).
Definition:
The Mean-Variance Criteria evaluates and compares alternatives (such as portfolios, investment strategies, or decisions) based on two key factors:
Mean = expected return (reward)
Variance = risk (volatility of return)
We prefer a high average and a low variance.
Given two choices A and B:
A is preferred over B if:
\[\mu_A \geq \mu_B \quad \text{and} \quad \sigma_A^2 \leq \sigma_B^2\]
(with at least one inequality strict). This means that A has higher or equal return and lower or equal risk than B.
If neither option dominates the other, the criterion alone cannot decide between them.
Example
| Option | Mean (Return) | Variance (Risk) |
|---|---|---|
| A | 8% | 4 |
| B | 7% | 5 |
| C | 9% | 6 |
A dominates B (higher return and lower risk) → eliminate B
A vs. C: A = safer, C = more profitable → Choice depends on risk tolerance
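The dominance comparison in the table can be sketched as a small function (option data as given above; the strictness condition follows the usual definition of mean–variance dominance):

```python
options = {"A": (8, 4), "B": (7, 5), "C": (9, 6)}  # (mean return %, variance)

def dominates(a, b):
    """True if a has return >= b's and risk <= b's, with at least one strict."""
    (ma, va), (mb, vb) = a, b
    return ma >= mb and va <= vb and (ma > mb or va < vb)

print(dominates(options["A"], options["B"]))  # True: eliminate B
print(dominates(options["A"], options["C"]))  # False
print(dominates(options["C"], options["A"]))  # False: depends on risk tolerance
```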
viewof N1 = Inputs.range([10, 30], { step: 1, value: 10, label: "N" })
viewof mu_a = Inputs.number({label: "Mean (μₐ)%", value: 5, step: 0.1})
viewof mu_b = Inputs.number({label: "Mean (μᵦ)%", value: 4, step: 0.1})
viewof var_a = Inputs.number({label: "Variance (σ²ₐ)", value: 3, step: 0.1})
viewof var_b = Inputs.number({label: "Variance (σ²ᵦ)", value: 5, step: 0.1})
What is Normalized Data?
Definition: Normalization is the process of rescaling data to fit within a specific range, often 0 to 1 (or sometimes -1 to 1).
It changes the scale of the data but does not change its shape.
Common Formula (Min–Max Scaling)
\[x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}\]
Where:
\(x\) = the original value
\(x_{\min}\), \(x_{\max}\) = the minimum and maximum values in the data
\(x'\) = the rescaled value
When to Use Normalization
When you want all features to have equal importance in models (e.g., k-NN, neural networks).
When data is not normally distributed.
When different features have different units/scales (e.g., height in cm, weight in kg).
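A minimal sketch of min–max scaling (the height data here are made up for illustration):

```python
def min_max_scale(xs):
    # rescale each value into [0, 1] using the data's min and max
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

heights = [150, 160, 170, 180]   # cm, illustrative data
scaled = min_max_scale(heights)
print(scaled)   # [0.0, 0.333..., 0.666..., 1.0]
```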
What is Standardized Data?
Definition: Standardization transforms data to have:
Mean (\(\bar{x}\)) = 0
Standard deviation (SD or \(\sigma\)) = 1
This is also called Z-score normalization.
Formula
\[z = \frac{x - \mu}{\sigma}\]
Where:
\(\mu\) = mean of the data
\(\sigma\) = standard deviation of the data
\(z\) = standardized value
When to Use Standardization
When data is normally distributed (or close to it).
When algorithms assume data is centered (e.g., PCA, linear regression, logistic regression, SVM).
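A minimal z-score sketch (the data are made up for illustration; the population SD is used, matching the formula above):

```python
from statistics import mean, pstdev

def standardize(xs):
    # z-score: subtract the mean, divide by the (population) SD
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

data = [2, 4, 6, 8]              # illustrative data
z = standardize(data)
print(round(mean(z), 10), round(pstdev(z), 10))   # 0.0 1.0
```

After standardization the mean is 0 and the standard deviation is 1, as the definition requires.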
viewof N22 = Inputs.range([10, 30], { step: 1, value: 20, label: "N (10–30)" })
// Choose the random-data type: integer or normal
viewof dist = Inputs.radio(["integer", "normal"], {label: "Distribution", value: "integer"})
// Integer parameters
viewof int_min = Inputs.number({label: "Integer min", value: 0, step: 1})
viewof int_max = Inputs.number({label: "Integer max", value: 100, step: 1})
// Normal parameters
viewof mu00 = Inputs.number({label: "Normal mean (μ)", value: 50, step: 0.1})
viewof sigma00 = Inputs.number({label: "Normal sd (σ)", value: 10, step: 0.1})
// Simulate button
viewof simulate = Inputs.button("Simulate")
// Helper function in OJS
round2 = x => Math.round(x * 100) / 100
// Box–Muller transform for N(0,1)
randn = () => {
let u = 0, v = 0
while (u === 0) u = Math.random()
while (v === 0) v = Math.random()
return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v)
}
// original: produces a JS array (passed on to R via #| input:)
original = {
simulate; // re-run when the button is pressed
if (dist === "integer") {
const lo = Math.min(int_min, int_max)
const hi = Math.max(int_min, int_max)
return Array.from({length: N22}, () =>
Math.floor(Math.random() * (hi - lo + 1)) + lo
)
} else {
return Array.from({length: N22}, () =>
round2(mu00 + sigma00 * randn())
)
}
}
Devore, J. L. (2019). Probability and statistics for engineering and the sciences (9th ed.). Cengage Learning.
Ross, S. M. (2020). Introduction to probability and statistics for engineers and scientists (6th ed.). Academic Press.
Montgomery, D. C., & Runger, G. C. (2021). Applied statistics and probability for engineers (7th ed.). Wiley.
Rice, J. A. (2006). Mathematical statistics and data analysis (3rd ed.). Cengage Learning.
Wasserman, L. (2004). All of statistics: A concise course in statistical inference. Springer.