Univariate vs Multivariate – Normality Check
Using the Q-Q Plot (Graphical Method)
For Univariate Data
The steps are:
1. Arrange the values in ascending order (Data > Sort).
2. Compute the rank of each value in the dataset (using the RANK.EQ function). RANK.EQ gives tied values the same rank, so manually adjust the ranks of tied values so that each observation has a distinct rank.
3. Calculate the assigned (fitted) cumulative probability for each rank:
Assigned or Fitted Cumulative Probability = (i - 0.5) / n
where n is the size (count) of the dataset and i is the rank of the data value.
4. Calculate the predicted normal cumulative probability for each value from the normal cumulative distribution function (using the NORM.DIST function with its cumulative argument set to TRUE), based on the mean and standard deviation of the empirical (sample) or theoretical normal distribution.
5. Plot a scatter plot with the Predicted Normal Cumulative Probability on the x-axis and the Assigned Cumulative Probability on the y-axis.
6. If the resulting graph is approximately a straight line, the dataset can be assumed to be normally distributed; otherwise, the normality assumption is doubtful.
7. Alternatively, plot the theoretical normal quantiles against the sorted data values (the quantile-quantile form of the plot). This graph is also expected to be approximately a straight line if the dataset can be assumed to be normally distributed.
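The univariate steps can also be sketched outside Excel. The following is a minimal Python illustration, not the workbook's method: the function name `normal_probability_points` and the sample values are made up for this example, the standard-library `statistics.NormalDist` stands in for NORM.DIST, and the 1-based position in the sorted list is used as the rank (one simple way to give tied values distinct ranks).

```python
from statistics import NormalDist, mean, stdev


def normal_probability_points(values):
    """Return (sorted data, assigned, predicted, quantiles) for a normality plot.

    Hypothetical helper written for this illustration only.
    """
    data = sorted(values)                    # step 1: ascending order
    n = len(data)
    mu, sigma = mean(data), stdev(data)      # sample mean and standard deviation
    fitted = NormalDist(mu, sigma)

    # Steps 2-3: the 1-based position in the sorted list serves as the rank i,
    # so tied values automatically receive distinct consecutive ranks.
    assigned = [(i - 0.5) / n for i in range(1, n + 1)]

    # Step 4: predicted cumulative probability from the fitted normal CDF,
    # comparable to NORM.DIST(x, mean, sd, TRUE).
    predicted = [fitted.cdf(x) for x in data]

    # Step 7: standard normal quantiles of the assigned probabilities,
    # to be plotted against the sorted data.
    quantiles = [NormalDist().inv_cdf(p) for p in assigned]
    return data, assigned, predicted, quantiles


# Made-up illustrative values (an assumption, not data from the workbook).
sample = [4.1, 5.6, 5.0, 6.2, 4.8, 5.3, 5.9, 4.5]
data, assigned, predicted, quantiles = normal_probability_points(sample)
for x, a, p, q in zip(data, assigned, predicted, quantiles):
    print(f"{x:5.2f}  assigned={a:.4f}  predicted={p:.4f}  quantile={q:+.3f}")
```

Plotting `predicted` against `assigned` gives the graph from step 5, and plotting `quantiles` against `data` gives the graph from step 7; any charting tool can be used for the scatter plot itself.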
Reference
Test for Normality (Normal Dist.) – Excel and Google Sheets - Automate Excel
Normality Test Using Microsoft Excel : Intact Prolink Blog (inprolink.com), based on the chi-square test
Empirical Distribution Function / Empirical CDF - Statistics How To
Excel NORM.DIST function | Exceljet
For Multivariate Data
The steps are:
1. Calculate the squared Mahalanobis distance (T-square) for each observation in the dataset.
2. Sort the squared Mahalanobis distances (T-square) in ascending order.
3. Compute the rank of each distance (using the RANK.EQ function).
4. Calculate the assigned cumulative probability for each rank, (i - 0.5) / n, as in the univariate case.
5. Calculate the chi-square predicted cumulative probability for each distance. Under multivariate normality, T-square is expected to follow approximately a chi-square distribution with p degrees of freedom, where p is the number of variables.
6. Plot a scatter plot with the Predicted Cumulative Probability on the x-axis and the Assigned Cumulative Probability on the y-axis.
7. If the resulting graph is approximately a straight line, the dataset can be assumed to follow a multivariate normal distribution; otherwise, the multivariate normality assumption is doubtful.
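As a rough illustration of the multivariate steps, the sketch below handles the two-variable (bivariate) case only, using just the Python standard library. The restriction to two variables is an assumption made to keep the example short: the 2x2 covariance matrix can be inverted in closed form, and the chi-square cumulative probability with 2 degrees of freedom is 1 - exp(-x / 2). The function name `bivariate_chi_square_points` and the data are made up for this example and are not taken from the workbook.

```python
from math import exp
from statistics import mean


def bivariate_chi_square_points(xs, ys):
    """Return (sorted squared distances, assigned, predicted) for two variables.

    Hypothetical helper written for this illustration only.
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)

    # Entries of the sample covariance matrix S (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2

    # Step 1: squared Mahalanobis distance (v - mean)' S^-1 (v - mean),
    # written out with the closed-form inverse of the 2x2 matrix S.
    d2 = [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]

    d2.sort()                                            # step 2: ascending order
    assigned = [(i - 0.5) / n for i in range(1, n + 1)]  # steps 3-4: rank-based
    # Step 5: chi-square CDF with 2 degrees of freedom (bivariate case only).
    predicted = [1 - exp(-d / 2) for d in d2]
    return d2, assigned, predicted


# Made-up illustrative values (an assumption, not data from the workbook).
xs = [2.1, 3.4, 2.9, 4.0, 3.1, 3.7, 2.5, 3.3]
ys = [5.0, 6.1, 5.9, 7.2, 6.0, 6.4, 5.2, 6.6]
d2, assigned, predicted = bivariate_chi_square_points(xs, ys)
for d, a, p in zip(d2, assigned, predicted):
    print(f"T-square={d:6.3f}  assigned={a:.4f}  predicted={p:.4f}")
```

For more than two variables the same steps apply, but the covariance matrix needs a general matrix inverse and the chi-square cumulative probability needs p degrees of freedom, which is where a spreadsheet function or a statistics library is more convenient than hand-written code.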
Reference
BIVARIATE QQ-PLOTS AND SPIDER WEB PLOTS on JSTOR
4.4 - Multivariate Normality and Outliers | STAT 505 (psu.edu)
GitHub Excel Reference
StuEx/Univariate Vs Multivariate _ Normality Check.xlsx at main · SPICYL/StuEx · GitHub