Univariate vs Multivariate – Normality Check


Using the Q-Q Plot (Graphical Method)

For Univariate Data

The steps are:

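1. Arrange the values in ascending order (Data > Sort).

2. Compute the rank of each value using RANK.EQ with the order argument set to 1, so that the smallest value gets rank 1 (the default order ranks in descending order).

RANK.EQ gives tied values the same rank, so manually adjust the ranks of tied values until every value has a distinct rank i from 1 to n.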

3. Calculate the assigned cumulative probability for each rank using the formula below.

Assigned (Fitted) Cumulative Probability = (i − 0.5) / n

where n is the number of values in the dataset and i is the rank of the data value.

4. Calculate the predicted normal cumulative probability of each value from the normal cumulative distribution function (not the density function), using the mean and standard deviation of the dataset: NORM.DIST(x, mean, standard_dev, TRUE).

5. Finally, create a scatter plot with the Predicted Normal Cumulative Probability on the x-axis and the Assigned Cumulative Probability on the y-axis.

6. If the points lie approximately on a straight line (the 45-degree line through the origin), the dataset can be assumed to be normally distributed; marked departures from the line suggest that it is not.

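7. Alternatively, plot the theoretical standard normal quantiles, NORM.S.INV((i − 0.5)/n), against the sorted data. Strictly speaking, steps 3 to 6 produce a probability-probability (P-P) plot, while this quantile version is the Q-Q plot proper; for normally distributed data its points should also fall close to a straight line.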
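For readers who want to check the spreadsheet columns outside Excel, here is a minimal Python sketch of the univariate steps (standard library only; `statistics.NormalDist` needs Python 3.8+). It assumes the predicted probability comes from a normal distribution with the sample mean and sample standard deviation, mirroring NORM.DIST(x, mean, standard_dev, TRUE), and it breaks ties by sorted position, which gives each value the distinct rank i that (i − 0.5)/n needs. The function names and the sample values are hypothetical, not taken from the original workbook.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(data):
    """Return (predicted, assigned) cumulative-probability pairs for a normal P-P plot."""
    values = sorted(data)                       # step 1: ascending order
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))  # sample mean and sample SD (assumption)
    points = []
    # step 2: enumerate gives distinct ranks 1..n, so ties are broken by sorted position
    for i, x in enumerate(values, start=1):
        assigned = (i - 0.5) / n                # step 3: assigned cumulative probability
        predicted = fitted.cdf(x)               # step 4: normal CDF, like NORM.DIST(..., TRUE)
        points.append((predicted, assigned))
    return points


def qq_plot_points(data):
    """Return (standard normal quantile, sorted value) pairs for a Q-Q plot (step 7)."""
    values = sorted(data)
    n = len(values)
    std_normal = NormalDist()                   # mean 0, SD 1, like NORM.S.INV
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(values, start=1)]


if __name__ == "__main__":
    sample = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0]     # illustrative values only
    for predicted, assigned in pp_plot_points(sample):
        print(f"predicted={predicted:.3f}  assigned={assigned:.3f}")
```

Plotting the returned pairs with any charting tool reproduces the scatter plot of steps 5 and 7; points hugging the line y = x (P-P) or a straight line (Q-Q) support the normality assumption.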

References

Test for Normality (Normal Dist.) – Excel and Google Sheets, Automate Excel

Normality Test Using Microsoft Excel, Intact Prolink Blog (inprolink.com) (based on the chi-square test)

Empirical Distribution Function / Empirical CDF, Statistics How To

Excel NORM.DIST function, Exceljet


For Multivariate Data

The steps are:

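1. Calculate the squared Mahalanobis distance (T-square) of each observation from the sample mean vector: T-square = (x − x̄)ᵀ S⁻¹ (x − x̄), where x is the observation vector, x̄ is the sample mean vector and S is the sample covariance matrix.

2. Sort the squared Mahalanobis distances (T-square) in ascending order.

3. Compute the rank i of each distance using RANK.EQ with the order argument set to 1 (ascending).

4. Calculate the assigned cumulative probability for each rank, (i − 0.5) / n, as in the univariate case.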

5. Calculate the predicted cumulative probability from the chi-square distribution with p degrees of freedom, where p is the number of variables, e.g. CHISQ.DIST(T-square, p, TRUE). For multivariate normal data, the squared Mahalanobis distances approximately follow this chi-square distribution.

6. Finally, create a scatter plot with the Predicted Chi-square Cumulative Probability on the x-axis and the Assigned Cumulative Probability on the y-axis.

7. If the points lie approximately on a straight line, the dataset can be assumed to be multivariate normally distributed; marked departures from the line suggest that multivariate normality does not hold.
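The multivariate steps can be sketched the same way. This is a minimal, standard-library-only Python sketch under stated assumptions: rows are observations with p numeric columns, the sample covariance uses the n − 1 divisor, the covariance matrix is solved by a small hand-written Gauss-Jordan routine instead of being inverted in a spreadsheet, and the chi-square CDF is computed from the power series of the regularized lower incomplete gamma function as a stand-in for CHISQ.DIST(x, p, TRUE). All function names are hypothetical.

```python
import math


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, center):
    """Sample covariance matrix with the n - 1 divisor."""
    n, p = len(rows), len(center)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b]
    return [[v / (n - 1) for v in row] for row in cov]


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting."""
    p = len(vector)
    aug = [list(matrix[i]) + [vector[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def squared_mahalanobis(rows):
    """Step 1: T-square = (x - mean)^T S^-1 (x - mean) for every observation."""
    center = mean_vector(rows)
    cov = covariance_matrix(rows, center)
    out = []
    for r in rows:
        d = [x - m for x, m in zip(r, center)]
        z = solve(cov, d)                       # z = S^-1 d without forming the inverse
        out.append(sum(a * b for a, b in zip(d, z)))
    return out


def chi2_cdf(x, df):
    """Chi-square CDF = P(df/2, x/2), regularized lower incomplete gamma via its series."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while abs(term) > 1e-15 * total and n < 10_000:
        term *= y / (a + n)
        total += term
        n += 1
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def chi2_pp_points(rows):
    """Steps 2 to 5: (predicted chi-square probability, assigned probability) pairs."""
    d2 = sorted(squared_mahalanobis(rows))      # step 2: ascending order
    n, p = len(d2), len(rows[0])
    return [(chi2_cdf(v, p), (i - 0.5) / n) for i, v in enumerate(d2, start=1)]
```

A useful sanity check on the distance column: with the n − 1 covariance divisor, the squared Mahalanobis distances always sum to (n − 1) × p. In practice, `scipy.stats.chi2.cdf` could replace the hand-rolled `chi2_cdf`; the version here only shows the arithmetic behind the spreadsheet columns.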

References

Bivariate QQ-Plots and Spider Web Plots, JSTOR

4.4 - Multivariate Normality and Outliers, STAT 505 (psu.edu)


GitHub Excel Reference

StuEx/Univariate Vs Multivariate _ Normality Check.xlsx at main · SPICYL/StuEx · GitHub
