PCA on Bivariate Data, Comparison of using Covariance and Correlation

 

On PCA as Transformation

  1. Transformation of the original correlated variables, x's, into uncorrelated transformed variables, PCs.
  2. The rotation required to explain the maximum variance is determined by the eigenvectors.

Selecting the type of matrix used to calculate the principal components/transformed variables:

  • Covariance: Use when your variables are on the same scale, or when they are on different scales but you want to give more weight to variables with higher variances.
  • Correlation: Use when your variables are on different scales and you want to weight all the variables equally.
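The effect of this choice can be sketched with a small NumPy example on synthetic data (not the data set used later in this post): when one variable has a much larger variance, the first eigenvector of the covariance matrix is dominated by it, while the first eigenvector of the correlation matrix weights both variables equally.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated variables on very different scales (synthetic,
# illustrative data only).
x1 = rng.normal(0, 1, 500)
x2 = 100 * x1 + rng.normal(0, 50, 500)   # variance in the thousands
X = np.column_stack([x1, x2])

# Covariance-based PCA: the leading eigenvector is pulled toward x2.
S = np.cov(X, rowvar=False)
eval_S, evec_S = np.linalg.eigh(S)        # eigenvalues ascending
print(evec_S[:, -1])                      # dominated by the x2 component

# Correlation-based PCA: both variables are weighted equally.
R = np.corrcoef(X, rowvar=False)
eval_R, evec_R = np.linalg.eigh(R)
print(evec_R[:, -1])                      # components of equal magnitude
```

For any 2x2 correlation matrix with a positive off-diagonal, the leading eigenvector is exactly (1, 1)/sqrt(2), which is why correlation-based PCA treats both variables symmetrically here.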

Using Covariance, 

  1. Data Set: Method 1 vs Method 2 from Chapter 1 of "A User's Guide to Principal Components" by J. Edward Jackson.
  2. Generate the covariance matrix, S, from the original variables/observations.
  3. Get the eigenvalue diagonal matrix, L.
  4. Get the eigenvector matrix, U.
  5. Get the mean-centered variables, CentX.
  6. Calculate the PCs, Y, as linear combinations of the centered variables, CentX, using the entries of the eigenvector matrix, U, as coefficients (Y = CentX*U).
  7. Summary: Scores are based on the eigenvectors and the centered data.
  8. Key points:
    1. The covariance matrix of the transformed variables (PCs), Sy, is a diagonal matrix whose diagonal elements are the eigenvalues in L of the covariance matrix.
    2. The sum of the diagonal elements (trace) of Sy equals the sum of the diagonal elements of the covariance matrix of the original variables (x's), S.
    3. The determinants of S, Sy, and L are all equal.
    4. PCA based on covariance rotates the mean-centered data; the variance of each PC equals the corresponding eigenvalue of the covariance matrix.
    5. Covariance matrix of the PCs = L = U'SU.
  9. Refer to the sheet named "PCA on Cov, score<CentX,eigV".
  10. Summary plots below were produced using JMP 17 and include score plots with confidence ellipses and a loading plot for variables x1, x2.
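The steps and key points above can be checked numerically. A minimal NumPy sketch, using small illustrative numbers rather than the Jackson data (which is in the workbook linked in the references):

```python
import numpy as np

# Illustrative bivariate measurements (placeholder values, not the
# Method 1 vs Method 2 data from Jackson's book).
X = np.array([[10.0, 10.7], [10.4, 9.8], [9.7, 10.0], [9.7, 10.1],
              [11.7, 11.5], [11.0, 10.8], [8.7, 8.8], [9.5, 9.3],
              [10.1, 9.4], [9.6, 9.6]])

S = np.cov(X, rowvar=False)                # step 2: covariance matrix S
eigvals, U = np.linalg.eigh(S)             # steps 3-4: eigenvalues and U
L = np.diag(eigvals)                       # eigenvalue diagonal matrix L
CentX = X - X.mean(axis=0)                 # step 5: mean-centered data
Y = CentX @ U                              # step 6: PC scores, Y = CentX*U

Sy = np.cov(Y, rowvar=False)               # covariance matrix of the scores
assert np.allclose(Sy, L)                                  # key point 1: Sy = L
assert np.isclose(np.trace(Sy), np.trace(S))               # key point 2: traces equal
assert np.isclose(np.linalg.det(Sy), np.linalg.det(S))     # key point 3: determinants equal
assert np.allclose(U.T @ S @ U, L)                         # key point 5: U'SU = L
```

The identities hold because U is orthogonal, so cov(CentX·U) = U'SU = L, and an orthogonal rotation preserves both the trace and the determinant.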

Using Correlation,

  1. Data Set: Method 1 vs Method 2 from Chapter 1 of "A User's Guide to Principal Components" by J. Edward Jackson.
  2. Generate the correlation matrix, R, from the original variables/observations.
  3. Get the eigenvalue diagonal matrix, L.
  4. Get the eigenvector matrix, U.
  5. Get the standardized variables, Z (centered and scaled to unit variance).
  6. Calculate the PCs, Y, as linear combinations of the standardized variables, Z, using the entries of the eigenvector matrix, U, as coefficients (Y = Z*U).
  7. Summary: Scores are based on the eigenvectors and the standardized data.
  8. Key points:
    1. The covariance matrix of the transformed variables (PCs), Sy, is a diagonal matrix whose diagonal elements are the eigenvalues in L of the correlation matrix.
    2. The trace of Sy equals the trace of the correlation matrix R, which equals the number of variables; it no longer equals the trace of the original covariance matrix S.
    3. The determinants of R, Sy, and L are all equal.
    4. PCA based on correlation rotates the standardized data; the variance of each PC equals the corresponding eigenvalue of the correlation matrix.
    5. Covariance matrix of the PCs = L = U'RU.
  9. Refer to the sheet named "PCA on Cor, Score<Z,eigV".
  10. Summary plots below were produced using JMP 17 and include score plots with confidence ellipses and a loading plot for variables x1, x2.
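The correlation-based steps can be verified the same way. A minimal NumPy sketch on synthetic data (the post's actual data lives in the linked workbook):

```python
import numpy as np

# Synthetic bivariate data, illustrative only.
rng = np.random.default_rng(1)
x1 = rng.normal(10, 1, 50)
X = np.column_stack([x1, x1 + rng.normal(0, 0.5, 50)])

R = np.corrcoef(X, rowvar=False)           # step 2: correlation matrix R
eigvals, U = np.linalg.eigh(R)             # steps 3-4: eigenvalues and U
L = np.diag(eigvals)                       # eigenvalue diagonal matrix L
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # step 5: standardize
Y = Z @ U                                  # step 6: PC scores, Y = Z*U

Sy = np.cov(Y, rowvar=False)               # covariance matrix of the scores
assert np.allclose(Sy, L)                              # key point 1: Sy = L
assert np.isclose(np.trace(Sy), X.shape[1])            # key point 2: trace = #variables
assert np.isclose(np.linalg.det(Sy), np.linalg.det(R)) # key point 3: determinants equal
assert np.allclose(U.T @ R @ U, L)                     # key point 5: U'RU = L
```

Note that the standardization uses the sample standard deviation (ddof=1) so that the covariance matrix of Z is exactly R; this mirrors the unit-variance scaling functions referenced below.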


In the above example, the variables are on the same scale with similar variances, so the score and loading plots are similar for the two approaches.

References

  1. Principal Components Report Options (jmp.com)
  2. Enter your data for Principal Components Analysis - Minitab
  3. https://github.com/SPICYL/mvda/blob/main/PCA%20on%20Bivariate%20Data%2C%20Comparison%20of%20using%20Covariance%20and%20Correlation.xlsx
  4. R: Scaling and Centering of Matrix-like Objects (ethz.ch)
  5. R: Unit-Variance scaling of each column (r-project.org)
  6. A geometric interpretation of the covariance matrix (visiondummy.com)
