
Showing posts with the label PCA

Pre-treatment of data (Prior to PCA).

  PCA is a maximum-variance projection method, so a variable with a large variance is more likely to be expressed in the model than a low-variance variable. To give the variables equal weight in the data analysis, we standardize them. Standardization is also known as "scaling" or "weighting", and means that the length of each coordinate axis in the variable space is regulated according to a predetermined criterion. The first time a dataset is analyzed, it is recommended to set all variable axes to equal length. The most common criterion is that each variable axis be scaled to the same variance (unit variance). In unit variance (UV) scaling, for each variable (column k) one calculates the standard deviation (Sk) and obtains the scaling weight as the inverse standard deviation (1/Sk). Subsequently, each column of X is multiplied by 1/Sk. Each scaled variable then has equal (unit) variance. UV scaling is also called 'Au...
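The UV scaling step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's own code: the function name `uv_scale` and the data in `X` are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and only the multiplication by 1/Sk is shown (no mean-centering).

```python
from statistics import stdev

def uv_scale(columns):
    """Multiply each column by 1/Sk so that every column has unit variance."""
    scaled = []
    for col in columns:
        s_k = stdev(col)  # sample standard deviation, n - 1 denominator (assumed)
        if s_k == 0:
            raise ValueError("a constant column has no variance to scale")
        weight = 1.0 / s_k  # the scaling weight 1/Sk
        scaled.append([x * weight for x in col])
    return scaled

# Hypothetical data: two variables measured on very different scales.
X = [
    [1200.0, 1500.0, 900.0, 1800.0, 1100.0],
    [0.12, 0.15, 0.11, 0.19, 0.10],
]

X_uv = uv_scale(X)
variances = [stdev(col) ** 2 for col in X_uv]  # each entry is 1 up to rounding
```

Before scaling, the first variable would dominate any variance-based projection simply because of its units; after scaling, both columns contribute the same variance.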

PCA on Bivariate Data, Comparison of using Covariance and Correlation

  On PCA as transformation: the original correlated variables, xs, are transformed into uncorrelated variables, pcs. The rotation required to explain maximum variance is given by the eigenvectors.

  Selecting the type of matrix used to calculate the principal components (transformed variables):
  Covariance: use when your variables are on the same scale, or when they are on different scales but you want to give more emphasis to variables with higher variances.
  Correlation: use when your variables are on different scales and you want to weight all the variables equally.

  Using covariance. Data set: Method 1 vs. Method 2 in chapter 1 of "A User's Guide to Principal Components" by J. Edward Jackson. Generate the covariance matrix, S, from the original variables/observations. Get the diagonal matrix of eigenvalues, L. Get the eigenvector matrix, U. Get the mean-centered variables, CentX. Calculate the PCs, Y, as a linear combination of the centered variables, CentX, using the entries of the eigenvector matrix, U, as coef...
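The steps listed above (S, L, U, CentX, Y) can be sketched for the two-variable case in plain Python. This is only an illustration under stated assumptions: the function `pca_bivariate` and the data in `x1` and `x2` are hypothetical (they are not the values from Jackson's book), and the eigen decomposition uses the closed form for a symmetric 2x2 matrix rather than a library routine. Setting `use_correlation=True` standardizes the centered variables first, which is equivalent to working from the correlation matrix.

```python
import math
from statistics import mean, stdev

def pca_bivariate(x1, x2, use_correlation=False):
    """Return (L, U, Y) for two variables using the covariance or correlation matrix."""
    n = len(x1)
    m1, m2 = mean(x1), mean(x2)
    # CentX: mean-centered variables
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    if use_correlation:
        # Dividing by the standard deviations turns S into the correlation matrix
        s1, s2 = stdev(x1), stdev(x2)
        c1 = [v / s1 for v in c1]
        c2 = [v / s2 for v in c2]
    # S: entries of the symmetric 2x2 covariance (or correlation) matrix
    s11 = sum(a * a for a in c1) / (n - 1)
    s22 = sum(b * b for b in c2) / (n - 1)
    s12 = sum(a * b for a, b in zip(c1, c2)) / (n - 1)
    # L: eigenvalues of S in descending order (closed form for the 2x2 case)
    half_trace = (s11 + s22) / 2
    radius = math.hypot((s11 - s22) / 2, s12)
    L = (half_trace + radius, half_trace - radius)
    # U: rotation matrix whose columns are the unit eigenvectors
    theta = 0.5 * math.atan2(2 * s12, s11 - s22)
    U = ((math.cos(theta), -math.sin(theta)),
         (math.sin(theta), math.cos(theta)))
    # Y = CentX . U: each PC is a linear combination of the centered variables
    Y = [(a * U[0][0] + b * U[1][0], a * U[0][1] + b * U[1][1])
         for a, b in zip(c1, c2)]
    return L, U, Y

# Hypothetical paired measurements (not the data set from the book).
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.0]

L_cov, U_cov, Y_cov = pca_bivariate(x1, x2)
L_cor, U_cor, Y_cor = pca_bivariate(x1, x2, use_correlation=True)
```

With the covariance matrix, the variable with the larger variance pulls the first eigenvector towards its own axis. With the correlation matrix, both variables have unit variance, so the eigenvalues sum to 2 and the first eigenvector lies along a 45-degree diagonal.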