Pre-treatment of data (Prior to PCA).
PCA is a maximum-variance projection method. It follows that a variable with a large variance is more likely to be expressed in the model than a low-variance variable.
To give the variables equal weight in the data analysis, we standardize them.
Standardization is also known as "Scaling" or "Weighting", and means that the length of each co-ordinate axis in the variable space is regulated according to a pre-determined criterion. The first time a dataset is analyzed, it is recommended to give every variable axis the same length. The most common criterion is that every variable axis has the same variance (Unit Variance).
In Unit Variance (UV) scaling, for each variable (column k) one calculates the standard deviation (Sk) and obtains the scaling weight as the inverse of the standard deviation (1/Sk). Subsequently, each column of X is multiplied by its weight 1/Sk. Each scaled variable then has equal (unit) variance. UV scaling is also called 'Auto-scaling', a term that often implies mean-centering as well.
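The UV scaling step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `uv_scale` and the toy matrix `X` are made up for this example, and the sample standard deviation (`ddof=1`) is assumed because the text does not say which estimator is meant.

```python
import numpy as np

def uv_scale(X):
    """Multiply each column k of X by 1/Sk (hypothetical helper, no centering)."""
    X = np.asarray(X, dtype=float)
    # Assumption: Sk is the sample standard deviation (ddof=1).
    s = X.std(axis=0, ddof=1)
    if np.any(s == 0):
        raise ValueError("a constant column has no defined UV weight")
    weights = 1.0 / s          # scaling weight 1/Sk per column
    return X * weights, weights

# Toy data: two variables on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [4.0, 800.0]])

X_uv, weights = uv_scale(X)
print(X_uv.var(axis=0, ddof=1))   # each column now has variance 1
print(X_uv.mean(axis=0))          # the column means still differ
```

Because the columns are only multiplied by a weight and not centered, the variances become equal while the mean values remain different.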
The plot below shows the effect of UV scaling on the variables.
Prior to any pre-processing, the variables have different variances and mean values. After scaling to UV, the length of each variable is identical. The mean values still remain different, however.
Like many projection methods, PCA is sensitive to scaling. However, one must not overlook the risk of choosing the scaling subjectively in order to obtain the model one wants. Generally, UV scaling is the most objective approach, and is recommended if there is no prior information about the data.
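To make the sensitivity to scaling concrete, here is a minimal sketch that compares the first PCA loading vector before and after UV scaling. The synthetic data, the helper name `first_loading`, and the choice of computing PCA through an SVD of the mean-centered matrix are all assumptions made for this illustration, not something taken from the text above.

```python
import numpy as np

def first_loading(X):
    """First principal-component loading via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

# Synthetic example: two correlated variables, the second on a 1000x larger scale.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 1000.0 * (0.6 * a + 0.8 * b)])

# Without scaling, the high-variance variable dominates the first component.
p_raw = first_loading(X)

# After UV scaling (sample standard deviation assumed), both variables contribute.
X_uv = X / X.std(axis=0, ddof=1)
p_uv = first_loading(X_uv)

print("loading, raw data:      ", np.abs(p_raw))
print("loading, UV-scaled data:", np.abs(p_uv))
```

On the raw data the first loading points almost entirely along the second variable, while after UV scaling the two variables carry equal weight in the first component.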