Difference between revisions of "SMHS PCA ICA FA"

From SOCR
Jump to: navigation, search
(PCA, ICA, FA: Similarities and Differences)
(Applications)
Line 207: Line 207:
  
 
===Applications===
 
===Applications===
* [[SOCR_EduMaterials_AnalysisActivities_PCA| This SOCR Activity]] demonstrated the utilization of SOCR analyses package for statistical computing in the SOCR environment. It presents a general introduction to PCA and the theoretical background of this statistical tool and shows how to use PCA and how to read and interpret the outcome. It introduced students to input data in the correct format, read the results of PCA and make interpretation of the resulting transformed data.
+
* [[SOCR_EduMaterials_AnalysisActivities_PCA| This SOCR Activity]] demonstrates the utilization of a SOCR analysis package for statistical computing in the SOCR environment. It presents a general introduction to PCA and the theoretical background of this statistical tool and demonstrates how to use PCA and read and interpret the outcome. It introduces students to inputting data in the correct format, reading the results of PCA and interpreting the resulting transformed data.
  
* [http://www.sciencedirect.com/science/article/pii/0169743987800849 This article] presents a general introduction to PCA. Principal component analysis of a data matrix extracts the dominant patterns in the matrix in terms of a complementary set of score and loading plots. It is the responsibility of the data analyst to formulate the scientific issue at hand in terms of PC projections, PLS regressions, etc. Ask yourself, or the investigator, why the data matrix was collected, and for what purpose the experiments and measurements were made. Specify before the analysis what kinds of patterns you would expect and what you would find exciting. The results of the analysis depend on the scaling of the matrix, which therefore must be specified. Variance scaling, where each variable is scaled to unit variance, can be recommended for general use, provided that almost constant variables are left unscaled. Combining different types of variables warrants blockscaling. In the initial analysis, look for outliers and strong groupings in the plots, indicating that the data matrix perhaps should be “polished” or whether disjoint modeling is the proper course. For plotting purposes, two or three principal components are usually sufficient, but for modeling purposes the number of significant components should be properly determined, e.g. by cross-validation. Use the resulting principal components to guide your continued investigation or chemical experimentation, not as an end in itself.  
+
* [http://www.sciencedirect.com/science/article/pii/0169743987800849 This article] presents a general introduction to PCA. Principal component analysis of a data matrix extracts the dominant patterns in the matrix in terms of a complementary set of score and loading plots. It is the responsibility of the data analyst to formulate the scientific issue at hand in terms of PC projections, PLS regressions, etc. Ask yourself, or the investigator, why the data matrix was collected, and for what purpose the experiments and measurements were made. Specify before the analysis what kinds of patterns you would expect and what you would find exciting. The results of the analysis depend on the scaling of the matrix, which therefore must be specified. Variance scaling, where each variable is scaled to unit variance, is recommended for general use, provided that almost constant variables are left unscaled. Combining different types of variables warrants block-scaling. In the initial analysis, look for outliers and strong groupings in the plots, indicating that the data matrix perhaps should be “polished” or whether disjoint modeling is the proper course. For plotting purposes, two or three principal components are usually sufficient, but for modeling purposes the number of significant components should be properly determined, e.g. by cross-validation. Use the resulting principal components to guide your continued investigation or chemical experimentation, not as an end in itself.  
  
* [http://link.springer.com/article/10.1007/BF02294359 This article] introduced the Akaike Information Criterion (AIC) to extend the method of maximum likelihood to the multi-model situation. It related the successful experience of the order determination of an autoregressive model to the determination of the number of factors in the maximum likelihood factor analysis. The use of the AIC criterion in the factor analysis is particularly interesting when it is viewed as the choice of a Bayesian model. This observation shows that the area of application of AIC can be much wider than the conventional i.i.d. type models on which the original derivation of the criterion was based. The observation of the Bayesian structure of the factor analysis model leads us to the handling of the problem of improper solution by introducing a natural prior distribution of factor loadings.
+
* [http://link.springer.com/article/10.1007/BF02294359 This article] introduces the Akaike Information Criterion (AIC) to extend the method of maximum likelihood to the multi-model situation. It relates the order determination of an autoregressive model to the determination of the number of factors in a maximum likelihood factor analysis. The use of the AIC criterion in factor analysis is particularly interesting when it used with a Bayesian model. This observation reveals that AIC can be more widely applied than only to the conventional i.i.d. type models on which the original derivation of the criterion was based. The observation of the Bayesian structure of the factor analysis model leads to the handling of the problem of improper solutions by introducing a natural prior distribution of factor loadings.
  
* [http://onlinelibrary.wiley.com/doi/10.1002/9780470057339.vnn086/abstract This article] contains a good introduction to the application of ICA. Independent component models have gained increasing interest in various fields of applications in recent years. The basic independent component model is a semi-parametric model assuming that a p-variate observed random vector is a linear transformation of an unobserved vector of p independent latent variables. This linear transformation is given by an unknown mixing matrix, and one of the main objectives of independent component analysis (ICA) is to estimate an unmixing matrix by means of which the latent variables can be recovered. In this article, we discuss the basic independent component model in detail, define the concepts and analysis tools carefully, and consider two families of ICA estimates. The statistical properties (consistency, asymptotic normality, efficiency, robustness) of the estimates can be analyzed and compared via the so called gain matrices. Some extensions of the basic independent component model, such as models with additive noise or models with dependent observations, are briefly discussed. The article ends with a short example.
+
* [http://onlinelibrary.wiley.com/doi/10.1002/9780470057339.vnn086/abstract This article] contains a good introduction to the application of ICA. Independent component models have gained increasing interest in various fields in recent years. The basic independent component model is a semi-parametric model that assumes that a p-variate observed random vector is a linear transformation of an unobserved vector of p independent latent variables. This linear transformation is given by an unknown mixing matrix, and one of the main objectives of independent component analysis (ICA) is to estimate the unmixing matrix so the latent variables can be recovered. In this article, we discuss the basic independent component model in detail, define the concepts and analysis tools carefully, and consider two families of ICA estimates. The statistical properties (i.e., consistency, asymptotic normality, efficiency, robustness) of the estimates can be analyzed and compared via the so-called gain matrices. Some extensions of the basic independent component model, such as models with additive noise or models with dependent observations, are briefly discussed. The article ends with a short example.
  
 
===Software ===
 
===Software ===

Revision as of 13:31, 1 September 2014

Scientific Methods for Health Sciences - Dimensionality Reduction: PCA, ICA, FA

Overview

  • PCA (principal component analysis) is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables through a process known as orthogonal transformation.
  • ICA (independent component analysis) is a computational tool to separate a multivariate signal into additive subcomponents by assuming that the subcomponents are non-Gaussian signals and are statistically independent from each other.
  • Factor analysis is a statistical method that describes variability among observed correlated variables in terms of a potentially lower number of unobserved variables. It is related to PCA but they are not identical. In this section, we are going to introduce these three commonly used statistical tools and illustrate their application in various studies with examples and R code samples.

Motivation

Suppose we have a set of observable correlated random variables, and we want to reduce the dimensionality of the data into a reasonable new set. How can we achieve this dimensionality reduction? Principal component analysis, independent component analysis and factor analysis may be the answers here. How does each of them work? What are the differences among those statistical methods; what are their strengths and weaknesses? How can we decide on the best method for a specific dataset?

Theory

PCA (principal component analysis)

PCA is a statistical technique that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of variables that are linearly uncorrelated. The resulting uncorrelated variables are called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for the remaining variability. PCA is the simplest of the true eigenvector-based multivariate analyses; it reveals the internal structure of the data in a way that best explains the variance in the data. It is a useful statistical technique that has been applied in many fields, including computer networks and image processing, and is a powerful method for finding patterns in high-dimensional datasets.

Consider a data matrix $X_{n\times p}$ with column-wise empirical means of zero (i.e., the sample mean of each column has been shifted to zero), where each of the $n$ rows represents a different repetition of the experiment, and each of the $p$ columns gives a particular kind of data-element (i.e., variable). Mathematically, the transformation is defined by a set of $p$-dimensional vectors of weights $w_{(k)}=(w_1,w_2,…,w_p)_{(k)}$, constrained to be unitary ($\\w_{(k)}||=1$), which map each row vector $x_i$ of $X$ to a new vector of principal component scores $t_{(i)}=(t_1,t_2,…,t_p)_{(i)}$, given by $t_{k(i)}=x_{(i)} w_{(k)}$. The mapping occurs such that the individual elements of $t$ considered over the data set successively inherit the maximum possible variance from $x$.

  • First component: the first loading vector $w_{(1)}= \arg\max_{||w||=1} {\sum_i{(t_1)_{(i)}^2 }}= \arg\max_{||w||=1} {\sum_i{(x_{(i)} w)^2 }}$, in matrix form: $w_{(1)}=\arg\max_{||w||=1} {||Xw||^2 }=\arg\max_{||w||=1} {w^T X^T Xw}$.
  • Further components: the $k^{th}$ component can be found by subtracting the first $k-1$ principal components from $X$. $\hat{X}_{k-1} = X-\sum_{s=1}^{k-1}{X w_{(s)} w_{(s)}^T }$ and finding the loading vector involves extracting the maximum variance from this new data matrix: $w_{(k)}=\arg\max_{||w||=1} {||\bar{X}_{k-1} w||^2 } = \arg\max {\frac{w^T \hat{X}_{k-1}^T \hat{X}_{k-1} w}{w^T w}}.$ This gives the remaining eigenvectors of $X^T X$ with the maximum values for the quantity in brackets given by other corresponding eigenvalues. The full principal components decomposition of $X$ can therefore be given as $T=XW$ where $W$ is a $p\times p$ matrix whose columns are the eigenvectors of $X^T X$.
  • Dimensionality reduction: The faithful transformation $T = X W$ maps a data vector $x_{(i)}$ from an original space of $p$ variables to a new space of $p$ (other) variables which are uncorrelated over the dataset. However, not all the principal components need to be kept. Keeping only the first $L$ principal components, produced by using only the first $L$ loading vectors, gives the truncated transformation $T_L=XW_L$ where $T_L$ has $n$ rows but only $L$ columns. This dimensionality reduction is very useful for visualizing and processing high-dimensional datasets while still retaining as much of the variance in the dataset as possible.
  • Singular value decomposition (SVD)
The SVD of $X=U \Sigma W^T$ where $\Sigma$ is an $n\times p$ rectangular diagonal matrix of positive numbers $\sigma_{(k)}$, the singular values of $X$; $U$ is an $n\times n$ matrix, the columns of which are orthogonal until vector of length $n$ called the left singular vectors of $X$; and $W$ is a $p\times p$ whose columns are orthogonal until vector of length $p$ and called the right singular vectors of $X$. With factorization, the matrix $X^T X=W \Sigma U^T U \Sigma W^T = W \Sigma^2 W^T$; with singular value decomposition the score matrix $T=XW=U\Sigma W^T W=U\Sigma$, where each column of $T$ is given by one of the left singular vectors of $X$ multiplied by the corresponding singular value.
  • Computational details of PCA:
    • Begin with a dataset that contains at least two dimensions (i.e., variables). The dataset can contain as many observations (dimensions) as you like;
    • Normalize the observations for each variable. To do this, simply subtract the mean (average) from each observation within a given variable. For example, let $X$ and $Y$ be the two variables from the original dataset, with variable $X$ containing observations $X_1,X_2,…,X_n$, and variable $Y$ containing observations $Y_1,Y_2,…,Y_n$. Let $\bar{X}$ be the average of the $n$ observations of $X$, i.e. $\bar{X}= \frac{X_1+X_2+⋯+X_n}{n}$, and similarly let $\bar{Y}= \frac{Y_1+Y_2+⋯+Y_n}{n}$ be the average of the $Y$ observations. Then, the normalized dataset would be, for variable $X$: $\{X_1-\bar{X}, X_2-\bar{X}, …,X_n-\bar{X}\}$ and for variable $Y$: $\{Y_1-\bar{Y},Y_2-\bar{Y}, …,Y_n-\bar{Y}\}$
    • Calculate the covariance matrix between the variables of the normalized dataset;
    • Calculate the eigenvalue and eigenvectors of the covariance matrix (the eigenvectors must be normalized to a length of 1);
    • (5) Choose the most significant principal component, which is simply the eigenvector with the highest eigenvalue.

PCA Properties

  • For any integer $q$, $1 \leq q \leq p$, consider the orthogonal linear transformation $y=B'x$, where $y$ is a $q$-element vector and $B'$ is a $q\times q$ matrix, and let $\Sigma_y =B'\Sigma B$ be the variance-covariance matrix for $y$. Then the trace of $\Sigma_y$, $tr(\Sigma_y)$, is maximized by taking $B=A_q$, where $A_q$ consists of the first $q$ columns of $A$;
  • $y=B'x$ and the $tr(\Sigma_y)$ are minimized by taking $B=A_q^*$ where $A_q^*$ consists of the last $q$ columns of $A$. The last few principal components are not simply unstructured left-overs after removing the important ones;
  • $\Sigma = \lambda_1 \alpha_1 \alpha_1'+\lambda_2 \alpha_2 \alpha_2'+⋯+\lambda_p \alpha_p \alpha_p'$. Given that $var(x_j)=\sum_{k=1}^p {\lambda_k \alpha_{kj}^2 }$, the elements of $\lambda_k \alpha_k \alpha_k'$ tends to become smaller as SkS increases, whereas the elements of $\lambda_k$ tends to stay about the same size because $\alpha_k \alpha_k'=1,$ for $k=1,2,…,p$.

PCA Limitations

  • The results of PCA depend on the scaling of the variables.
  • The applicability of PCA is limited by certain assumptions made in its derivation.

PCA in R

require(graphics)
## The variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
(pc.cr <- princomp(USArrests))  # inappropriate

Call:
princomp(x = USArrests)

Standard deviations:
   Comp.1    Comp.2    Comp.3    Comp.4 
82.890847 14.069560  6.424204  2.45783

princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)

Call:
princomp(x = USArrests, cor = TRUE)

Standard deviations:
   Comp.1    Comp.2    Comp.3    Comp.4 
1.5748783 0.9948694 0.5971291 0.4164494

## Similar, but different:
## The standard deviations differ by a factor of sqrt(49/50)
summary(pc.cr <- princomp(USArrests, cor = TRUE))

Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4
Standard deviation     1.5748783 0.9948694 0.5971291 0.41644938
Proportion of Variance 0.6200604 0.2474413 0.0891408 0.04335752
Cumulative Proportion  0.6200604 0.8675017 0.9566425 1.00000000

loadings(pc.cr)  # note that blank entries are small but not zero
## The signs of the columns are arbitrary


Loadings:
         Comp.1 Comp.2 Comp.3 Comp.4
Murder   -0.536  0.418 -0.341  0.649
Assault  -0.583  0.188 -0.268 -0.743
UrbanPop -0.278 -0.873 -0.378  0.134
Rape     -0.543 -0.167  0.818   

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00

plot(pc.cr) # shows a screeplot.
 
# The histogram distribution presents a vivid picture of the variance attributable to the first four significant principal  
# components respectively.
biplot(pc.cr)  ## shows the plot of PCA in a different format

From the chart above, we can see the distribution of the variance attributed to different variables in the four principal components.

PCA using SOCR Analyses

This SOCR activity illustrates the use of PCA.

ICA (independent component analysis)

ICA is a computational method that separates a multivariate signal into additive subcomponents by assuming that the subcomponents are non-Gaussian signals and are statistically independent from each other.

  • ICA Assumptions:
    • The source signals are independent of each other.
    • The distributions of the values in each source signal are non-Gaussian.
    • Independence: The sources of signals are assumed to be independent, but their signal mixtures are not independent because they share the same source signals;
    • Normality: Based on CLT, the distribution of the sum of independent random variables approximates a Gaussian distribution;
    • Complexity: The temporal complexity of any signal mixture is greater than that of its simplest constituent source signal.
ICA maximizes the statistical independence of the estimated components to find the independent components. In general, ICA cannot identify the actual number of source signals nor can it identify the proper scaling of the source signals. Suppose the data is represented by the random vector $x=(x_1,x_2,…,x_m )^t$, and the components are denoted as $s=(s_1,s_2,…,s_n )^t$. We need to transform the observed data $x$ using a linear transformation $w$, $s=Wx$, into maximally independent components $s$ measured by some functions of independence. There are alternative models for ICA:
  • Linear noiseless ICA: the components $x_i$ of the data $x=(x_1,x_2,…,x_m )^t$ are generated as a sum of the independent components $s_k$, for $k=1,…,n$; $x_i=a_{i,1} s_1 + ⋯ + a_{i,k} s_k + ⋯ + a_{i,n} s_n$, weighted by the mixing weights $a_{i,k}$.
  • Linear noisy ICA: with additional assumption of zero-mean and uncorrelated Gaussian noise $n \sim N(0,diag(\Sigma))$, the ICA model takes the form $x=As+n$.
  • Nonlinear ICA: the mixing of the sources is not necessarily linear. Using a nonlinear mixing function $f(.|θ)$ with parameter $\theta$ the nonlinear ICA model is $x=f(s│θ)+n$.
  • Identifiability: the independent components are identifiable up to a permutation and scaling of the sources, which requires: (1) at most one of the sources $s_k$ is Gaussian; (2) the number of observed mixtures, $m$, must be at least as large as the number of estimated components: $n$ such that $n\leq m$, i.e., the mixing matrix $A$ must be of full rank in order to have inverse.
  • ICA in R using package fastICA. This example demonstrates how to separate two mixed independent uniform variables:
library(fastICA)
S <- matrix(runif(10000), 5000, 2)
A <- matrix(c(1, 1, -1, 3), 2, 2, byrow = TRUE)
X <- S %*% A # In R,  "*" and "%*%" indicate "scalar" and matrix multiplication, respectively!

a <- fastICA(X, 2, alg.typ = "parallel", fun = "logcosh", alpha = 1,
                 method = "C", row.norm = FALSE, maxit = 200,
                 tol = 0.0001, verbose = TRUE)
par(mfrow = c(1, 2))
plot(a$\$ $X, main = "Pre-processed data")
plot(a$\$ $S, main = "ICA components")

Another example of un-mixing two independent signals is shown below:
S <- cbind(sin((1:1000)/20), rep((((1:200)-100)/100), 5))
# cbind combines objects by rows and columns. It takes a sequence of vector, matrix or data frames arguments and combines them by columns or rows, respectively. 

A <- matrix(c(0.291, 0.6557, -0.5439, 0.5572), 2, 2)
X <- S %*% A
a <- fastICA(X, 2, alg.typ = "parallel", fun = "logcosh", alpha = 1,
                 method = "R", row.norm = FALSE, maxit = 200,
                 tol = 0.0001, verbose = TRUE)
par(mfcol = c(2, 3))
plot(1:1000, S[,1 ], type = "l", main = "Original Signals",
         xlab = "", ylab = "")
plot(1:1000, S[,2 ], type = "l", xlab = "", ylab = "")
plot(1:1000, X[,1 ], type = "l", main = "Mixed Signals",
         xlab = "", ylab = "")
plot(1:1000, X[,2 ], type = "l", xlab = "", ylab = "")
plot(1:1000, a$\$ $S[,1 ], type = "l", main = "ICA source estimates",
         xlab = "", ylab = "")
plot(1:1000, a$\$ $S[, 2], type = "l", xlab = "", ylab = "")

Factor analysis (FA)

FA is a statistical method that describes variability among observed correlated variables in terms of a potentially lower number of unobserved variables. Consider a set of $p$ observable random variables, $x_1,x_2,…,x_p$ with means $μ_1,μ_2,…,μ_p$, respectively. Suppose for some unknown constants $l_{i,j}$ and $k$ unobserved random variables $F_j$, where $i\in \{1,…,p\}$ and $j \in \{1,…,k\}$ where $k<p$. We have $x_i-μ_i=l_{i,1} F_1 + ⋯ +l_{i,k} F_k + ε_i$, where $ε_i$ are independently distributed error terms with mean zero and finite variance. In matrix form, we have $x-μ = LF+ε$, with $n$ observations; $x$ is a $p\times n$ matrix, $L$ is a $p \times k$ matrix and $F$ is $k\times n$ matrix. Assumptions: (1) $ε$ and $F$ are independent; (2) $E(F)=0$; (3) $cov(F)=I$ to make sure the factors are uncorrelated. Solutions to the equations above yield the factors $F$ and the loading matrix, $L$.

  • Example: In the following, matrices will be indicated by indexed variables. "Subject" indices will be indicated using the letters $a$, $b$ and $c$, with values running from $1$ to $N_a$, which is equal to $10$ in the above example. "Factor" indices will be indicated using the letters $p$, $q$ and $r$, with values running from $1$ to $N_p$ which is equal to $2$ in the above example. "Instance" or "sample" indices will be indicated using letters $i$, $j$ and $k$, with values running from $1$ to $N_i$. In the example above, if a sample of $N_i=1000$ students responded to the $N_a=10$ questions, the $i^{th}$ student's score for the $a^{th}$ question are given by $x_{ai}$. The purpose of factor analysis is to characterize the correlations between the variables $x_a$ of which the $x_{ai}$ is a particular instance, or set of observations. To ensure that all variables are on equal footing, they are standardized: $z_{ai}=\frac{x_{ai}- μ_a}{σ_a}$, where $μ_a$ is the sample mean and sample variance $σ_a^2=\frac{1}{N_a} \sum_i {(x_{ai}-μ_a)^2}$. The factor analysis model is expressed by:
$$\begin{matrix}z_{1,i} & = & \ell_{1,1}F_{1,i} & + & \ell_{1,2}F_{2,i} & + & \varepsilon_{1,i} \\ \vdots & & \vdots & & \vdots & & \vdots \\ z_{10,i} & = & \ell_{10,1}F_{1,i} & + & \ell_{10,2}F_{2,i} & + & \varepsilon_{10,i} \end{matrix}$$

In matrix form ($Z=LF+ϵ$), this model can be expressed as: $$z_{ai}=\sum_p \ell_{ap}F_{pi}+\varepsilon_{ai},$$ where $F_{1,i}$ is the $i^{th}$ student’s verbal intelligence, $F_{2,i}$ is the $i^{th}$ student’s mathematical intelligence; $l_{ap}$ are the factor loadings for the $a^{th}$ subject for $p=1,2$.

FA in R: using factanal()
# Maximum Likelihood Factor Analysis
# entering raw data and extracting 3 factors, 
# with varimax rotation 

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/mmreg.csv")
# mmreg.csv includes 600 observations and 8 variables. 
# The psychological variables are locus_of_control, self_concept and motivation. 
# The academic variables are standardized tests in reading (read), writing (write), math (math) and science (science). 
# Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.

# We can get some basic descriptions of the entire data set by using summary. 
# To get the standard deviations, we use sapply to apply the sd function to each variable in the dataset.
summary(mydata)

sapply(mydata, sd)

fit <- factanal(mydata, 3, rotation="varimax")  # mydata can be a raw data matrix or a covariance matrix. 
# Pairwise deletion  of missing data is used. Rotation can be "varimax" or "promax".

print(fit, digits=2, cutoff=.3, sort=TRUE)
# plot factor 1 by factor 2 
load <- fit$\$ $loadings[,1:2] 
plot(load,type="n") # set up plot 
text(load,labels=names(mydata),cex=.7) # add variable names

# Principal Axis Factor Analysis
library(psych)
fit <- factor.pa(mydata, nfactors=3, rotation="varimax")
fit # print results

# Determine Number of Factors to Extract
library(nFactors)
ev <- eigen(cor(mydata)) # get eigenvalues
ap <- parallel(subject=nrow(mydata),var=ncol(mydata), rep=100,cent=.05)
nS <- nScree(x=ev$\$ $values, aparallel=ap$\$ $eigen$\$ $qevpea)
plotnScree(nS)

PCA, ICA, FA: Similarities and Differences

  • PCA is closely related to factor analysis. The latter typically incorporates more domain specific assumptions about the underlying structure and solves eigenvectors of a slightly different matrix. Principal components create variables that are linear combinations of the original variables. The new variables have the property that the variables are all orthogonal. The principal components can be used to find clusters in a set of data.
  • PCA is a variance-focused approach that seeks to reproduce the variable's total variance. In PCA, the components reflect both common and unique variance of the variable. It is generally preferred for the purposes of data reduction (i.e., translating variable space into optimal factor space) but not for detecting latent constructs or factors. Factor analysis is similar to principal component analysis in that factor analysis also involves linear combinations of variables.
  • In contrast to PCA, factor analysis is a correlation-focused approach seeking to reproduce the correlations among variables. The factors represent the common variance of variables and exclude unique variance. Factor analysis is generally used when the purpose of the research is detection of underlying structure in the data structure (i.e., latent constructs or factors) or causal modeling.

Applications

  • This SOCR Activity demonstrates the utilization of a SOCR analysis package for statistical computing in the SOCR environment. It presents a general introduction to PCA and the theoretical background of this statistical tool and demonstrates how to use PCA and read and interpret the outcome. It introduces students to inputting data in the correct format, reading the results of PCA and interpreting the resulting transformed data.
  • This article presents a general introduction to PCA. Principal component analysis of a data matrix extracts the dominant patterns in the matrix in terms of a complementary set of score and loading plots. It is the responsibility of the data analyst to formulate the scientific issue at hand in terms of PC projections, PLS regressions, etc. Ask yourself, or the investigator, why the data matrix was collected, and for what purpose the experiments and measurements were made. Specify before the analysis what kinds of patterns you would expect and what you would find exciting. The results of the analysis depend on the scaling of the matrix, which therefore must be specified. Variance scaling, where each variable is scaled to unit variance, is recommended for general use, provided that almost constant variables are left unscaled. Combining different types of variables warrants block-scaling. In the initial analysis, look for outliers and strong groupings in the plots, indicating that the data matrix perhaps should be “polished” or whether disjoint modeling is the proper course. For plotting purposes, two or three principal components are usually sufficient, but for modeling purposes the number of significant components should be properly determined, e.g. by cross-validation. Use the resulting principal components to guide your continued investigation or chemical experimentation, not as an end in itself.
  • This article introduces the Akaike Information Criterion (AIC) to extend the method of maximum likelihood to the multi-model situation. It relates the order determination of an autoregressive model to the determination of the number of factors in a maximum likelihood factor analysis. The use of the AIC criterion in factor analysis is particularly interesting when it used with a Bayesian model. This observation reveals that AIC can be more widely applied than only to the conventional i.i.d. type models on which the original derivation of the criterion was based. The observation of the Bayesian structure of the factor analysis model leads to the handling of the problem of improper solutions by introducing a natural prior distribution of factor loadings.
  • This article contains a good introduction to the application of ICA. Independent component models have gained increasing interest in various fields in recent years. The basic independent component model is a semi-parametric model that assumes that a p-variate observed random vector is a linear transformation of an unobserved vector of p independent latent variables. This linear transformation is given by an unknown mixing matrix, and one of the main objectives of independent component analysis (ICA) is to estimate the unmixing matrix so the latent variables can be recovered. In this article, we discuss the basic independent component model in detail, define the concepts and analysis tools carefully, and consider two families of ICA estimates. The statistical properties (i.e., consistency, asymptotic normality, efficiency, robustness) of the estimates can be analyzed and compared via the so-called gain matrices. Some extensions of the basic independent component model, such as models with additive noise or models with dependent observations, are briefly discussed. The article ends with a short example.

Software

Problems

R Example 1

# Install package ‘fastICA’

> library(fastICA)
# Using the SOCR 1981-2006 CPI Data (http://wiki.socr.umich.edu/index.php/SOCR_Data_Dinov_021808_ConsumerPriceIndex), 
# save the table in ASCII text file CPI_Data.dat. Note the "dir" (folder) where you saved the data and reference it below
> CPI_Data <- as.matrix(read.table("/dir/CPI_Data.dat",header=T))

> # compare PCA and FA analyses
> X <- CPI_Data[,-1]
> pc.cor <- princomp(X,cor=T)
> summary(pc.cor )
Importance of components:
                          Comp.1    Comp.2     Comp.3     Comp.4     Comp.5     Comp.6      Comp.7
Standard deviation     2.1348817 1.1696678 0.71243186 0.54364890 0.38449985 0.31956304 0.145200770
Proportion of Variance 0.6511029 0.1954461 0.07250845 0.04222202 0.02112002 0.01458865 0.003011895
Cumulative Proportion  0.6511029 0.8465490 0.91905742 0.96127944 0.98239946 0.99698811 1.000000000

> ica <- fastICA(X,n.comp=7)
> names(ica)
[1] "X" "K" "W" "A" "S"
#  X: pre-processed data matrix (whitened/sphered data)
#  K: pre-whitening matrix that projects data onto the first n.comp
#     principal components.
#  W: estimated un-mixing matrix (XW = S)
#  A: estimated mixing matrix (X = SA)
#  S: estimated source matrix (factor scores, $\Theta$ in the notes)

> windows()
> biplot(pc.cor)

> S <- ica$\$ $S
> dimnames(S) <- list(dimnames(X)1,paste("Cmp.",1:7,sep=""))
> A <- ica$\$ $A
> dimnames(A) <- list(dimnames(X)2,paste("Cmp.",1:7,sep=""))
> windows()
> biplot(S[,1:2],A[,1:2])

> loadings(pc.cor) 

Loadings:
             Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Electricity  -0.415 -0.227  0.164  0.512 -0.373 -0.576
Fuel_Oil     -0.351  0.547               -0.198  0.312
Bananas      -0.373 -0.415  0.258  0.365  0.393  0.578
Tomatoes     -0.369 -0.320  0.357 -0.738 -0.294       
Orange_Juice -0.324 -0.311 -0.871        -0.119       
Beef         -0.424  0.220        -0.216  0.721 -0.449
Gasoline     -0.380  0.479               -0.216  0.161
             Comp.7
Electricity  -0.131
Fuel_Oil     -0.657
Bananas            
Tomatoes           
Orange_Juice  0.108
Beef               
Gasoline      0.733

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.143  0.143  0.143  0.143  0.143  0.143
Cumulative Var  0.143  0.286  0.429  0.571  0.714  0.857
               Comp.7
SS loadings     1.000
Proportion Var  0.143
Cumulative Var  1.000

> field <- function(x) { substr(paste(x,"           ",sep=""),1,6) }
> A.str <- ifelse(abs(A)<2,field(" "),field(round(A,2)))
> dimnames(A.str) <- list(dimnames(X)2,paste("Cmp.",1:7,sep=""))
> write.table(A.str[,1:4],"",quote=F,row.names=T,col.names=NA)
 		Cmp.1 Cmp.2 Cmp.3 Cmp.4
Electricity                            
Fuel_Oil                            
Bananas 	-2.59                      
Tomatoes 	-2.9                       
Orange_Juice 	-2.66                      
Beef                            
Gasoline                            

> L <- pc.cor$loadings
> L.str <- ifelse(abs(L)<0.3,field(" "),field(round(L,2)))
> dimnames(L.str) <- list(dimnames(X)2,paste("Cmp.",1:7,sep=""))
> write.table(L.str[,1:4],"",quote=F,row.names=T,col.names=T)
		Cmp.1 Cmp.2 Cmp.3 Cmp.4
Electricity 	-0.41                0.51  
Fuel_Oil 	-0.35  0.55                
Bananas 	-0.37  -0.41         0.37  
Tomatoes 	-0.37  -0.32  0.36   -0.74 
Orange_Juice 	-0.32  -0.31  -0.87        
Beef 		-0.42                      
Gasoline 	-0.38  0.48

References


Appendix

SOCR 1981-2006 CPI Dataset

Year Electricity Fuel_Oil Bananas Tomatoes Orange_Juice Beef Gasoline
1981 31.552 1.15 0.343 0.792 1.141 1.856 1.269
1982 36.006 1.254 0.364 0.763 1.465 1.794 1.341
1983 37.184 1.194 0.332 0.726 1.418 1.756 1.214
1984 38.6 1.122 0.344 0.854 1.408 1.721 1.2
1985 38.975 1.078 0.35 0.697 1.685 1.711 1.145
1986 40.223 1.126 0.337 1.104 1.756 1.662 1.19
1987 40.022 0.817 0.374 0.871 1.512 1.694 0.868
1988 40.195 0.89 0.394 0.797 1.638 1.736 0.947
1989 40.828 0.883 0.429 1.735 1.868 1.806 0.944
1990 41.663 1.259 0.438 0.912 1.817 1.907 1.09
1991 43.226 1.235 0.428 0.936 2.005 1.996 1.304
1992 44.501 0.985 0.426 1.141 1.879 1.926 1.135
1993 46.959 0.969 0.44 1.604 1.677 1.97 1.182
1994 48.2 0.919 0.503 1.323 1.674 1.892 1.109
1995 48.874 0.913 0.463 1.103 1.583 1.847 1.19
1996 48.538 1.007 0.497 1.213 1.577 1.799 1.186
1997 49.245 1.136 0.473 1.452 1.737 1.85 1.318
1998 46.401 0.966 0.489 1.904 1.601 1.818 1.186
1999 45.061 0.834 0.49 1.443 1.753 1.834 1.031
2000 45.207 1.189 0.5 1.414 1.823 1.903 1.356
2001 47.472 1.509 0.509 1.451 1.863 2.037 1.525
2002 47.868 1.123 0.526 1.711 1.876 2.152 1.209
2003 47.663 1.396 0.512 1.472 1.848 2.131 1.557
2004 49.159 1.508 0.485 1.66 1.957 2.585 1.635
2005 50.847 1.859 0.49 2.162 1.872 2.478 1.866
2006 57.223 2.418 0.505 1.621 1.853 2.607 2.359





Translate this page:

(default)
Uk flag.gif

Deutsch
De flag.gif

Español
Es flag.gif

Français
Fr flag.gif

Italiano
It flag.gif

Português
Pt flag.gif

日本語
Jp flag.gif

България
Bg flag.gif

الامارات العربية المتحدة
Ae flag.gif

Suomi
Fi flag.gif

इस भाषा में
In flag.gif

Norge
No flag.png

한국어
Kr flag.gif

中文
Cn flag.gif

繁体中文
Cn flag.gif

Русский
Ru flag.gif

Nederlands
Nl flag.gif

Ελληνικά
Gr flag.gif

Hrvatska
Hr flag.gif

Česká republika
Cz flag.gif

Danmark
Dk flag.gif

Polska
Pl flag.png

România
Ro flag.png

Sverige
Se flag.gif