# Implementation in a nutshell

If one calls $\mathbf{X}$ the original sample, whose dimension is $(n_X,n_S)$, the idea behind PCA is to find the projection matrix $\mathbf{P}$, of dimension $(n_X,n_X)$, that re-expresses the data optimally as a new sample, called hereafter $\mathbf{Y}$, with the same dimension $(n_X,n_S)$. The rows of $\mathbf{P}$ form a new basis in which to represent the columns of $\mathbf{X}$, and this new basis will later become our principal component directions. Recalling the aim of PCA, the way this projection matrix is determined is crucial and should be designed so as to

- find the best linear combinations of variables, so that the minimum number of rows (principal components) of $\mathbf{P}$ is needed to carry as much inertia as possible;
- rank the principal components, so that if one is not satisfied with the new representation, it is simple to add an extra principal component to improve it.

This can be done by investigating the covariance matrix $\mathbf{C_X}$ of $\mathbf{X}$ which, by definition, describes the linear relations between variables and can be computed from the centered sample matrix $\mathbf{X_C}$[^metho_pca_nutshell] as

[^metho_pca_nutshell]: The centered matrix is defined as $\mathbf{X_C} = \mathbf{X} - \bar{\mathbf{x}}\,\mathbf{1_{n_S}}^T$ where $\bar{\mathbf{x}}$ is the column vector of mean values of every variable and $\mathbf{1_{n_S}}$ is a vector of ones whose dimension is $n_S$.

```{math}
\mathbf{C_X} = \frac{1}{n_S-1}\mathbf{X_C}\mathbf{X_C}^T
```

If one considers the resulting covariance matrix $\mathbf{C_Y}$, the aim is to maximise the signal, measured by the variance (the diagonal entries, which represent the variances of the principal components), while minimising the covariance between them. As the lowest reachable covariance value is 0, if the resulting covariance matrix $\mathbf{C_Y}$ happens to be diagonal, our objectives are achieved. From the very definition of the covariance matrix, one can see that

```{math}
\mathbf{C_Y} = \frac{1}{n_S-1} \mathbf{Y_C} \mathbf{Y_C}^T
             = \frac{1}{n_S-1} (\mathbf{P}\mathbf{X_C}) (\mathbf{P}\mathbf{X_C})^T
             = \frac{1}{n_S-1} \mathbf{P} \mathbf{X_C} \mathbf{X_C}^T \mathbf{P}^T
             = \mathbf{P} \mathbf{C_X} \mathbf{P}^T
```

As $\mathbf{C_X}$ is symmetric, it is orthogonally diagonalisable and can be written $\mathbf{C_X} = \mathbf{E} \mathbf{S} \mathbf{E}^{T}$. In this equation, $\mathbf{E}$ is an orthonormal matrix whose columns are the orthonormal eigenvectors of $\mathbf{C_X}$, and $\mathbf{S}$ is a diagonal matrix whose diagonal entries are the eigenvalues of $\mathbf{C_X}$. Given this, choosing $\mathbf{P}=\mathbf{E}^T$ leads to

```{math}
\mathbf{C_Y} = \mathbf{P} \mathbf{C_X} \mathbf{P}^T = \mathbf{E}^T \mathbf{E} \mathbf{S} \mathbf{E}^{T} \mathbf{E} = \mathbf{S}
```

At this stage, the matrix $\mathbf{S}$ is not unique: the eigenvalues can be permuted along the diagonal in many ways, as long as $\mathbf{E}$ is changed accordingly.
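To make this diagonalisation route concrete, here is a minimal NumPy sketch (the variable names and the synthetic sample are purely illustrative, not taken from the rest of this document): it centers $\mathbf{X}$, diagonalises $\mathbf{C_X}$, builds $\mathbf{P}=\mathbf{E}^T$ and checks that $\mathbf{C_Y}$ comes out diagonal with the eigenvalues on its diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sample: n_X = 3 variables (rows), n_S = 500 samples (columns)
n_X, n_S = 3, 500
X = rng.normal(size=(n_X, n_S))
X[2] = 0.8 * X[0] + 0.2 * X[1] + 0.05 * rng.normal(size=n_S)  # correlated variable

# Centre each variable (row) and build the covariance matrix C_X
X_C = X - X.mean(axis=1, keepdims=True)
C_X = X_C @ X_C.T / (n_S - 1)

# Orthogonal diagonalisation C_X = E S E^T (eigh is suited to symmetric matrices)
eigenvalues, E = np.linalg.eigh(C_X)

# Rank the principal components by decreasing eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues, E = eigenvalues[order], E[:, order]

# Projection matrix P = E^T and projected sample Y = P X_C
P = E.T
Y = P @ X_C

# The covariance of Y is diagonal, carrying the eigenvalues of C_X
C_Y = Y @ Y.T / (n_S - 1)
print(np.allclose(C_Y, np.diag(eigenvalues)))   # True up to round-off
```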
Finally, an interesting link can be drawn between this protocol and a very classical method of linear algebra, already mentioned elsewhere in this document, called the Singular Value Decomposition (*SVD*[^metho_pca_nutshell_2]), leading to

[^metho_pca_nutshell_2]: The SVD is applied here to a matrix whose number of rows should be greater than its number of columns, hence the transposition of $\mathbf{X_C}$.

**{eq}`svd_formula`: General form of an SVD**

```{math}
:label: svd_formula
\mathbf{X_C}^T = \mathbf{U} \boldsymbol\Sigma \mathbf{V}^{T}
\;\; {\rm where} \;\;
\mathbf{X_C}^T(n_{S},n_{X}), \;\;
\mathbf{U}(n_{S},n_{S}), \;\;
\mathbf{V}(n_{X},n_{X})
\;\; {\rm and} \;\;
\boldsymbol\Sigma(n_{S},n_{X})
```

In this context, $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices (their columns being respectively the *left singular vectors* and *right singular vectors* of $\mathbf{X_C}^T$), while $\boldsymbol\Sigma$ is a diagonal matrix storing the singular values of $\mathbf{X_C}^T$ in decreasing order. The last step is to invoke the linear algebra theorem stating that the non-zero singular values of $\mathbf{X_C}^T$ are the square roots of the non-zero eigenvalues of $\mathbf{X_C}\mathbf{X_C}^T$ and $\mathbf{X_C}^T\mathbf{X_C}$ (the corresponding eigenvectors being the columns of $\mathbf{V}$ and $\mathbf{U}$ respectively). Gathering all this, one can see that by performing the SVD of the centered original sample matrix, the projection matrix can be identified as $\mathbf{P}=\mathbf{V}^T$ and the resulting covariance matrix, $\mathbf{C_Y} = \boldsymbol\Sigma^T\boldsymbol\Sigma / (n_S-1)$, is proportional to the squared singular values. A final interesting property comes from the SVD itself: as $\boldsymbol\Sigma$ gathers the singular values in decreasing order, it ensures the uniqueness of the transformation and gives access to the principal components in a hierarchical way.
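As a complement to the diagonalisation sketch above, here is a minimal NumPy sketch of the SVD route (again on an illustrative synthetic sample): it applies the SVD to $\mathbf{X_C}^T$, identifies $\mathbf{P}=\mathbf{V}^T$ and checks that the squared singular values, divided by $n_S-1$, match the eigenvalues of $\mathbf{C_X}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same layout as before: variables as rows, samples as columns (n_S > n_X)
n_X, n_S = 3, 500
X = rng.normal(size=(n_X, n_S))
X[2] = 0.8 * X[0] + 0.2 * X[1] + 0.05 * rng.normal(size=n_S)

X_C = X - X.mean(axis=1, keepdims=True)

# SVD of the transposed centred sample: X_C^T = U Sigma V^T
U, sigma, Vt = np.linalg.svd(X_C.T, full_matrices=False)

# Projection matrix P = V^T, principal components Y = P X_C
P = Vt
Y = P @ X_C

# Variances of the principal components: sigma^2 / (n_S - 1),
# identical to the eigenvalues of C_X obtained by diagonalisation
variances = sigma**2 / (n_S - 1)
C_X = X_C @ X_C.T / (n_S - 1)
eigenvalues = np.sort(np.linalg.eigvalsh(C_X))[::-1]
print(np.allclose(variances, eigenvalues))   # True up to round-off
```

In practice the SVD route is often preferred, since it avoids forming $\mathbf{C_X}$ explicitly and returns the principal components already ranked by decreasing singular value.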