3.2.2.1. Notation convention
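Let us start by defining the correlation matrix, which describes how each variable is connected (or not) to the others. For a given problem with \(n_{X}\) variables, the covariance between two variables (denoted \({\rm Cov}(X_i,X_j)\)) and their linear correlation (denoted \(\rho_{X_{i}X_{j}}\)) can be estimated as
\[ {\rm Cov}(X_i,X_j) = E\left[(X_i-\mu_{X_{i}})(X_j-\mu_{X_{j}})\right], \qquad \rho_{X_{i}X_{j}} = \frac{{\rm Cov}(X_i,X_j)}{\sigma_{X_{i}}\,\sigma_{X_{j}}} \]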
Equation 3.1: Covariance and correlation between two variables
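In the equation above, \(\mu\) and \(\sigma\) are respectively the mean and the standard deviation of the random variable under consideration. The coefficients to be provided by the user are the correlation coefficients, called Pearson coefficients because they are estimated from the values of the random variables (this is discussed further at the end of this section and in Theoretical aspects). The idea is to gather all these coefficients in a matrix, hereafter called the correlation matrix, which can be written as
\[ \mathbf{C} = \left(\rho_{X_{i}X_{j}}\right)_{i,j=1,\ldots,n_{X}} = \begin{pmatrix} 1 & \rho_{X_{1}X_{2}} & \cdots & \rho_{X_{1}X_{n_{X}}} \\ \rho_{X_{2}X_{1}} & 1 & \cdots & \rho_{X_{2}X_{n_{X}}} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{X_{n_{X}}X_{1}} & \rho_{X_{n_{X}}X_{2}} & \cdots & 1 \end{pmatrix} \]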
Depending on the reference, one may work with either the correlation matrix (\(\mathbf{C}\)) or the covariance matrix (\(\mathbf{C_{o}}\)). Going from one to the other is straightforward: if \(\mathbf{D}\) denotes the diagonal matrix of dimension \(n_{X}\) whose coefficients are the standard deviations of the random variables, then
\[ \mathbf{C_{o}} = \mathbf{D}\,\mathbf{C}\,\mathbf{D} \]
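As a rough illustration of the quantities defined in this section, the following Python sketch estimates a Pearson correlation matrix from a small set of samples and then rebuilds the covariance matrix from it as \(\mathbf{D}\,\mathbf{C}\,\mathbf{D}\). It is not taken from the platform itself: the function names, the sample values and the choice of the unbiased \(1/(n-1)\) normalisation are all assumptions made for this example.

```python
import math


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance of x and y (assumed 1/(n-1) normalisation)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation_matrix(samples):
    """Return (C, sigma): Pearson correlation matrix and standard deviations.

    `samples` holds one list of observations per variable.
    """
    n = len(samples)
    sigma = [math.sqrt(covariance(s, s)) for s in samples]
    corr = [
        [covariance(samples[i], samples[j]) / (sigma[i] * sigma[j])
         for j in range(n)]
        for i in range(n)
    ]
    return corr, sigma


def covariance_from_correlation(corr, sigma):
    """Compute C_o = D C D, where D = diag(sigma).

    Since D is diagonal, element (i, j) is sigma_i * rho_ij * sigma_j.
    """
    n = len(sigma)
    return [[sigma[i] * corr[i][j] * sigma[j] for j in range(n)]
            for i in range(n)]


# Hypothetical data: three variables, five observations each.
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.8],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]

corr, sigma = correlation_matrix(samples)
cov = covariance_from_correlation(corr, sigma)

for row in corr:
    print(["%.3f" % value for value in row])
```

The diagonal of `corr` is 1 by construction and the matrix is symmetric, while `cov` reproduces the covariances estimated directly from the samples, which is a convenient check when switching between the two conventions.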