2.4.1. Normalising the variable

The normalisation function normalize creates new attributes whose range and dispersion depend on the chosen normalisation method. It can be called without any argument or with up to four arguments (listed in the summary below). Four normalisation methods are currently available:

  • centered-reduced (enum value kCR): the new variable values are computed as \(\tilde{x} = \dfrac{x - \mu_{x}}{\sigma_{x}}\)

  • centered (enum value kCentered): the new variable values are computed as \(\tilde{x} = x - \mu_{x}\)

  • reduced to \([-1,1]\) (enum value kMinusOneOne): the new variable values are computed as \(\tilde{x} = 2.0 \times \dfrac{ x - x_{\rm Min}}{x_{\rm Max} - x_{\rm Min}} -1.0\)

  • reduced to \([0,1]\) (enum value kZeroOne): the new variable values are computed as \(\tilde{x} = \dfrac{ x - x_{\rm Min}}{x_{\rm Max} - x_{\rm Min}}\)
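The four formulas above can be sketched as a small standalone C++ helper. This is an illustration only, not the library's implementation: the names Norm, Stats, computeStats and normalizeValue are hypothetical, and the use of the sample (n-1) standard deviation for \(\sigma_{x}\) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the four enum values (not the library's own type).
enum class Norm { CR, Centered, MinusOneOne, ZeroOne };

// Statistics needed by the four formulas.
struct Stats { double mean, sigma, min, max; };

// Compute mean, standard deviation, minimum and maximum of a set of values.
// Assumption: sigma is the sample standard deviation (n-1 denominator).
Stats computeStats(const std::vector<double>& x) {
    assert(x.size() > 1);
    const double n = static_cast<double>(x.size());
    const double mean = std::accumulate(x.begin(), x.end(), 0.0) / n;
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    const auto mm = std::minmax_element(x.begin(), x.end());
    return {mean, std::sqrt(ss / (n - 1.0)), *mm.first, *mm.second};
}

// Apply one of the four formulas to a single value.
// The caller must ensure sigma != 0 (kCR) and max != min (range-based methods).
double normalizeValue(double x, const Stats& s, Norm method) {
    switch (method) {
        case Norm::CR:          return (x - s.mean) / s.sigma;
        case Norm::Centered:    return x - s.mean;
        case Norm::MinusOneOne: return 2.0 * (x - s.min) / (s.max - s.min) - 1.0;
        case Norm::ZeroOne:     return (x - s.min) / (s.max - s.min);
    }
    return x; // not reached for valid enum values
}
```

For the values 1 to 9, computeStats gives a mean of 5, and normalizeValue(3.0, stats, Norm::ZeroOne) evaluates to (3 - 1) / (9 - 1) = 0.25.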

The following piece of code shows how to use this function on a very simple dataserver, focusing on a vector whose values go from 1 to 9 over three events.

    TDataServer *tdsop = new TDataServer("foo", "pouet");
    tdsop->fileDataRead("tdstest.dat");

    //Compute a global normalisation of v, CenterReduced
    tdsop->normalize("v","GCR",TDataServer::kCR,true);
    //Compute a normalisation of v, CenterReduced (not global but entry by entry)
    tdsop->normalize("v","CR",TDataServer::kCR,false);

    //Compute a global normalisation of v, Centered
    tdsop->normalize("v","GCent",TDataServer::kCentered);
    //Compute a normalisation of v, Centered  (not global but entry by entry)
    tdsop->normalize("v","Cent",TDataServer::kCentered,false);

    //Compute a global normalisation of v, ZeroOne
    tdsop->normalize("v","GZO",TDataServer::kZeroOne);
    //Compute a normalisation of v, ZeroOne (not global but entry by entry)
    tdsop->normalize("v","ZO",TDataServer::kZeroOne,false);

    //Compute a global normalisation of v, MinusOneOne
    tdsop->normalize("v","GMOO",TDataServer::kMinusOneOne,true);
    //Compute a normalisation of v, MinusOneOne (not global but entry by entry)
    tdsop->normalize("v","MOO",TDataServer::kMinusOneOne,false);

    tdsop->scan("v:vGCR:vCR:vGCent:vCent:vGZO:vZO:vGMOO:vMOO","","colsize=4 col=2:5::::::::");

The normalisation is performed using all methods, first with the global flag set to true (the suffix then always starts with “G” for global) and then with the more local, entry-by-entry approach. The result of the scan method is given below:

*************************************************************************************
*    Row   * Instance *  v *  vGCR *  vCR * vGCe * vCen * vGZO *  vZO * vGMO * vMOO *
*************************************************************************************
*        0 *        0 *  1 * -1.46 *   -1 *   -4 *   -3 *    0 *    0 *   -1 *   -1 *
*        0 *        1 *  2 * -1.09 *   -1 *   -3 *   -3 * 0.12 *    0 * -0.7 *   -1 *
*        0 *        2 *  3 * -0.73 *   -1 *   -2 *   -3 * 0.25 *    0 * -0.5 *   -1 *
*        1 *        0 *  4 * -0.36 *    0 *   -1 *    0 * 0.37 *  0.5 * -0.2 *    0 *
*        1 *        1 *  5 *     0 *    0 *    0 *    0 *  0.5 *  0.5 *    0 *    0 *
*        1 *        2 *  6 * 0.365 *    0 *    1 *    0 * 0.62 *  0.5 * 0.25 *    0 *
*        2 *        0 *  7 * 0.730 *    1 *    2 *    3 * 0.75 *    1 *  0.5 *    1 *
*        2 *        1 *  8 * 1.095 *    1 *    3 *    3 * 0.87 *    1 * 0.75 *    1 *
*        2 *        2 *  9 * 1.460 *    1 *    4 *    3 *    1 *    1 *    1 *    1 *
*************************************************************************************
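As a hedged cross-check of the first row (an observation drawn from the printed values, not a statement taken from the library), the global centered-reduced column is consistent with a standard deviation computed with an \(n-1\) denominator over all nine values:

\[ \mu_{v} = 5, \qquad \sigma_{v} = \sqrt{\dfrac{\sum_{k=1}^{9}(v_{k} - 5)^{2}}{8}} = \sqrt{\dfrac{60}{8}} \approx 2.739, \qquad \tilde{v} = \dfrac{1 - 5}{2.739} \approx -1.46 \]

whereas the entry-by-entry column only uses the first element of each event, \(\{1, 4, 7\}\):

\[ \mu = 4, \qquad \sigma = \sqrt{\dfrac{(1-4)^{2} + (4-4)^{2} + (7-4)^{2}}{2}} = \sqrt{\dfrac{18}{2}} = 3, \qquad \tilde{v} = \dfrac{1 - 4}{3} = -1 \]

Some header names and values appear shortened to the column width requested with colsize=4: for instance vGCent is printed as vGCe, and the global MinusOneOne value for \(v = 2\), which is \(2.0 \times \dfrac{2 - 1}{9 - 1} - 1.0 = -0.75\), is shown as -0.7.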

Summary: normalize

The method is normalize(const char* varexp="", const char* suffix="_CR", ENormalisation method=kCR, bool global=true) and is adapted to deal with constant-size vectors. It creates a new attribute for every attribute concerned by the call and can be called with 0 to 4 arguments:

  • const char* varexp="": the first argument is the list of attributes on which the normalisation is applied. If left empty (the default), all attributes are read and each one is transformed into a new attribute.

  • const char* suffix="_CR": the second argument describes the suffix that will be added to the attribute name to obtain the new normalised attribute name.

  • ENormalisation method=kCR: the third argument is an enumerator that selects the method used to perform the normalisation (the available values are listed above).

  • bool global=true: the fourth argument is only useful when the attribute is a vector. In this case there are two ways of normalising the entries, regardless of the chosen method: either normalise every element of the constant-size vector with respect to the same element in the other events, without considering the rest of the vector (meaning its other elements), or normalise the entries considering that all elements of the vector, over all events, belong to one global pool of numbers described by a single mean and standard deviation. The former case corresponds to global equal to false, while the latter corresponds to global equal to true (the default). This is possible thanks to the modification done on the method performing the statistical treatment.
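The difference between the two settings of the global flag can be sketched as follows for the centered-reduced method. This is a minimal standalone illustration, not the library's code: the names Events, MeanSigma, meanSigma and centreReduce are hypothetical, and the sample (n-1) standard deviation is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// events[i][j] is element j of the constant-size vector in event i.
using Events = std::vector<std::vector<double>>;

struct MeanSigma { double mean, sigma; };

// Mean and sample standard deviation (n-1 denominator, an assumption).
MeanSigma meanSigma(const std::vector<double>& x) {
    assert(x.size() > 1);
    const double n = static_cast<double>(x.size());
    double sum = 0.0;
    for (double v : x) sum += v;
    const double mean = sum / n;
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return {mean, std::sqrt(ss / (n - 1.0))};
}

// Centered-reduced normalisation of a constant-size vector attribute.
// global == true : one mean/sigma computed over every element of every event.
// global == false: one mean/sigma per element index, computed across events.
Events centreReduce(const Events& events, bool global) {
    assert(!events.empty());
    const std::size_t size = events.front().size();
    for (const auto& e : events) assert(e.size() == size);

    std::vector<MeanSigma> stats(size);
    if (global) {
        // Pool all elements of all events together.
        std::vector<double> pool;
        for (const auto& e : events) pool.insert(pool.end(), e.begin(), e.end());
        stats.assign(size, meanSigma(pool));
    } else {
        // Treat each element index as its own variable across events.
        for (std::size_t j = 0; j < size; ++j) {
            std::vector<double> column;
            for (const auto& e : events) column.push_back(e[j]);
            stats[j] = meanSigma(column);
        }
    }

    Events out = events;
    for (auto& e : out)
        for (std::size_t j = 0; j < size; ++j)
            e[j] = (e[j] - stats[j].mean) / stats[j].sigma;
    return out;
}
```

Called on the three events {1,2,3}, {4,5,6}, {7,8,9}, the global variant maps the first value to (1 - 5) / sqrt(60/8), roughly -1.46, while the element-by-element variant maps it to (1 - 4) / 3 = -1.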