2.4.2. Computing the ranking

The ranking of variables is used in many methods that focus on monotonicity rather than linearity (this is discussed throughout this documentation when dealing with regression, correlation matrices, …). In Uranie it works as follows: for every attribute considered (all attributes by default, if the function is called without argument), a new attribute is created whose name is the name of the considered attribute with the prefix “Rk_”. For a simple double-precision attribute, ranking consists in assigning to each entry an integer from 1 to the number of patterns \(N\), following an order relation (in Uranie, 1 corresponds to the smallest value and \(N\) to the largest).

This method has been modified to cope with constant-size vectors and to stabilise its behaviour from one compiler version to another. The first modification simply treats every element of a constant-size vector independently of the others, as if each element were a different attribute. The second is more technical: the sorting method now uses std::stable_sort, ensuring that all platforms (operating systems and compiler versions) behave identically. The main problem arose when two patterns had the same value for the attribute under study: the resulting ranking then depended on the compiler version. Ties are now handled consistently: if two or more patterns have the same value for a given attribute, the first one met in the array of attribute values gets rank \(i\), the second gets \(i+1\), and so on. Here is a small example of this computation:

"""
Example of rank usage for illustration purpose
"""
from URANIE import DataServer

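# Create the data server and load the geyser data
tdsGeyser = DataServer.TDataServer("geyser", "poet")
tdsGeyser.fileDataRead("geyser.dat")
# Rank x1: this creates the new attribute "Rk_x1"
tdsGeyser.computeRank("x1")
# Compute the statistics (minimum, maximum, ...) of the rank attribute
tdsGeyser.computeStatistic("Rk_x1")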

print("NPatterns="+str(tdsGeyser.getNPatterns())+";  min(Rk_x1)= " +
      str(tdsGeyser.getAttribute("Rk_x1").getMinimum())+";  max(Rk_x1)= " +
      str(tdsGeyser.getAttribute("Rk_x1").getMaximum()))

This macro should return

NPatterns=272;  min(Rk_x1)= 1.0;  max(Rk_x1)= 272.0
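
The tie-handling rule described above can be illustrated with a minimal pure-Python sketch. It is not Uranie's implementation: the helper name stable_rank is hypothetical, and Python's built-in sorted (which is guaranteed to be stable) stands in for std::stable_sort. It only shows how a stable sort gives tied values consecutive ranks in their order of appearance.

```python
def stable_rank(values):
    """Return 1-based ranks; tied values are ranked in order of appearance.

    Hypothetical helper for illustration only, not part of Uranie.
    """
    # sorted() is stable, so indices of equal values keep their input order
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


values = [3.2, 1.5, 3.2, 0.7, 1.5]
print(stable_rank(values))  # [4, 2, 5, 1, 3]
```

The two entries equal to 1.5 receive ranks 2 and 3, and the two entries equal to 3.2 receive ranks 4 and 5, each pair in the order in which the values appear in the input.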

Summary: computeRank

  • computeRank(const char* varexp="*", option* option)

    Creates a new attribute for every attribute requested (or for all attributes if no argument is provided).

    String-type attributes and vectors of non-constant size are disregarded, and a warning is shown to inform the user.