2.4.4.1. computeQuantile

For a given probability \(p\), the corresponding quantile \(q\) is estimated by interpolating between two consecutive order statistics:

\[q = (1-\gamma) x_k + \gamma x_{k+1}\]

where \(x_k\) is the \(k\)-th smallest value of the attribute's set of values (of size \(N\)) and \(\gamma \in [0,1]\) is an interpolation weight.

How \(k\) and \(\gamma\) are computed depends on a parameter of the functions, discussed later on.

The implementation and its principle have changed slightly so that vectors can be handled (the previous logic has nonetheless been kept for consistency and backward compatibility). Let us start with an example of the original approach, using the two main methods, which share the same name but differ by their signature.

    double aproba=0.5, aquant=0;
    tdsGeyser->computeQuantile("x2", aproba, aquant); // (1)

    double Proba[2]={0.05,0.95}; double Quant[2]={0,0}; // (2)
    tdsGeyser->computeQuantile("x2", 2, Proba, Quant);

    cout << "Quant[0]=" << Quant[0] << "; aquant=" << aquant << "; Quant[1]=" << Quant[1] << endl; // (3)

Description of the methods and results

(1) This method takes three mandatory arguments: the attribute name, the chosen probability and a double that the method overwrites with the estimated quantile.
(2) This method takes four mandatory arguments: the attribute name, the number \(N_{q}\) of quantiles to compute, an array of size \(N_{q}\) holding the chosen probabilities and another array of size \(N_{q}\) that the method fills with the estimated quantiles.
(3) This line prints the results of the three computations.
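To make the calling convention of the two overloads explicit, here is a standalone analogue that works on a plain std::vector<double> instead of a TDataServer attribute. It is a sketch under assumptions: computeQuantileSketch and type7 are hypothetical names, and the estimate uses the default type 7 (interpolation at position \(p(N-1)+1\)); the real methods look the attribute up by name and also accept the optional type argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type-7 estimate: 0-based position h = p*(N-1), linear interpolation.
double type7(std::vector<double> v, double p) {
    assert(!v.empty() && p >= 0.0 && p <= 1.0);
    std::sort(v.begin(), v.end());
    const double h = p * static_cast<double>(v.size() - 1);
    const std::size_t k = static_cast<std::size_t>(std::floor(h));
    if (k + 1 >= v.size()) return v.back();           // p == 1: largest value
    const double gamma = h - static_cast<double>(k);  // fractional part of the position
    return (1.0 - gamma) * v[k] + gamma * v[k + 1];
}

// Scalar form, like computeQuantile(attName, proba, quantile):
// the estimate is written into the out-parameter `quantile`.
void computeQuantileSketch(const std::vector<double>& values, double proba, double& quantile) {
    quantile = type7(values, proba);
}

// Array form, like computeQuantile(attName, nProba, proba, quantile):
// both arrays must hold at least nProba elements.
void computeQuantileSketch(const std::vector<double>& values, int nProba,
                           const double* proba, double* quantile) {
    for (int i = 0; i < nProba; ++i)
        computeQuantileSketch(values, proba[i], quantile[i]);
}
```

The array form is simply the scalar form applied element by element, which is why both output arguments must be sized by the caller beforehand.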

This implementation has been modified for two reasons: to adapt the method to vectors, and to store the results so that already computed values are not recomputed. Although the previous behaviour is still correct, the results are now also stored in the attribute itself, as a vector of maps: for every element of a vector, a map<double,double> is created whose key is the probability provided by the user and whose value is the corresponding quantile.

It is now highly recommended to retrieve the results through the TAttribute methods that give access to these maps, for two reasons: the results returned by the methods detailed previously are only correct for the last element of a vector, and the vector of maps is cleared as soon as the general selection is modified (as for the elementary statistical vectors discussed in Computing the elementary statistic). The next example uses the following input file, named aTDSWithVectors.dat:

    #NAME: cho
    #COLUMN_NAMES: x|rank
    #COLUMN_TYPES: D|V

    0 0,1
    1 2,3
    2 4,5
    3 6,7
    4 8,9
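In this file, rank is a vector column (type V): each row carries two components, so the attribute behaves like two samples, one per vector element, and the per-element quantiles discussed in this section are each computed on one of them. The hypothetical helper below (splitVectorColumn is an illustrative name; this is not how fileDataRead works internally) makes that split explicit for the lines above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Uranie's reader): splits the "x rank" data lines,
// where rank is a comma-separated vector, into one sample per vector element.
std::vector<std::vector<double>> splitVectorColumn(const std::vector<std::string>& lines) {
    std::vector<std::vector<double>> samples;          // samples[j]: j-th component of every row
    for (const std::string& line : lines) {
        if (line.empty() || line[0] == '#') continue;  // skip header lines and blanks
        std::istringstream fields(line);
        std::string x, rank;
        if (!(fields >> x >> rank)) continue;          // both columns are required
        std::vector<double> comps;
        std::istringstream items(rank);
        for (std::string item; std::getline(items, item, ',');)
            comps.push_back(std::stod(item));          // "2,3" -> {2, 3}
        if (samples.empty()) samples.resize(comps.size());
        assert(comps.size() == samples.size());        // every row has the same vector size
        for (std::size_t j = 0; j < comps.size(); ++j) samples[j].push_back(comps[j]);
    }
    return samples;
}
```

Applied to the whole file, it yields {0, 2, 4, 6, 8} for element 0 and {1, 3, 5, 7, 9} for element 1.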

From this file, the following code (which can be found in the macro “dataserverComputeQuantileVec.C”) shows the different methods added to the attribute class so that the user can retrieve the computed values:

    TDataServer *tdsvec = new TDataServer("foo", "bar");
    tdsvec->fileDataRead("aTDSWithVectors.dat");

    double probas[3]={0.2, 0.6, 0.8}; double quants[3];
    tdsvec->computeQuantile("rank", 3, probas, quants);

    TAttribute *prank = tdsvec->getAttribute("rank");
    int nbquant;
    prank->getQuantilesSize(nbquant); // (1)
    cout << "nbquant = " << nbquant << endl;

    double aproba=0.8; double aquant;
    prank->getQuantile(aproba, aquant); // (2)
    cout << "aproba = " << aproba << ", aquant = " <<
    aquant << endl;

    double theproba[3], thequant[3];
    prank->getQuantiles(theproba, thequant); // (3)
    for(int i_quant=0; i_quant<nbquant; ++i_quant) {
        cout << "(theproba, thequant)[" << i_quant << "] = "
        << "(" << theproba[i_quant] << ", " <<
        thequant[i_quant] << ")" << endl;
    }

    vector<double> allquant;
    prank->getQuantileVector(aproba, allquant); // (4)
    cout << "aproba = " << aproba << ", allquant = ";
    for(double quant_i: allquant)
        cout << quant_i << " ";
    cout << endl;

Description of the methods and results

(1) This method sets nbquant to the number of quantiles already computed and stored. An optional second argument states which element of the vector is concerned when the attribute under study is a vector (the default value is 0).
(2) This method sets aquant to the quantile value corresponding to the probability aproba. An optional additional argument states which element of the vector is concerned when the attribute under study is a vector (the default value is 0).
(3) This method fills theproba and thequant with all the stored probabilities and their corresponding quantile values. An optional additional argument states which element of the vector is concerned when the attribute under study is a vector (the default value is 0). Warning: both arrays must be large enough to hold every stored value; it is recommended to call getQuantilesSize beforehand.
(4) This method fills the provided vector allquant with the quantile value corresponding to the probability aproba for every element of the attribute under study.
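The storage rule described earlier (one probability-to-quantile map per vector element, filled on demand and emptied when the selection changes) and the four accessors above can be mimicked in plain C++ to make their semantics concrete. This is a hypothetical sketch: QuantileStoreSketch, type7 and onSelectionChanged are illustrative names, not the TAttribute API; the optional element argument is assumed to come last; quantiles are assumed to use the default type 7; and probabilities are matched as exact double keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical type-7 estimate: 0-based position h = p*(N-1), linear interpolation.
double type7(std::vector<double> v, double p) {
    assert(!v.empty() && p >= 0.0 && p <= 1.0);
    std::sort(v.begin(), v.end());
    const double h = p * static_cast<double>(v.size() - 1);
    const std::size_t k = static_cast<std::size_t>(std::floor(h));
    if (k + 1 >= v.size()) return v.back();
    const double gamma = h - static_cast<double>(k);
    return (1.0 - gamma) * v[k] + gamma * v[k + 1];
}

// Hypothetical mirror of the storage and accessors (NOT the real TAttribute):
// stored[j] maps probability -> quantile for vector element j.
struct QuantileStoreSketch {
    std::vector<std::vector<double>> samples;          // one sample per vector element
    std::vector<std::map<double, double>> stored;

    // Computes only the probabilities that are not stored yet.
    void compute(const std::vector<double>& probas) {
        stored.resize(samples.size());
        for (std::size_t j = 0; j < samples.size(); ++j)
            for (double p : probas)
                if (stored[j].count(p) == 0) stored[j][p] = type7(samples[j], p);
    }
    // Mimics the rule that modifying the selection empties every stored map.
    void onSelectionChanged() { for (auto& m : stored) m.clear(); }

    void getQuantilesSize(int& n, std::size_t elt = 0) const {
        n = static_cast<int>(stored.at(elt).size());
    }
    void getQuantile(double p, double& q, std::size_t elt = 0) const { q = stored.at(elt).at(p); }
    // Both arrays must hold at least getQuantilesSize() values (increasing p order).
    void getQuantiles(double* probas, double* quants, std::size_t elt = 0) const {
        std::size_t i = 0;
        for (const auto& pq : stored.at(elt)) { probas[i] = pq.first; quants[i] = pq.second; ++i; }
    }
    void getQuantileVector(double p, std::vector<double>& all) const {
        all.clear();
        for (const auto& m : stored) all.push_back(m.at(p));  // one value per element
    }
};
```

Fed with the two element samples of aTDSWithVectors.dat and the probabilities 0.2, 0.6 and 0.8, the sketch returns the same values as the output shown below (for instance 6.4 and 7.4 from getQuantileVector at 0.8), and asking again for an already stored probability does not add a new entry.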

The results of this example are shown below:

    nbquant = 3
    aproba = 0.8, aquant = 6.4
    (theproba, thequant)[0] = (0.2, 1.6)
    (theproba, thequant)[1] = (0.6, 4.8)
    (theproba, thequant)[2] = (0.8, 6.4)
    aproba = 0.8, allquant = 6.4 7.4
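These values can be checked by hand. Assuming the default type 7, where \(q = (1-\gamma)x_k + \gamma x_{k+1}\) and the interpolation weight \(\gamma\) is the fractional part of \(p(N-1)+1\): element 0 of rank holds the sorted values 0, 2, 4, 6, 8 and element 1 holds 1, 3, 5, 7, 9, so \(N=5\) and

\[p=0.8: \quad 0.8 \times (5-1) + 1 = 4.2 \;\Rightarrow\; k=4,\ \gamma=0.2\]
\[q_0 = 0.8 \times 6 + 0.2 \times 8 = 6.4, \qquad q_1 = 0.8 \times 7 + 0.2 \times 9 = 7.4\]
\[p=0.2: \quad 0.2 \times (5-1) + 1 = 1.8 \;\Rightarrow\; k=1,\ \gamma=0.8, \quad q_0 = 0.2 \times 0 + 0.8 \times 2 = 1.6\]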

Summary: computeQuantile

computeQuantile(const char* attName, Double_t proba, Double_t& quantile, Int_t type = 7);
computeQuantile(const char* attName, Int_t nProba, Double_t* proba, Double_t* quantile, Int_t type = 7);

The methods are discussed above. The last parameter, type (7 by default), determines how \(k\) is computed. Types 1 to 3 are discontinuous:
1. \(k=\lfloor p\times N \rfloor; \; {\rm if} \; p \times N = k,\; q = x_k. \; q=x_{k+1} \; {\rm otherwise.}\)
2. \(k=\lfloor p\times N \rfloor; \; {\rm if} \; p \times N = k,\; q = 1/2 \times (x_k+x_{k+1}). \; q=x_{k+1} \; {\rm otherwise.}\)
3. \(k=\lfloor p \times N - 0.5 \rfloor; \; {\rm if} \; p \times N -0.5 = k \; {\rm and} \; k \; {\rm is \; even},\; q = x_k. \; q=x_{k+1} \; {\rm otherwise.}\) Default in SAS.
Types 4 to 9 are piece-wise linear interpolations between \(x_k\) and \(x_{k+1}\), the interpolation weight being the fractional part of the expression inside the floor:
4. \(k=\lfloor p \times N\rfloor\)
5. \(k=\lfloor p \times N + 0.5\rfloor\)
6. \(k=\lfloor p \times (N + 1) \rfloor\), default in Minitab and SPSS.
7. \(k=\lfloor p \times (N - 1) + 1 \rfloor\), default in ROOT, S and R.
8. \(k=\lfloor p \times (N + 1/3) + 1/3 \rfloor\), approximately median unbiased.
9. \(k=\lfloor p \times (N + 1/4) + 3/8 \rfloor\), approximately unbiased if \(x\) is normally distributed.
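These rules follow the classification of Hyndman and Fan, which is also the type numbering used by R's quantile() and ROOT's TMath::Quantiles (the six piece-wise linear rules correspond to type values 4 to 9). As a sketch under that assumption (sampleQuantile is a hypothetical helper, not Uranie's or ROOT's source), every rule can be written with a position \(h = pN + m\), where the offset \(m\) depends on the type: \(k=\lfloor h \rfloor\), and the weight on \(x_{k+1}\) is either the fractional part \(h-k\) (types 4 to 9) or the 0, 1/2 or 1 switch of types 1 to 3.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the nine estimators (Hyndman-Fan style), not Uranie's code.
// h = N*p + m, k = floor(h), g = h - k; result (1-w) x_k + w x_{k+1},
// with 1-based order statistics clamped to [x_1, x_N].
double sampleQuantile(std::vector<double> x, double p, int type = 7) {
    assert(!x.empty() && p >= 0.0 && p <= 1.0 && type >= 1 && type <= 9);
    std::sort(x.begin(), x.end());
    const double n = static_cast<double>(x.size());
    double m = 0.0;                          // types 1, 2, 4: h = pN
    switch (type) {
        case 3: m = -0.5; break;             // h = pN - 1/2
        case 5: m = 0.5; break;              // h = pN + 1/2
        case 6: m = p; break;                // h = p(N+1)
        case 7: m = 1.0 - p; break;          // h = p(N-1) + 1
        case 8: m = (p + 1.0) / 3.0; break;  // h = p(N+1/3) + 1/3
        case 9: m = p / 4.0 + 3.0 / 8.0; break;  // h = p(N+1/4) + 3/8
        default: break;
    }
    const double h = n * p + m;
    const double kf = std::floor(h);
    const double g = h - kf;
    double w = g;                            // linear weight for types 4 to 9
    if (type == 1) w = (g > 0.0) ? 1.0 : 0.0;
    if (type == 2) w = (g > 0.0) ? 1.0 : 0.5;
    if (type == 3) w = (g == 0.0 && std::fmod(kf, 2.0) == 0.0) ? 0.0 : 1.0;
    // 1-based order statistic x_i, clamped so that p near 0 or 1 stays in range
    auto at = [&x](double i) {
        const double c = std::min(std::max(i, 1.0), static_cast<double>(x.size()));
        return x[static_cast<std::size_t>(c) - 1];
    };
    return (1.0 - w) * at(kf) + w * at(kf + 1.0);
}
```

With the default type 7 and the values 0, 2, 4, 6, 8 (element 0 of rank in the previous example), it returns 1.6, 4.8 and 6.4 for \(p\) = 0.2, 0.6 and 0.8, matching the stored quantiles printed above.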