# Macro "**reoptimizeZoneBiSubMpi.{{extension}}**" ## Objective The objective of the macro is to show an example of a two level parallelism program using the Mpi paradigm. - At the top level, an optimization loop parallelizes its evaluations - At low level, each optimizer evaluation are a launcher loop who parallelizes its own sequential evaluations These example is inspired from a zoning problem of a small plant core with square assemblies. However, the physics embeded in it is reduced to none (sorry), and the problem is simplified. With symetries, the core is defined by 10 different assemblies presented on the following figure. For production purpose, only 5 assembly types are allowed, defined by an emission value. {{ "```{figure} " + parent_dir + "/usermanual/use_cases/figures/uranie_zoning.png\n" + ":align: center\n" + ":name: use_cases_reopzoning_figure\n" + figure_scale + "\n" + "\n" + "The core and its assemblies\n" + "```" }} To simplify the problem, some constraints are put : - most assemblies belong to a default zone - other zone is restricted to one assembly (or two for 4 and 5, and for 8 and 9 for symetrical reason) - one zone is imposed with the 8th et 9th external assemblies. - the total of assembly emission is defined. For each assembly, a reception value is defined depending on the emission from itself and its neighbour's (just 8 neightbours are taken in account, the 4 nearest neighbours and 4 secondary neighbours). The global objective is to minimize the difference between the biggest and the smallest reception value. Optimisation works on 4 emission values (the fifth value, affected to the external zone, is set, and all values are normalized with the total emission value) and each evaluation loops over the 35 possible arrangements (choose 3 zones from 7). A single evaluation take emission values and the selected zones and return the maximum reception difference. 
## Macro {{uranie}}

This macro is split into 2 files. The first one defines the low-level evaluation function and is reused in the next reoptimizer example. It is quite a mock function, and is given for completeness, but it is not needed to understand how to implement the two-level MPI parallelism.

{{ "```{literalinclude} " + parent_dir + "/roottest/uranie/doc/reoptimizer/use_cases/" + language + "/reoptimizeZoneCore." + extension + "\n" + ":language: " + language + "\n" + "```" }}

The `lowfun` function deals, as expected, with the low-level evaluation. As inputs it takes the 4 emission values (default, zone1, zone2, zone3) and 3 indicators defining the zones affected by the extra emission values. It returns the maximal difference between two zone reception values, along with the 9 normalized emission values (informative data). Two arrays are used to define the neighbourhood.

The two-level MPI parallelism is defined in the second file.

{{ "```{literalinclude} " + parent_dir + "/roottest/uranie/doc/reoptimizer/use_cases/" + language + "/reoptimizeZoneBiSubMpi." + extension + "\n" + ":language: " + language + "\n" + "```" }}

This script is structured with 3 functions:

- the function `tds_resume` is used by the intermediate function. It receives the filled `TDataServer`, loops over its items and returns a synthetic value: in our case, the minimum value of the reception difference and the 9 normalized emission values;
- the function `doefun` is the intermediate evaluation function. It runs the design of experiments containing all 35 possible arrangements and extracts the best one. It receives the 4 emission values and uses them to complete the {{tds}} with the `addConstantValue` method;
- the function `reoptimizeZoneBiSubMpi` is the top-level function which solves the zoning problem.

`TBiMpiRun` and `TSubMpiRun` are used to allocate CPUs between the intermediate and low levels.
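The call chain of the three functions can be sketched in plain Python, without MPI or {{uranie}}. In this sketch, `lowfun_mock` and `doefun_mock` are hypothetical stand-ins (they are not the actual macro code and return a mock value instead of reception differences); only the control flow, i.e. the intermediate level scanning all 35 arrangements and keeping the best result, mirrors the script:

```python
from itertools import combinations

def lowfun_mock(emissions, zones):
    # Stand-in for the real low-level evaluation: the actual lowfun would
    # compute reception values from the emissions and the neighbourhood
    # arrays. Here a deterministic mock value is returned instead.
    return sum(emissions) + 0.01 * sum(zones)

def doefun_mock(e_default, e1, e2, e3):
    # Intermediate level: evaluate every arrangement of 3 zones among 7
    # and keep the smallest result, as tds_resume does after the design
    # of experiments has filled the TDataServer.
    results = [lowfun_mock((e_default, e1, e2, e3), zones)
               for zones in combinations(range(7), 3)]
    return min(results)
```

In the real macro the inner loop is not a Python comprehension: the 35 evaluations are dispatched by the launcher over the MPI resources of the group, and `tds_resume` extracts the minimum from the resulting {{tds}}.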
`TBiMpiRun` is used in `reoptimizeZoneBiSubMpi` (top level) with an integer argument specifying the number of CPUs dedicated to each intermediate-level group. In our case (3), the 16 resources requested from MPI are divided into 5 groups of 3 CPUs, and one CPU is left for the top-level master (take care that the number of CPUs requested is consistent with the group size: 16 % 3 == 1). The top-level master therefore sees 5 resources for its evaluations. `TSubMpiRun` is used in the `doefun` function and gives access to the 3 resources reserved for the group in the top-level function.

Running the script is done as usual with MPI:

````{only} cpp
```bash
mpirun -n 16 root -l -b -q reoptimizeZoneBiSubMpi.C
```
````

````{only} py
```bash
mpirun -n 16 python reoptimizeZoneBiSubMpi.py
```
````

At the beginning of the `reoptimizeZoneBiSubMpi` function there is a call to `ROOT::EnableThreadSafety`. It is not needed in this case, but it would be if we parallelized with threads instead of MPI. If you want to use both threads and MPI, it is recommended to use MPI at the top level.
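The resource arithmetic above can be checked before launching a run. The helper below is a hypothetical convenience (not part of the {{uranie}} API) that encodes the constraint stated above: one CPU is kept for the top-level master, so the requested CPU count modulo the group size must equal 1:

```python
def split_resources(n_cpus, group_size):
    # One CPU is reserved for the top-level master; the remaining CPUs
    # must divide evenly into intermediate groups of group_size.
    if n_cpus % group_size != 1:
        raise ValueError("n_cpus % group_size must be 1 (one master CPU)")
    return (n_cpus - 1) // group_size

print(split_resources(16, 3))  # 5 groups of 3 CPUs, plus the master
```

For example, requesting 16 CPUs with groups of 4 would fail this check (16 % 4 == 0), leaving no CPU for the master; 17 CPUs would work (4 groups of 4, plus the master).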