Running GGCE in parallel¶
Available parallel schemes¶
When using a GGCE solver to obtain the spectral function through a solution of a system of equations, there are two ways in which parallelization might be helpful.
1. When computing the spectral function at a variety of \(k, \omega\) points, parallelizing across those points gives a linear speed-up – because the calculations at different points are fully independent of each other. This is the so-called embarrassingly parallel situation, and we call this parallelization scheme the across-points scheme.
2. In a large variational space (i.e. with large values for phonon_number and phonon_extent), the matrices to be solved at each \(k,\omega\) point can become large even for sparse solvers. The advantage of sparse solvers is that it is possible to parallelize the solution of a sparsely-represented matrix across MPI ranks, speeding up the process (albeit in general sub-linearly). We call this the matrix-level scheme.
If you have configured mpi4py, you can directly run the standard GGCE Solver classes, SparseSolver and DenseSolver, in parallel according to the first scheme – across different \(k,\omega\) points.
If you have configured PETSc, you can have both across-points and matrix-level parallelization simultaneously, by employing groups or brigades of MPI ranks together. We refer to this combination as the double-parallel scheme; it is described in more detail in GGCE-PETSc interface: advanced features.
Parallelization primer¶
Two things are needed to enable parallelization within Solver operation.
First, you need to run the script with an MPI executable. For example, to run a Python GGCE script with MPI using four processes:
mpirun -np 4 python script.py
Alternatively, mpiexec can be used, depending on your cluster’s configuration. (Locally, these are equivalent.)
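For instance, the command above would typically read as follows with mpiexec:

mpiexec -n 4 python script.py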
Second, we need to provide the Solver with an MPI communicator. This object is wrapped by mpi4py and can be accessed in the following way:
from mpi4py import MPI
COMM = MPI.COMM_WORLD
Note
It is not enough to install mpi4py to have access to MPI objects and be able to run MPI-parallel calculations. By itself, mpi4py is merely a wrapper around the libraries and executables provided by a particular implementation of the MPI standard, such as openmpi or mpich (https://www.mpich.org/). Even if a pip installation of mpi4py does not throw an error, a good test to make sure that mpi4py has been installed and linked properly against an MPI implementation is to run the following two commands in the terminal:
python -c "import mpi4py"
python -c "from mpi4py import MPI"
If both these commands execute without errors, you are likely ready to execute MPI calculations.
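A further quick check, assuming mpirun is available on your path, is to launch a tiny MPI job directly from the terminal and confirm that every rank reports in:

mpirun -np 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"

This should print the ranks 0 and 1 (in some order).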
The object COMM is an MPI global communicator wrapper. It provides a variety of methods for interacting with and controlling different MPI processes. For example, you can get the WORLD_SIZE – i.e. the total number of processes running the calculation:
print(COMM.Get_size())
In our calculation above with four processes, this would return, predictably:
4
4
4
4
Each process executes the print command separately.
We can also print the rank (sequential label) of each MPI process running the script.
print(COMM.Get_rank())
What will this print? WORLD_SIZE has the same value on all MPI processes, so above every process printed the same number. The rank, however, is different for each process (for four processes, the ranks run from 0 to 3), so different numbers will be printed:
3
1
0
2
The order will not necessarily be sequential: all processes rush to write to the output at once and, depending on the scheduling on a given CPU, will get there at different times. This might even change from execution to execution.
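If ordered output is ever needed for demonstration purposes, one common (though not strictly guaranteed) pattern is to loop over the ranks and synchronize with a barrier after each turn; output buffering by the MPI launcher can still reorder lines, so treat this only as an illustrative sketch, assuming COMM is defined as above:

for r in range(COMM.Get_size()):
    if COMM.Get_rank() == r:
        print(COMM.Get_rank())  # only rank r prints on this iteration
    COMM.Barrier()              # everyone waits before the next rank prints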
Getting the rank of a given process can be useful if the same script contains both sequential and parallel parts. The easiest way to execute part of the code sequentially (for example, to print the results at the end) is to introduce an if block:
# ... set up Models, Systems and Solvers, and run the parallel calculation ...

if COMM.Get_rank() == 0:
    ...  # do the sequential work here, e.g. print or save the results
The == 0 part is a convention – usually sequential portions of the code are reserved for the so-called “head rank” – but it could of course be any of the processes.
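As a minimal self-contained sketch of this pattern (a toy example, not a GGCE calculation), each rank computes its own value and only the head rank prints the collected result:

from mpi4py import MPI

COMM = MPI.COMM_WORLD

# Every rank computes its own piece of work in parallel
local_value = COMM.Get_rank() ** 2

# Collect all pieces on the head rank; the other ranks receive None
all_values = COMM.gather(local_value, root=0)

if COMM.Get_rank() == 0:
    print(all_values)  # with four processes: [0, 1, 4, 9]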
With this, we are ready for a parallel GGCE script.
Across-points scheme¶
As mentioned above, we obtain the communicator and pass it to the Solver during instantiation:
from ggce import Model, System, DenseSolver
from mpi4py import MPI

COMM = MPI.COMM_WORLD

mymodel = Model.from_parameters(...)
mysystem = System(mymodel)
mysolver = DenseSolver(system=mysystem, mpi_comm=COMM)
And that’s it! When we run .greens_function() on some momentum and frequency arrays, the Solver class instance will automatically parallelize the calculation across the available ranks. In particular, if we do
results = mysolver.greens_function(kgrid, wgrid, eta=0.005, pbar=True)
we will see linear speed-up, with the ranks splitting up the work. The progress bar will note this automatically and be proportionally shorter.
One important idiosyncrasy of the .greens_function() method is that only the head rank (rank = 0) returns the result – the others return a pythonic None. Subsequent processing of the results – such as taking the imaginary part to get the spectral function – must be restricted to an if COMM.Get_rank() == 0 block for this reason.
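Putting the pieces together, here is a sketch of a complete across-points script. The model parameters remain elided as above, the momentum and frequency grids are arbitrary placeholders, and the conversion to the spectral function, \(A(k, \omega) = -\mathrm{Im}\, G(k, \omega)/\pi\), is done on the head rank only, since the other ranks receive None.

import numpy as np
from mpi4py import MPI
from ggce import Model, System, DenseSolver

COMM = MPI.COMM_WORLD

mymodel = Model.from_parameters(...)  # model parameters elided, as above
mysystem = System(mymodel)
mysolver = DenseSolver(system=mysystem, mpi_comm=COMM)

kgrid = np.linspace(0.0, np.pi, 50)    # placeholder momentum grid
wgrid = np.linspace(-3.0, 1.0, 500)    # placeholder frequency grid

# All ranks participate; the k, w points are split across them
results = mysolver.greens_function(kgrid, wgrid, eta=0.005, pbar=True)

# Only the head rank holds the Green's function; the others get None
if COMM.Get_rank() == 0:
    A = -np.array(results).imag / np.pi  # spectral function A(k, w)
    np.save("spectral_function.npy", A)

Run with, e.g., mpirun -np 4 python script.py; only the head rank writes the output file.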
Matrix-level scheme¶
As mentioned at the top of this tutorial, this scheme is not available without PETSc. The SciPy sparse solver does have some rudimentary multithreading, controlled by the OMP_NUM_THREADS environment variable (see Multithreading SciPy solvers for more details).
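For example, assuming the underlying threaded libraries pick up the standard OpenMP environment variable, the thread count can be set when launching the script:

OMP_NUM_THREADS=4 python script.py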
See the next tutorial titled GGCE-PETSc interface: introduction about using the matrix-level scheme with PETSc. The advanced double-parallel scheme will be described in GGCE-PETSc interface: advanced features.