Running GGCE in parallel

Available parallel schemes

When using a GGCE solver to obtain the spectral function through a solution of a system of equations, there are two ways in which parallelization might be helpful.

1. When computing the spectral function at a variety of \(k, \omega\) points, parallelizing across those points gives a linear speed-up, because the calculations at different points are fully independent of each other. This is the so-called embarrassingly parallel situation, and we call this parallelization scheme the across-points scheme.

2. In a large variational space (i.e. with large values of phonon_number and phonon_extent), the matrices to be solved at each \(k,\omega\) point can become very large, even in a sparse representation. The advantage of sparse solvers is that the solution of a sparsely-represented matrix can be parallelized across MPI ranks, which speeds up the process (albeit, in general, sub-linearly). We call this the matrix-level scheme.

If you have configured mpi4py, you can directly run standard GGCE Solver classes, SparseSolver and DenseSolver, in parallel according to the first scheme – across different \(k,\omega\) points.

If you have configured PETSc, you can use both across-points and matrix-level parallelization simultaneously by splitting the MPI ranks into groups, or brigades. We refer to this combination as the double-parallel scheme; it is described in more detail in GGCE-PETSc interface: advanced features.

Parallelization primer

Two things are needed to enable parallelization within Solver operation.

First, you need to run the script with an MPI executable. For example, to run a Python GGCE script with MPI, using four processes

mpirun -np 4 python script.py

Alternatively, mpiexec can be used depending on your cluster’s configuration. (Locally, these are equivalent.)
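For reference, the equivalent mpiexec invocation would typically be

mpiexec -n 4 python script.py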

Second, we need to provide the Solver with an MPI communicator. This object is wrapped in mpi4py and can be accessed in the following way

from mpi4py import MPI

COMM = MPI.COMM_WORLD

Note

It is not enough to install mpi4py to have access to MPI objects and be able to run MPI-parallel calculations. By itself, mpi4py is merely a wrapper around the actual MPI library and executables provided by a particular implementation of the MPI standard, such as Open MPI or MPICH (https://www.mpich.org/). Even if a pip installation of mpi4py does not throw an error, a good test to make sure that mpi4py has been installed and linked properly against an MPI implementation is to run the following two commands in the terminal

python -c "import mpi4py"

python -c "from mpi4py import MPI"

If both commands execute without errors, you are likely ready to run MPI calculations.
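As an additional check, you can ask mpi4py which MPI library it is linked against. This relies on mpi4py's MPI.Get_library_version() wrapper, which is available when the underlying library implements MPI-3 or later

python -c "from mpi4py import MPI; print(MPI.Get_library_version())"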

The object COMM is a wrapper around the MPI global communicator. It provides a variety of methods for interacting with and controlling the different MPI processes. For example, you can get the world size, i.e. the total number of processes running the calculation

print(COMM.Get_size())

In our calculation above with four processes, this would return, predictably

4
4
4
4

Each process executes the print command separately.

We can also print the rank of each MPI process running the script: an integer label that runs from 0 to the world size minus one.

print(COMM.Get_rank())

What will this print? Unlike the world size, which has the same value on every MPI process, the rank is different on each process, so four different numbers will be printed.

3
1
0
2

The order will not necessarily be sequential: all processes write to the output at essentially the same time, and depending on how each one is scheduled, they reach it at slightly different moments. The order may even change from one execution to the next.

Getting the rank of a given process is useful when the same script contains both parallel parts and parts that should run on only one process. The easiest way to execute part of the code sequentially (for example, printing the results at the end) is to introduce an if block

# (setting up Models, Systems, Solvers, running parallel calculations)

if COMM.Get_rank() == 0:
    ...  # (do sequential stuff here)

The == 0 part is a convention: sequential portions of the code are usually reserved for the so-called “head rank”, but any of the processes could be used.
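Combining the rank check with the communicator's collective methods also makes it possible to produce deterministic output. The snippet below is a minimal sketch using only standard mpi4py calls (COMM.gather), not a GGCE feature: it collects the rank of every process onto the head rank, which then prints them in order

from mpi4py import MPI

COMM = MPI.COMM_WORLD

# collect every process's rank onto the head rank (rank 0)
all_ranks = COMM.gather(COMM.Get_rank(), root=0)

if COMM.Get_rank() == 0:
    # only the head rank receives the gathered list; the others get None
    print(all_ranks)  # with four processes: [0, 1, 2, 3]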

With this, we are ready for a parallel GGCE script.

Across-points scheme

As mentioned above, we import the communicator and pass it to the Solver during instantiation.

from ggce import Model, System, DenseSolver
from mpi4py import MPI

COMM = MPI.COMM_WORLD

mymodel = Model.from_parameters(...)
mysystem = System(mymodel)
mysolver = DenseSolver(system=mysystem, mpi_comm=COMM)

And that’s it! When we run .greens_function() on some momentum and frequency arrays, the Solver class instance will automatically parallelize the calculation across available ranks. In particular, if we do

results = mysolver.greens_function(kgrid, wgrid, eta=0.005, pbar=True)

we will see a linear speed-up, with the ranks splitting up the work. The progress bar accounts for this automatically and is proportionally shorter.

One important idiosyncrasy of the .greens_function() method is that only the head rank (rank 0) returns the result; the others return None. For this reason, subsequent processing of results, such as taking the imaginary part to obtain the spectral function, must be restricted to an if COMM.Get_rank() == 0 block.
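For example, a guarded post-processing step might look like the sketch below. It assumes that results is a NumPy array of complex Green's function values returned on the head rank, and uses the standard definition of the spectral function, \(A(k, \omega) = -\mathrm{Im}\, G(k, \omega) / \pi\)

import numpy as np

if COMM.Get_rank() == 0:
    # results is populated only on the head rank; all other ranks hold None
    A = -np.imag(results) / np.pi  # spectral function A(k, w)
    np.save("spectral_function.npy", A)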

Matrix-level scheme

As mentioned at the top of this tutorial, this scheme is not available without PETSc. The SciPy sparse solver does have some rudimentary multithreading, controlled by the OMP_NUM_THREADS environment variable (see Multithreading SciPy solvers for more details).
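For example, to control that multithreading one would typically set the environment variable before launching the script (the value 8 here is just an illustration)

export OMP_NUM_THREADS=8
python script.py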

See the next tutorial titled GGCE-PETSc interface: introduction about using the matrix-level scheme with PETSc. The advanced double-parallel scheme will be described in GGCE-PETSc interface: advanced features.