.. _parallel:

========================
Running GGCE in parallel
========================

Available parallel schemes
--------------------------

When using a GGCE solver to obtain the spectral function through a solution of a system of equations, there are two ways in which parallelization might be helpful.

1. When computing the spectral function at a variety of :math:`k, \omega` points, parallelizing across those points gives a linear speed-up, because the calculations at different points are fully independent of each other. This is the so-called *embarrassingly parallel* situation, and we call this parallelization scheme the **across-points scheme**.

2. In a large variational space (i.e. with large values for ``phonon_number`` and ``phonon_extent``), the matrices to be solved at each :math:`k, \omega` point can become large even for sparse solvers. The advantage of sparse solvers is that the solution of a sparsely-represented matrix can be parallelized across MPI ranks, speeding up the process (albeit, in general, sub-linearly). We call this the **matrix-level scheme**.

If you have configured ``mpi4py``, you can directly run the standard GGCE solver classes, ``SparseSolver`` and ``DenseSolver``, in parallel according to the first scheme -- across different :math:`k, \omega` points. If you have configured ``PETSc``, you can have both across-points and matrix-level parallelization simultaneously, by employing groups or *brigades* of MPI ranks together. We refer to this combination as the *double-parallel* scheme; it is described in more detail in :ref:`petsc-advanced`.

Parallelization primer
----------------------

Two things are needed to enable parallelization within ``Solver`` operation.

First, you need to run the script with an MPI executable. For example, to run a Python GGCE script with MPI using four processes

.. code-block:: bash

    mpirun -np 4 python script.py

Alternatively, ``mpiexec`` can be used, depending on your cluster's configuration. (Locally, these are equivalent.)

Second, we need to provide the ``Solver`` with an MPI communicator. This object is wrapped by ``mpi4py`` and can be accessed in the following way

.. code-block:: python

    from mpi4py import MPI

    COMM = MPI.COMM_WORLD

.. note::

    It is not enough to install ``mpi4py`` to have access to MPI objects and be able to run MPI-parallel calculations. By itself, ``mpi4py`` is merely a wrapper for the actual executables provided by a particular implementation of the MPI standard, such as `Open MPI <https://www.open-mpi.org/>`__ or `MPICH <https://www.mpich.org/>`__. Even if a ``pip`` installation of ``mpi4py`` does not throw an error, a good test to make sure that ``mpi4py`` has been installed and linked properly against an MPI implementation is to run the following two commands in the terminal

    .. code-block:: bash

        python -c "import mpi4py"
        python -c "from mpi4py import MPI"

    If both commands execute without errors, you are likely ready to run MPI calculations.

The object ``COMM`` is a wrapper for the MPI global communicator. It provides a variety of methods for interacting with and controlling the different MPI processes. For example, you can get the ``WORLD_SIZE`` -- i.e. the total number of processes running the calculation

.. code-block:: python

    print(COMM.Get_size())

In our calculation above with four processes, this would return, predictably,

.. code-block:: bash

    4
    4
    4
    4

Each process executes the print command separately. We can also print the rank (a sequential label) of each MPI process running the script.

.. code-block:: python

    print(COMM.Get_rank())

What will this print? ``WORLD_SIZE`` has the same value on all MPI processes, which is why the same number was printed four times above. But each process has a different rank, so now different numbers will be printed, for example

.. code-block:: bash

    3
    1
    0
    2

The order will not necessarily be sequential: all processes rush to write to the output at once and, depending on the situation on a given CPU, will get there at different times. The order might even change from execution to execution.

Getting the rank of a given process is useful when the same script contains both sequential and parallel parts. The easiest way to execute part of the code sequentially (for example, to print the results at the end) is to introduce an ``if`` block

.. code-block:: python

    # ... set up Models, Systems and Solvers, run the parallel calculation ...

    if COMM.Get_rank() == 0:
        # do sequential work here, e.g. process or save the results
        ...

The ``== 0`` part is a convention -- sequential portions of the code are usually reserved for the so-called "head rank" -- but it could of course be any of the processes. With this, we are ready for a parallel GGCE script.

Across-points scheme
--------------------

As mentioned above, we import the communicator and pass it to the solver during instantiation.

.. code-block:: python

    from ggce import Model, System, DenseSolver
    from mpi4py import MPI

    COMM = MPI.COMM_WORLD

    mymodel = Model.from_parameters(...)
    mysystem = System(mymodel)
    mysolver = DenseSolver(system=mysystem, mpi_comm=COMM)

And that's it! When we run ``.greens_function()`` on some momentum and frequency arrays, the ``Solver`` class instance will automatically parallelize the calculation across the available ranks. In particular, if we do

.. code-block:: python

    results = mysolver.greens_function(kgrid, wgrid, eta=0.005, pbar=True)

we will see a linear speed-up, with the ranks splitting up the work. The progress bar will note this automatically and be proportionally shorter.

One important idiosyncrasy of the ``.greens_function()`` method is that only the head rank (rank 0) returns the result -- the others return a Pythonic ``None``. Subsequent processing of the results -- such as taking the imaginary part to get the spectral function -- must be restricted to an ``if COMM.Get_rank() == 0`` block for this reason.
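To make this concrete, here is a minimal sketch of such a post-processing block, continuing from the snippet above. It assumes, for illustration only, that on the head rank ``results`` can be viewed as a complex array over the ``(kgrid, wgrid)`` points; consult the solver documentation for the exact return format.

.. code-block:: python

    import numpy as np

    if COMM.Get_rank() == 0:
        # Only the head rank holds the Green's function; all other ranks received ``None``.
        G = np.array(results)

        # Spectral function A(k, w) = -Im G(k, w) / pi
        A = -G.imag / np.pi

        np.save("spectral_function.npy", A)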
Matrix-level scheme
-------------------

As mentioned at the top of this tutorial, this scheme is not available without PETSc. The SciPy sparse solver does have some rudimentary multithreading, controlled by the ``OMP_NUM_THREADS`` environment variable (see :ref:`Multithreading SciPy solvers ` for more details).

See the next tutorial, :ref:`petsc-intro`, for using the matrix-level scheme with PETSc. The advanced double-parallel scheme is described in :ref:`petsc-advanced`.
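Finally, note that ``OMP_NUM_THREADS`` is set in the shell environment before launching the script, not inside it. A minimal illustration, combining it with the across-points ``mpirun`` launch shown earlier (the value ``2`` is an arbitrary choice here):

.. code-block:: bash

    export OMP_NUM_THREADS=2
    mpirun -np 4 python script.py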