Ticket #3076 (closed: fixed)

Opened 9 years ago

Last modified 5 years ago

Prototype the capability to run Mantid jobs under MPI

Reported by: Russell Taylor
Owned by: Russell Taylor
Priority: critical
Milestone: Release 2.0
Component: Mantid
Keywords:
Cc:
Blocked By: #2663
Blocking:
Tester: Ronald Fowler

Description (last modified by Ronald Fowler)

The MPI implementation will only build successfully on recent versions of Linux such as RHEL6 and Ubuntu. Building on RHEL5 should be possible with the correct boost libraries, but these are difficult to obtain. The code was not built on the local RHEL6 build server because of problems with the setup of the package repositories; instead, it was built successfully on an Ubuntu system.

The two example algorithms provided allow a workspace to be broadcast to all MPI group members and a distributed set of workspaces to be gathered into a single workspace. The tests that are implemented check the operation of the algorithms rather than the MPI performance directly, since this is difficult to do in the normal test framework. Instead, a pair of scripts is provided that run tests in Python using the two C++ algorithms. The first implements a focus operation on a set of data files, with each MPI processor dealing with one instrument bank; the results are gathered using GatherWorkspaces. The second is also based on the focus operation but uses both Broadcast and GatherWorkspaces operations, in a way that allows maximum parallelism of the normalisation operation.

The scripts depend on an additional Python library (boostmpi) that allows direct access to MPI calls from Python. This is very useful, but the package is no longer supported, and installation under Ubuntu is not as easy as it might be because of package version changes. If the set of MPI functionality required in Python is small, it might be better to implement it as C++ algorithms to avoid depending on another package.

Although I can build the MPI version of Mantid and install the additional Python library, I have not been able to run the test scripts because of unrelated problems with my Ubuntu installation. More complete testing will be possible when a fully configured RHEL6 server is available.
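For readers unfamiliar with the pattern, the division of banks between processes and the final gather step described above can be sketched in plain Python. This is only an illustration that simulates the ranks inside a single process: it does not use MPI, boostmpi, or any Mantid algorithm, and every name in it (`banks_for_rank`, `focus_bank`, `run_rank`, `gather`) is hypothetical. The round-robin assignment is an assumption that generalises "one bank per processor" to the case where there are more banks than processes.

```python
# Illustrative sketch only: MPI ranks are simulated in a single process.
# None of these names come from Mantid or boostmpi.

def banks_for_rank(rank, size, n_banks):
    """Round-robin split: rank r handles banks r, r + size, r + 2*size, ..."""
    return list(range(rank, n_banks, size))


def focus_bank(bank):
    """Hypothetical stand-in for focusing one bank; returns a tiny 'spectrum'."""
    return [float(bank)] * 3


def run_rank(rank, size, n_banks):
    """Work done independently by one simulated rank on its own banks."""
    return {bank: focus_bank(bank) for bank in banks_for_rank(rank, size, n_banks)}


def gather(per_rank_results):
    """Merge every rank's partial results into one bank-ordered list (the 'root' step)."""
    merged = {}
    for partial in per_rank_results:
        merged.update(partial)
    return [merged[bank] for bank in sorted(merged)]


if __name__ == "__main__":
    size, n_banks = 4, 6
    # In a real MPI job each rank would compute only its own entry of this list.
    partials = [run_rank(rank, size, n_banks) for rank in range(size)]
    combined = gather(partials)
    print(len(combined), "banks gathered")
```

In an actual MPI run each process would execute only its own `run_rank` call and the merge would be carried out by a collective operation such as the GatherWorkspaces algorithm mentioned above; the list comprehension here merely stands in for that.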

Change History

comment:1 Changed 9 years ago by Russell Taylor

  • Status changed from new to accepted

comment:2 Changed 9 years ago by Russell Taylor

(In [12014]) Add option to find, link against and initialise MPI in Framework. Uses boost mpi with OpenMPI underneath. Off by default of course. Re #3076.

comment:3 Changed 9 years ago by Russell Taylor

  • Blocked By 2663 added

(In #2663) The existence of the file can cause problems when using MPI, where multiple processes may be running on the same machine with the same owner and all try to write the file more or less simultaneously.

comment:4 Changed 9 years ago by Russell Taylor

(In [12146]) Add initial MPI algorithms and example script, which takes half the time of a 'linear, looping' job on my machine. Re #3076.

comment:5 Changed 9 years ago by Russell Taylor

(In [12147]) Forgot to commit these clean-ups. Re #3076.

comment:6 Changed 9 years ago by Russell Taylor

(In [12596]) Checkpoint NOMAD MPI example. Re #3076.

comment:7 Changed 9 years ago by Nick Draper

  • Milestone changed from Iteration 29 to Iteration 30

Accepted and assigned tickets moved at iteration 29 code freeze

comment:8 Changed 9 years ago by Russell Taylor

In [13304]:

Make it possible to create a 'mantid-mpi' rpm (Framework only). Re #3076.

comment:9 Changed 9 years ago by Russell Taylor

In [13656]:

Make sure script output is a Workspace2D. Re #3076.

comment:10 Changed 9 years ago by Nick Draper

  • Milestone changed from Iteration 30 to Iteration 31

Bulk move of tickets to iteration 31 at the iteration 30 code freeze

comment:11 Changed 9 years ago by Russell Taylor

In [15266]:

Fix MPI build. Re #3076.

comment:12 Changed 9 years ago by Russell Taylor

Fix MPI unit test. Re #3076.

Changeset: dff2067b7cfaac60ce11d7319e6718fb2e4a518a

comment:13 Changed 9 years ago by Russell Taylor

Fix MPI unit test. Re #3076.

Changeset: dff2067b7cfaac60ce11d7319e6718fb2e4a518a

comment:14 Changed 9 years ago by Russell Taylor

  • Summary changed from Add capability to run Mantid jobs under MPI to Prototype the capability to run Mantid jobs under MPI

comment:15 Changed 9 years ago by Russell Taylor

  • Status changed from accepted to verify
  • Resolution set to fixed

Calling this done as a prototype. Anyone who's minded to test it will certainly need to get in touch. Having said that, none of the code here finds its way into a 'regular' build of Mantid so it's pretty irrelevant with respect to the release.

comment:16 Changed 9 years ago by Ronald Fowler

  • Status changed from verify to verifying
  • Tester set to Ronald Fowler

comment:17 Changed 9 years ago by Ronald Fowler

  • Status changed from verifying to closed
  • Description modified

comment:18 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 3923
