Ticket #11126 (closed: fixed)

Opened 6 years ago

Last modified 5 years ago

Reorganize remote algorithms so that they can use different job managers

Reported by: Federico M Pouzols
Owned by: Federico M Pouzols
Priority: major
Milestone: Release 3.4
Component: Framework
Blocked By: #11064, #11122, #11123, #11124, #11392
Blocking: #9277, #11361, #11373, #11538
Tester: Martyn Gigg

Description (last modified by Federico M Pouzols)

The idea is that the remote algorithms should be able to use different web service APIs or underlying mechanisms (ssh, etc.) to control remote jobs on compute resources. Examples include LSF through the IBM Platform Application Center (PAC), and SLURM.

Using the design/diagram on slide 19 of https://github.com/mantidproject/documents/blob/master/Presentations/SOS18/Mantid%20HPC%20Challenges.pptx:

If we:

  • add IRemoteJobManager (#11123)
  • add RemoteJobManagerFactory (#11124)

We could move the code that is specific to the Mantid web service API (http://www.mantidproject.org/Remote_Job_Submission_API) into a class that implements IRemoteJobManager, called MantidWSAPIJobManager for example. This specific code includes the code that submits HTTP requests, which currently lives in RemoteJobManager, and the code that processes parameters, response codes and messages, which currently lives in the individual remote algorithms.

This way it would be possible to support other web services, such as those provided by the SLURM and LSF cluster schedulers/resource managers.
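The split proposed above (one interface, one implementation per backend, and a factory that selects an implementation by name) could look roughly like the C++ sketch below. It is a minimal, hypothetical illustration rather than Mantid's actual API: the method set is simplified from the remote algorithm names, `LSFJobManager` and the factory's `subscribe`/`create` methods are assumptions, and the stub implementations only track state instead of sending HTTP requests.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified interface. Method names loosely mirror the remote
// algorithms (Authenticate, StartRemoteTransaction, SubmitRemoteJob,
// AbortRemoteJob); the real IRemoteJobManager signatures may differ.
class IRemoteJobManager {
public:
  virtual ~IRemoteJobManager() = default;
  virtual std::string name() const = 0;
  virtual void authenticate(const std::string &user,
                            const std::string &password) = 0;
  virtual std::string startRemoteTransaction() = 0;
  virtual std::string submitRemoteJob(const std::string &transactionID,
                                      const std::string &script) = 0;
  virtual void abortRemoteJob(const std::string &jobID) = 0;
};

// Stand-in for the class that would hold the Mantid web service API code
// (HTTP requests, parameter handling, response codes). Here it only keeps state.
class MantidWSAPIJobManager : public IRemoteJobManager {
public:
  std::string name() const override { return "MantidWebServiceAPI"; }
  void authenticate(const std::string &user,
                    const std::string & /*password*/) override {
    // A real implementation would send an HTTP request to the web service.
    m_user = user;
  }
  std::string startRemoteTransaction() override {
    if (m_user.empty())
      throw std::runtime_error("not authenticated");
    return "txn-" + std::to_string(++m_transactions);
  }
  std::string submitRemoteJob(const std::string &transactionID,
                              const std::string & /*script*/) override {
    return transactionID + "-job";
  }
  void abortRemoteJob(const std::string & /*jobID*/) override {}

private:
  std::string m_user;
  int m_transactions = 0;
};

// Hypothetical second backend, e.g. LSF through the Platform Application Center.
class LSFJobManager : public IRemoteJobManager {
public:
  std::string name() const override { return "LSF"; }
  void authenticate(const std::string &, const std::string &) override {}
  std::string startRemoteTransaction() override { return "lsf-txn"; }
  std::string submitRemoteJob(const std::string &,
                              const std::string &) override {
    return "lsf-job";
  }
  void abortRemoteJob(const std::string &) override {}
};

// Maps a job manager type name to a function that creates an instance of it.
class RemoteJobManagerFactory {
public:
  using Creator = std::function<std::unique_ptr<IRemoteJobManager>()>;

  void subscribe(const std::string &type, Creator creator) {
    m_creators[type] = std::move(creator);
  }

  std::unique_ptr<IRemoteJobManager> create(const std::string &type) const {
    auto it = m_creators.find(type);
    if (it == m_creators.end())
      throw std::invalid_argument("unknown job manager type: " + type);
    return it->second();
  }

private:
  std::map<std::string, Creator> m_creators;
};
```

With this layout, supporting SLURM or another LSF front end would mean adding one more subclass and registering it under its own name; the code that uses the factory would not change.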

We still need to clarify a few points, but in principle this would require moving code that currently lives in the class RemoteJobManager (HTTP requests) and in the remote algorithms (all current implementations, including Authenticate, AbortRemoteJob, StartRemoteTransaction, SubmitRemoteJob, etc.).

Once this is done, the remote algorithms (Authenticate, SubmitRemoteJob, etc.) will rely only on methods from IRemoteJobManager. The remote algorithms will need testing at SNS, where they are currently used by several interfaces with the Fermi cluster. These are the scripts included in the Mantid distribution that currently use remote algorithms:

These scripts are imported/used in the following interfaces:

  • Diffraction -> Powder Diffraction Reduction
  • Direct -> DGS Reduction
  • SANS -> ORNL SANS
  • Reflectometry -> REFL reduction
  • Reflectometry -> REFM reduction

Once all this works, it will be possible to add a new job manager specific to tomography jobs (and/or other job types) on SCARF, named SCARF_LSFRemoteJobManager or similar.

Note: once this is done, make sure that the algorithms do not use FacilityInfo::getRemoteJobManager(), which will then be removed (#11373).
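One way for an algorithm to depend only on IRemoteJobManager, without calling FacilityInfo::getRemoteJobManager(), is to have the job manager handed to it. The C++ sketch below assumes this dependency-injection style; `SubmitRemoteJobSketch` and `FakeJobManager` are hypothetical names, and the one-method interface is a cut-down stand-in for the real one. A side benefit is that the algorithm logic can be unit tested with a fake manager, while the tests against real clusters such as Fermi are still needed for each concrete backend.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal hypothetical interface: only what this sketch needs.
class IRemoteJobManager {
public:
  virtual ~IRemoteJobManager() = default;
  virtual std::string submitRemoteJob(const std::string &transactionID,
                                      const std::string &script) = 0;
};

// Hypothetical algorithm-like class. It receives a job manager instead of
// looking one up through the facility definition.
class SubmitRemoteJobSketch {
public:
  explicit SubmitRemoteJobSketch(std::shared_ptr<IRemoteJobManager> manager)
      : m_manager(std::move(manager)) {
    if (!m_manager)
      throw std::invalid_argument("a job manager is required");
  }

  // Validates the inputs, then delegates the backend-specific work.
  std::string execute(const std::string &transactionID,
                      const std::string &script) {
    if (transactionID.empty())
      throw std::invalid_argument("empty transaction ID");
    return m_manager->submitRemoteJob(transactionID, script);
  }

private:
  std::shared_ptr<IRemoteJobManager> m_manager;
};

// Fake manager that records submissions, so no cluster is needed in tests.
class FakeJobManager : public IRemoteJobManager {
public:
  std::string submitRemoteJob(const std::string &transactionID,
                              const std::string &script) override {
    submitted.push_back(transactionID + ":" + script);
    return "job-" + std::to_string(submitted.size());
  }
  std::vector<std::string> submitted;
};
```

In production code the same algorithm would be given, for example, the Mantid web service or an LSF job manager, chosen for the compute resource in use.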

Change History

comment:1 Changed 6 years ago by Federico M Pouzols

  • Description modified

comment:2 Changed 6 years ago by Federico M Pouzols

  • Blocking 9277 added

(In #9277) Added a few blocking tickets (needed to make the RemoteJobManager generic to different web services and underlying job control mechanisms).

comment:3 Changed 6 years ago by Federico M Pouzols

  • Status changed from new to assigned

comment:4 Changed 6 years ago by Federico M Pouzols

  • Blocking 11361 added

comment:5 Changed 6 years ago by Federico M Pouzols

  • Blocking 11373 added

comment:6 Changed 6 years ago by Federico M Pouzols

  • Description modified

comment:7 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11392 added

comment:8 Changed 6 years ago by Federico M Pouzols

  • Blocking 11064 removed

comment:9 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11064 added

comment:10 Changed 6 years ago by Federico M Pouzols

  • Summary changed from Reorganize remote algorithms so that they can use different web service APIs to Reorganize remote algorithms so that they can use different job managers

comment:11 Changed 6 years ago by Federico Montesino Pouzols

  • Status changed from assigned to inprogress

new v2 of remote algorithms, re #11126

Changeset: b342af5253445e146626dcd66288f90c1197abdf

comment:12 Changed 6 years ago by Federico Montesino Pouzols

updated v1 tests and added v2 tests, re #11126

Changeset: 6b80a62057f2345c9be5e054b53d264a1404c0d1

comment:13 Changed 6 years ago by Federico Montesino Pouzols

use remote algorithms v1 in reduction gui(s), re #11126

Changeset: 1cdec780e7a634dac78b99d6bd4e09dd5e20cfac

comment:14 Changed 6 years ago by Federico Montesino Pouzols

add rst docs for the v2 remote algorithms, re #11126

Changeset: e237fe28d1d008a8e7f1a91a1b9dcd814e4586f6

comment:15 Changed 6 years ago by Federico Montesino Pouzols

  • Status changed from inprogress to verify
  • Resolution set to fixed

This is being verified as pull request #525.

comment:16 Changed 5 years ago by Federico Montesino Pouzols

Add note on v1 differences in v2 algorithms rst doc, re #11126

Changeset: 13cbaa18fda653290b06cba681d6c1ec9bc62f9e

comment:17 Changed 5 years ago by Federico M Pouzols

  • Blocking 11538 added

comment:18 Changed 5 years ago by Martyn Gigg

  • Status changed from verify to verifying
  • Tester set to Martyn Gigg

comment:19 Changed 5 years ago by Federico Montesino Pouzols

use v1 also for StartRemoteTransaction, re #11126

Changeset: 7c66bb2256c5f1aa94f0f5adad8cc62dab789480

comment:20 Changed 5 years ago by Martyn Gigg

This all seems okay to me now as the new versions shouldn't affect any existing code.

comment:21 Changed 5 years ago by Martyn Gigg

  • Status changed from verifying to closed

Merge pull request #525 from mantidproject/11126_remote_algorithms_use_different_job_managers

Modify remote algorithms to support different job managers (for Fermi, SCARF, etc.)

Full changeset: 72029d8edbd3fbf9aa6fcbe81892d81bb96fdaec

comment:22 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 11965
