Ticket #9277 (assigned)

Opened 7 years ago

Last modified 5 years ago

Distributed Algorithms - Sites other than ORNL

Reported by: Nick Draper
Owned by: Federico M Pouzols
Priority: critical
Milestone: Release 3.5
Component: Framework
Keywords: SSC,2014,All, performance
Cc:
Blocked By: #11122, #11123, #11124, #11126, #11392
Blocking:
Tester:

Description

Already implemented at ORNL. If other sites can copy the ORNL back end, this is very little work. If they have to implement their own back ends, about 3 months of effort is a reasonable estimate. This assumes batch submission, not interactive mode. Any algorithm or launch script modifications for specific reductions are additional work.

Change History

comment:1 Changed 6 years ago by Nick Draper

  • Priority changed from major to critical

Move all SSC tickets to critical by default

comment:2 Changed 6 years ago by Nick Draper

  • Status changed from new to assigned
  • Owner set to Federico M Pouzols
  • Milestone changed from Release 3.5 to Release 3.4

comment:3 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11122, 11123, 11124, 11126 added

Added a few blocking tickets (needed to make the RemoteJobManager generic across different web services and underlying job control mechanisms).

comment:4 Changed 6 years ago by Nick Draper

  • Keywords SSC,2014,All added; SSC removed

comment:5 Changed 6 years ago by Nick Draper

Batch modify all SSC tickets to critical priority (this will also show up as an update for all those already as critical)

comment:6 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11392 added

comment:7 Changed 6 years ago by Martyn Gigg

Time estimate covering blocking tickets: 2 weeks

comment:8 Changed 5 years ago by Federico M Pouzols

Notes about the status of this ticket:

  • As far as I understand, the infrastructure for this is already in place with the new v2 remote algorithms (implemented in #11126 and the other blocking tickets).
  • We also have specific reduction scripts/GUI for IMAT at ISIS.
  • Remote algorithms work with Platform LSF and the Mantid job submission web service. Any other facility (or Mantid user) whose clusters or supercomputers use either of these should be able to reuse the remote algorithms without much hassle. The one exception is Platform LSF outside STFC: such sites will most likely need their own implementation of the authenticate method of IRemoteJobManager.
  • Additional schedulers/workload managers would require the addition of new subclasses of IRemoteJobManager. Probably the next one to add would be SLURM, if there is demand for it.
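The notes above describe a plugin-style design: each scheduler or workload manager gets its own subclass of IRemoteJobManager, and authentication is the step most likely to need site-specific code. A minimal Python sketch of that pattern, assuming hypothetical names throughout (RemoteJobManagerBase, register, create_manager, SlurmJobManager, submit_command); these are not Mantid's actual classes or method signatures. The SLURM command is only built, never executed.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Type


class RemoteJobManagerBase(ABC):
    """Hypothetical analogue of IRemoteJobManager; not Mantid's real API."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.token: Optional[str] = None  # set by authenticate()

    @abstractmethod
    def authenticate(self, user: str, password: str) -> None:
        """Site-specific login; each scheduler/site may override this."""

    @abstractmethod
    def submit_command(self, script_path: str) -> List[str]:
        """Return the command a batch submission would run (hypothetical)."""


# Registry mapping a scheduler name to its job manager subclass.
_REGISTRY: Dict[str, Type[RemoteJobManagerBase]] = {}


def register(name: str) -> Callable[[Type[RemoteJobManagerBase]], Type[RemoteJobManagerBase]]:
    """Class decorator that registers a subclass under a scheduler name."""
    def decorator(cls: Type[RemoteJobManagerBase]) -> Type[RemoteJobManagerBase]:
        _REGISTRY[name] = cls
        return cls
    return decorator


def create_manager(scheduler: str, host: str) -> RemoteJobManagerBase:
    """Look up and instantiate the manager for a given scheduler name."""
    try:
        return _REGISTRY[scheduler](host)
    except KeyError:
        raise ValueError(f"no job manager registered for {scheduler!r}") from None


@register("SLURM")
class SlurmJobManager(RemoteJobManagerBase):
    """Hypothetical SLURM backend, the next one suggested in the notes above."""

    def authenticate(self, user: str, password: str) -> None:
        # Placeholder: a real site would perform its own login here.
        # The password is unused in this sketch.
        self.token = f"{user}@{self.host}"

    def submit_command(self, script_path: str) -> List[str]:
        if self.token is None:
            raise RuntimeError("authenticate() must be called first")
        # 'sbatch --parsable' makes SLURM print only the job id on submission.
        return ["sbatch", "--parsable", script_path]
```

Registering subclasses by scheduler name keeps callers unaware of which backend they talk to, so adding a new scheduler means adding one class rather than changing the remote algorithms themselves.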

If we are happy with how it stands now, we should probably go ahead with documenting and making this functionality more visible: #11468.

Last edited 5 years ago by Federico M Pouzols

comment:9 Changed 5 years ago by Nick Draper

  • Keywords SSC,2014,All, performance added; SSC,2014,All removed

comment:10 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 10120.
