Ticket #11064 (closed: fixed)

Opened 6 years ago

Last modified 5 years ago

Fit the SCARF job scheduler remote algorithm in the 'RemoteJobManager' design

Reported by: Federico M Pouzols
Owned by: Federico M Pouzols
Priority: major
Milestone: Release 3.4
Component: Framework
Keywords:
Cc: nick.draper@…
Blocked By: #10591, #11122, #11123, #11124
Blocking: #11126, #11361
Tester: Roman Tolchenov

Description (last modified by Federico M Pouzols)

Ticket #10591 develops an algorithm to control jobs on SCARF remotely. This is being done for IMAT tomography but can be used in many other contexts.

A design that I call 'RemoteJobManager' was started some time ago for exactly this type of algorithm. The 'AbortRemoteJob', 'Authenticate', 'DownloadRemoteFile', etc. algorithms were implemented as part of it and have been used for the Fermi cluster at ORNL.

This 'RemoteJobManager' design is explained/specified on:

The objective here is to do the SCARF remote job control following this design.

The implementation of the algorithms 'AbortRemoteJob' etc. currently seems to be somewhat specific to Fermi, so this ticket will most likely require another ticket to move the specifics into an IRemoteJobManager subclass, while ensuring that the use of Fermi at ORNL is not affected by our changes.

So this ticket will add a remote job manager that implements the IRemoteJobManager interface, something like SCARFLSFJobManager.

The unit test RemoteJobManagerFactoryTest (introduced in #11124) should also be updated to test creation of this SCARFLSFJobManager type of job manager.
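
As a rough illustration of the shape this could take, here is a minimal, self-contained sketch of an interface, a SCARF-specific subclass, and a factory that creates managers by type name. Everything in it is an assumption made for illustration: the method names and signatures, the idea that SCARFLSFJobManager derives from a more generic LSFJobManager, and the factory's subscribe()/create() methods are hypothetical stand-ins, not the actual Mantid API.

```cpp
// Hypothetical sketch only: class layout, method names and signatures are
// illustrative assumptions, not the real Mantid classes.
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for the IRemoteJobManager interface (reduced to three methods).
class IRemoteJobManager {
public:
  virtual ~IRemoteJobManager() = default;
  virtual std::string name() const = 0;
  virtual void authenticate(const std::string &user,
                            const std::string &password) = 0;
  virtual void abortRemoteJob(const std::string &jobID) = 0;
};

// Generic LSF behaviour; the bodies are placeholders for real requests.
class LSFJobManager : public IRemoteJobManager {
public:
  std::string name() const override { return "LSFJobManager"; }
  void authenticate(const std::string &, const std::string &) override {}
  void abortRemoteJob(const std::string &) override {}
};

// SCARF-specific parts would override or extend the generic LSF ones.
class SCARFLSFJobManager : public LSFJobManager {
public:
  std::string name() const override { return "SCARFLSFJobManager"; }
};

// Tiny factory mapping a type name to a creator function.
class RemoteJobManagerFactory {
public:
  using Creator = std::function<std::unique_ptr<IRemoteJobManager>()>;

  void subscribe(const std::string &type, Creator creator) {
    assert(creator && "creator must be callable");
    m_creators[type] = std::move(creator);
  }

  // Throws std::invalid_argument for a type that was never subscribed.
  std::unique_ptr<IRemoteJobManager> create(const std::string &type) const {
    const auto it = m_creators.find(type);
    if (it == m_creators.end())
      throw std::invalid_argument("Unknown job manager type: " + type);
    return it->second();
  }

private:
  std::map<std::string, Creator> m_creators;
};

// Builds a factory that knows about the SCARF job manager.
inline RemoteJobManagerFactory makeDefaultFactory() {
  RemoteJobManagerFactory factory;
  factory.subscribe("SCARFLSFJobManager", [] {
    return std::unique_ptr<IRemoteJobManager>(new SCARFLSFJobManager());
  });
  return factory;
}
```

A factory test in this style would call makeDefaultFactory().create("SCARFLSFJobManager"), check that the returned object reports the expected type, and check that an unknown type name is rejected.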

Change History

comment:1 Changed 6 years ago by Federico M Pouzols

  • Status changed from new to assigned
  • Blocked By 10591 added

comment:2 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11122 added

comment:3 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11123 added

comment:4 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11124 added

comment:5 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11126 added

comment:6 Changed 6 years ago by Federico M Pouzols

  • Blocking 11361 added

comment:7 Changed 6 years ago by Federico M Pouzols

  • Description modified

comment:8 Changed 6 years ago by Federico M Pouzols

Now that the IRemoteJobManager interface and the factory are in master, this ticket can get in before #11126, and #11126 would be the very last step, needing verification at SNS.

comment:9 Changed 6 years ago by Federico M Pouzols

  • Blocked By 11126 removed

comment:10 Changed 6 years ago by Federico M Pouzols

  • Blocking 11126 added

comment:11 Changed 6 years ago by Federico Montesino Pouzols

  • Status changed from assigned to inprogress

added new LSF and SCARFLSF job managers, re #11064

Changeset: 85a97c6b49a26fd8eb6a69804c93b34398ce3f14

comment:12 Changed 6 years ago by Federico Montesino Pouzols

add new RemoteJobManagers subdir, re #11064

Changeset: d49d7f17cd40c497eaea9471a53771c01c68b25e

comment:13 Changed 6 years ago by Federico Montesino Pouzols

use the dynamic factory create, more verbose msgs, re #11064

Changeset: 2e062376ab04f7ac5d4b99ba69708d3b01c33fff

comment:14 Changed 6 years ago by Federico Montesino Pouzols

extend tests, and move out bits specific to SCARFLSF, re #11064

Changeset: eb3c515afa417938daffb11f2ff87f231955c861

comment:15 Changed 6 years ago by Federico Montesino Pouzols

update clean-up of LSFJobManager, new minimal test, re #11064

Changeset: 76084245c5717fcb92f11c3dae9f5c4c3b250199

comment:16 Changed 6 years ago by Federico Montesino Pouzols

clean-up SCARFLSF, add ping() and logout(), add tests, re #11064

Changeset: e531821c414b7c412a3a81eed0ff1516417d35ad

comment:17 Changed 6 years ago by Federico Montesino Pouzols

  • Status changed from inprogress to verify
  • Resolution set to fixed

This is being verified as pull request #503.

comment:18 Changed 6 years ago by Federico Montesino Pouzols

Merge remote-tracking branch 'origin/master' into 11064_add_LSFJobManger_and_SCARFLSFJobManager

Conflicts:

Code/Mantid/Framework/CMakeLists.txt

Sort out conflict with removal of MDEvents, re #11064

Changeset: 64d55c05546ff767eedc0b0bce9168c68ca5fcf2

comment:19 Changed 6 years ago by Federico Montesino Pouzols

Jenkins, be nice and retest this please

comment:20 Changed 6 years ago by Federico Montesino Pouzols

add namespace for rhel7, re #11064

Changeset: e07d8f8796ff824127e0b19cb1bae9bba25c1967

comment:21 Changed 6 years ago by Federico Montesino Pouzols

fill in the requested name when empty, cppcheck caught it, re #11064

Changeset: e88bc0d01e753b2046e53ac3f45f078f86a1debe

comment:22 Changed 6 years ago by Federico Montesino Pouzols

be more verbose and use exception object.what(), re #11064

Changeset: ec68a4c17ff706ef7de147cb1762eba2b894d7f5

comment:23 Changed 6 years ago by Federico Montesino Pouzols

It seems that this is more or less fine, but we got a PoldiCreatePeaksFromFileTest failure on osx and a checkout issue on rhel7. Jenkins, retest this please

comment:24 Changed 6 years ago by Federico Montesino Pouzols

Jenkins, you can do it, retest this please

comment:25 Changed 6 years ago by Roman Tolchenov

  • Status changed from verify to verifying
  • Tester set to Roman Tolchenov

comment:26 Changed 6 years ago by Federico Montesino Pouzols

many code improvements suggested by Roman in the PR, re #11064

Changeset: e3719361e8dac45be0a90290a1b1c2c6195fe4f1

comment:27 Changed 6 years ago by Federico Montesino Pouzols

avoid UNUSED_ARG, and prefer TSM_ over TS_ASSERT, re #11064

Changeset: 8626f6f885699678004cfb51346728584ce71917

comment:28 Changed 6 years ago by Federico Montesino Pouzols

adjust new tests with proper exception, re #11064

Changeset: 4e1e937ff7c2de8109d84bf250841a4176bcafa7

comment:29 Changed 6 years ago by Federico Montesino Pouzols

I've pushed changes for many of your comments and some other improvements to the unit tests that Owen suggested in another PR. A few changes, including the Poco URI and Path points, are still missing. I'll let you know and remove the "in progress" label when this is finished.

comment:30 Changed 6 years ago by Federico Montesino Pouzols

avoid unnecessary string/c_str casts, re #11064

Changeset: eee927d5761eb892fc3f7f239111b7617fb636f6

comment:31 Changed 6 years ago by Federico Montesino Pouzols

more tests on methods inputs, re #11064

Changeset: db8a64b9ce31c953320ee0ebeae1e5c5cd84dacb

comment:32 Changed 6 years ago by Federico Montesino Pouzols

more code improvements, makeHeaders helper, use Poco::Path, re #11064

Changeset: bc3cdb1e2026ac8fdfb2334b7ee8c6a6854d258d

comment:33 Changed 6 years ago by Federico Montesino Pouzols

makeHeaders() and use Poco::URI rather than raw strings, re #11064

Changeset: df9855f45d4308b79cf29f55a7c47851d22af8fc

comment:34 Changed 6 years ago by Federico Montesino Pouzols

I think this is ready again and the builds seem to be going well, so I've removed the "in progress" label.

comment:35 Changed 6 years ago by Roman Tolchenov

  • Status changed from verifying to closed

Merge pull request #503 from mantidproject/11064_add_LSFJobManger_and_SCARFLSFJobManager

Add classes LSFJobManager and SCARFLSFJobManager

Full changeset: 280e7d03431239b969ec126377e48b1612cf1024

comment:36 Changed 5 years ago by Nick Draper

Somehow these slipped through without a resolution. Set to Fixed.

comment:37 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 11903
