Ticket #11064 (closed: fixed)
Fit the SCARF job scheduler remote algorithm in the 'RemoteJobManager' design
Reported by: | Federico M Pouzols | Owned by: | Federico M Pouzols |
---|---|---|---|
Priority: | major | Milestone: | Release 3.4 |
Component: | Framework | Keywords: | |
Cc: | nick.draper@… | Blocked By: | #10591, #11122, #11123, #11124 |
Blocking: | #11126, #11361 | Tester: | Roman Tolchenov |
Description (last modified by Federico M Pouzols)
Ticket #10591 develops an algorithm to control jobs on SCARF remotely. This is being done for IMAT tomography but can be used in many other contexts.
What I call the 'RemoteJobManager' design was started some time ago for exactly this type of algorithm. Under it, the 'AbortRemoteJob', 'Authenticate', 'DownloadRemoteFile', etc. algorithms were implemented, and they have been used with the Fermi cluster at ORNL.
This 'RemoteJobManager' design is described in:
- slide 19 of this document: https://github.com/mantidproject/documents/blob/master/Presentations/SOS18/Mantid%20HPC%20Challenges.pptx
- this wiki page: http://www.mantidproject.org/Remote_Job_Submission_API
The objective here is to implement SCARF remote job control following this design.
The implementation of the algorithms 'AbortRemoteJob' etc. appears to be somewhat specific to Fermi at the moment, so this ticket will most likely require another ticket to move the Fermi specifics into an IRemoteJobManager subclass, while ensuring that the use of Fermi at ORNL is not affected by our changes.
So this ticket will add a remote job manager that implements the IRemoteJobManager interface, something like SCARFLSFJobManager.
The unit test RemoteJobManagerFactoryTest (introduced in #11124) should also be updated to test creation of this SCARFLSFJobManager type of job manager.
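The design described above (a common job-manager interface, a generic LSF implementation, a SCARF-specific subclass, and creation by name through a factory, which is what the factory unit test would exercise) can be sketched roughly as follows. This is a minimal, self-contained C++ illustration under stated assumptions: every class ending in `Sketch` and all of their methods are hypothetical simplifications. They do not reproduce the actual Mantid `IRemoteJobManager` interface or `RemoteJobManagerFactory` API, and no communication with SCARF or LSF takes place; jobs are only tracked in memory.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified stand-in for an IRemoteJobManager-style
// interface. The real Mantid interface declares more (and different) methods.
class IRemoteJobManagerSketch {
public:
  virtual ~IRemoteJobManagerSketch() = default;
  virtual std::string name() const = 0;
  virtual void authenticate(const std::string &user,
                            const std::string &password) = 0;
  virtual std::string submitJob(const std::string &script) = 0;
  virtual void abortJob(const std::string &jobId) = 0;
};

// Hypothetical generic LSF manager: behaviour shared by any LSF-based cluster.
// It only keeps jobs in memory; a real one would talk to the remote service.
class LSFJobManagerSketch : public IRemoteJobManagerSketch {
public:
  void authenticate(const std::string &user,
                    const std::string & /*password*/) override {
    m_user = user;
  }

  std::string submitJob(const std::string &script) override {
    if (m_user.empty())
      throw std::runtime_error("submitJob: not authenticated");
    if (script.empty())
      throw std::invalid_argument("submitJob: empty script");
    std::string id = "job-" + std::to_string(++m_lastId);
    m_jobs.insert(id);
    return id;
  }

  void abortJob(const std::string &jobId) override {
    if (m_jobs.erase(jobId) == 0)
      throw std::invalid_argument("abortJob: unknown job " + jobId);
  }

private:
  std::string m_user;
  int m_lastId = 0;
  std::set<std::string> m_jobs;
};

// Hypothetical SCARF-specific subclass: site details (URLs, paths, login
// handling) would be overridden here, keeping the LSF base class generic.
class SCARFLSFJobManagerSketch : public LSFJobManagerSketch {
public:
  std::string name() const override { return "SCARFLSFJobManager"; }
};

// Hypothetical name-keyed factory, standing in for a dynamic factory.
class RemoteJobManagerFactorySketch {
public:
  using Creator = std::function<std::unique_ptr<IRemoteJobManagerSketch>()>;

  template <typename T> void subscribe(const std::string &name) {
    m_creators[name] = []() -> std::unique_ptr<IRemoteJobManagerSketch> {
      return std::make_unique<T>();
    };
  }

  std::unique_ptr<IRemoteJobManagerSketch>
  create(const std::string &name) const {
    auto it = m_creators.find(name);
    if (it == m_creators.end())
      throw std::invalid_argument("unknown job manager: " + name);
    return it->second();
  }

private:
  std::map<std::string, Creator> m_creators;
};

// Usage, in the spirit of a factory unit test: create the SCARF manager by name.
inline void exampleUsage() {
  RemoteJobManagerFactorySketch factory;
  factory.subscribe<SCARFLSFJobManagerSketch>("SCARFLSFJobManager");
  auto manager = factory.create("SCARFLSFJobManager");
  assert(manager->name() == "SCARFLSFJobManager");
  manager->authenticate("user", "secret");
  const std::string id = manager->submitJob("reconstruct.py");
  manager->abortJob(id);
}
```

Keeping the generic LSF logic in one class and the site-specific details in a subclass isolates each site's code, so changing the SCARF manager should not affect managers registered for other clusters; the factory lets callers (and tests) pick a manager purely by its registered name.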
Change History
comment:1 Changed 6 years ago by Federico M Pouzols
- Status changed from new to assigned
- Blocked By 10591 added
comment:11 Changed 6 years ago by Federico Montesino Pouzols
- Status changed from assigned to inprogress
added new LSF and SCARFLSF job managers, re #11064
Changeset: 85a97c6b49a26fd8eb6a69804c93b34398ce3f14
comment:12 Changed 6 years ago by Federico Montesino Pouzols
add new RemoteJobManagers subdir, re #11064
Changeset: d49d7f17cd40c497eaea9471a53771c01c68b25e
comment:13 Changed 6 years ago by Federico Montesino Pouzols
use the dynamic factory create, more verbose msgs, re #11064
Changeset: 2e062376ab04f7ac5d4b99ba69708d3b01c33fff
comment:14 Changed 6 years ago by Federico Montesino Pouzols
extend tests, and move out bits specific to SCARFLSF, re #11064
Changeset: eb3c515afa417938daffb11f2ff87f231955c861
comment:15 Changed 6 years ago by Federico Montesino Pouzols
update clean-up of LSFJobManager, new minimal test, re #11064
Changeset: 76084245c5717fcb92f11c3dae9f5c4c3b250199
comment:16 Changed 6 years ago by Federico Montesino Pouzols
clean-up SCARFLSF, add ping() and logout(), add tests, re #11064
Changeset: e531821c414b7c412a3a81eed0ff1516417d35ad
comment:17 Changed 6 years ago by Federico Montesino Pouzols
- Status changed from inprogress to verify
- Resolution set to fixed
This is being verified as pull request #503.
comment:18 Changed 6 years ago by Federico Montesino Pouzols
Merge remote-tracking branch 'origin/master' into 11064_add_LSFJobManger_and_SCARFLSFJobManager
Conflicts:
Code/Mantid/Framework/CMakeLists.txt
Sort out conflict with removal of MDEvents, re #11064
Changeset: 64d55c05546ff767eedc0b0bce9168c68ca5fcf2
comment:19 Changed 6 years ago by Federico Montesino Pouzols
Jenkins, be nice and retest this please
comment:20 Changed 6 years ago by Federico Montesino Pouzols
add namespace for rhel7, re #11064
Changeset: e07d8f8796ff824127e0b19cb1bae9bba25c1967
comment:21 Changed 6 years ago by Federico Montesino Pouzols
fill in the requested name when empty, cppcheck caught it, re #11064
Changeset: e88bc0d01e753b2046e53ac3f45f078f86a1debe
comment:22 Changed 6 years ago by Federico Montesino Pouzols
be more verbose and use exception object.what(), re #11064
Changeset: ec68a4c17ff706ef7de147cb1762eba2b894d7f5
comment:23 Changed 6 years ago by Federico Montesino Pouzols
It seems that this is more or less fine, but we got a PoldiCreatePeaksFromFileTest failure on osx and a checkout issue on rhel7. Jenkins, retest this please
comment:24 Changed 6 years ago by Federico Montesino Pouzols
Jenkins, you can do it, retest this please
comment:25 Changed 6 years ago by Roman Tolchenov
- Status changed from verify to verifying
- Tester set to Roman Tolchenov
comment:26 Changed 6 years ago by Federico Montesino Pouzols
many code improvements suggested by Roman in the PR re #11064, re
Changeset: e3719361e8dac45be0a90290a1b1c2c6195fe4f1
comment:27 Changed 6 years ago by Federico Montesino Pouzols
avoid UNUSED_ARG, and prefer TSM_ over TS_ASSERT, re #11064
Changeset: 8626f6f885699678004cfb51346728584ce71917
comment:28 Changed 6 years ago by Federico Montesino Pouzols
adjust new tests with proper exception, re #11064
Changeset: 4e1e937ff7c2de8109d84bf250841a4176bcafa7
comment:29 Changed 6 years ago by Federico Montesino Pouzols
I've pushed changes for many of your comments and some other improvements to the unit tests that Owen suggested in another PR. A few changes, including the Poco URI and Path points, are still missing. I'll let you know and remove the "in progress" label when this is finished.
comment:30 Changed 6 years ago by Federico Montesino Pouzols
avoid unnecessary string/c_str casts, re #11064
Changeset: eee927d5761eb892fc3f7f239111b7617fb636f6
comment:31 Changed 6 years ago by Federico Montesino Pouzols
more tests on method inputs, re #11064
Changeset: db8a64b9ce31c953320ee0ebeae1e5c5cd84dacb
comment:32 Changed 6 years ago by Federico Montesino Pouzols
more code improvements, makeHeaders helper, use Poco::Path, re #11064
Changeset: bc3cdb1e2026ac8fdfb2334b7ee8c6a6854d258d
comment:33 Changed 6 years ago by Federico Montesino Pouzols
makeHeaders() and use Poco::URI rather than raw strings, re #11064
Changeset: df9855f45d4308b79cf29f55a7c47851d22af8fc
comment:34 Changed 6 years ago by Federico Montesino Pouzols
I think this is ready again and the builds seem to be going well, so I've removed the "in progress" label.
comment:35 Changed 6 years ago by Roman Tolchenov
- Status changed from verifying to closed
Merge pull request #503 from mantidproject/11064_add_LSFJobManger_and_SCARFLSFJobManager
Add classes LSFJobManager and SCARFLSFJobManager
Full changeset: 280e7d03431239b969ec126377e48b1612cf1024
comment:36 Changed 5 years ago by Nick Draper
Somehow these slipped through without a resolution. Set to Fixed.
comment:37 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to GitHub issue 11903