Ticket #10591 (closed: fixed)
Algorithm to control jobs on the SCARF cluster (and NXtomo reconstruction as a particular case)
| Reported by: | John Hill | Owned by: | Federico M Pouzols |
|---|---|---|---|
| Priority: | major | Milestone: | Release 3.4 |
| Component: | Framework | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | #10564, #11064, #11122 | Tester: | Raquel Alvarez |
Description (last modified by Federico M Pouzols)
Create a remote algorithm for use at ISIS to control (submit, monitor, cancel, etc.) tomographic reconstruction jobs on the SCARF cluster (http://www.scarf.rl.ac.uk).
The SCARF cluster uses the LSF scheduling system. It is possible to interact with the job scheduler via ssh login, the interactive portal (https://portal.scarf.rl.ac.uk/), and a web service. The algorithm to implement here should provide several actions, using the web service as the underlying mechanism:
- log in/out
- submit a job
- query the jobs in the queue and their status
- cancel a job
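The actions listed above can be pictured as a small dispatcher built on a single request-sending function. The sketch below is illustrative only: the class name `JobController`, the action strings, the `send` callable, the status strings, and the in-memory fake service are all hypothetical and do not reflect the real SCARF web service or the properties of the Mantid algorithm.

```python
from typing import Callable, Dict, Optional

# Hypothetical transport signature: (action, payload) -> response dict.
Send = Callable[[str, Dict[str, str]], Dict[str, str]]


class JobController:
    """Dispatches log in/out, submit, query and cancel through one transport."""

    def __init__(self, send: Send) -> None:
        self._send = send
        self._token: Optional[str] = None

    def _require_login(self) -> str:
        if self._token is None:
            raise RuntimeError("not logged in")
        return self._token

    def login(self, user: str, password: str) -> None:
        reply = self._send("login", {"user": user, "password": password})
        self._token = reply["token"]

    def logout(self) -> None:
        self._send("logout", {"token": self._require_login()})
        self._token = None  # forget the token once the session is closed

    def submit(self, command: str) -> str:
        payload = {"token": self._require_login(), "command": command}
        return self._send("submit", payload)["job_id"]

    def status(self, job_id: str) -> str:
        payload = {"token": self._require_login(), "job_id": job_id}
        return self._send("status", payload)["status"]

    def cancel(self, job_id: str) -> None:
        self._send("cancel", {"token": self._require_login(), "job_id": job_id})


def make_fake_service() -> Send:
    """In-memory stand-in for the web service (placeholder status names)."""
    jobs: Dict[str, str] = {}

    def send(action: str, payload: Dict[str, str]) -> Dict[str, str]:
        if action == "login":
            return {"token": "fake-token"}
        if action == "submit":
            job_id = str(len(jobs) + 1)
            jobs[job_id] = "PENDING"
            return {"job_id": job_id}
        if action == "status":
            return {"status": jobs[payload["job_id"]]}
        if action == "cancel":
            jobs[payload["job_id"]] = "CANCELLED"
        return {}

    return send
```

Routing every request through one injectable `send` function keeps the actions testable without network access: a test can pass `make_fake_service()` where production code would pass a function that talks to the real service.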
At present this algorithm could submit 'savu' or 'imat_recon_FBP' jobs. These run the 'savu' tool produced at Diamond LS and tomopy, respectively. The inputs required include (with variations...):
- directory with input FITS image files
- an NXtomo input file
- a nexus file with the list (specification not available as yet)
- output directory
We also assume that we have somewhere to put the files on SCARF (currently it's /work/imat/ for the imat project).
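As a rough illustration of how such inputs might be checked before submission, here is a minimal sketch. The names `ReconstructionRequest` and `validate` are hypothetical, and the rule that both paths must sit under `/work/imat` is an assumption made for this example; the ticket only says that this is where files currently go.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List

# Job types named in the ticket description.
KNOWN_TOOLS = ("savu", "imat_recon_FBP")
# Project area on SCARF mentioned in the ticket.
REMOTE_BASE = PurePosixPath("/work/imat")


@dataclass(frozen=True)
class ReconstructionRequest:
    tool: str
    input_path: str  # remote directory of FITS images, or an NXtomo file
    output_dir: str  # remote output directory


def validate(request: ReconstructionRequest) -> List[str]:
    """Return human-readable problems; an empty list means no problem found."""
    problems: List[str] = []
    if request.tool not in KNOWN_TOOLS:
        problems.append(f"unknown tool: {request.tool!r}")
    for label, value in (("input", request.input_path),
                         ("output", request.output_dir)):
        path = PurePosixPath(value)
        # Assumed rule: the path is REMOTE_BASE itself or lies below it.
        if REMOTE_BASE not in (path, *path.parents):
            problems.append(f"{label} path is outside {REMOTE_BASE}: {value}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a request at once.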
Change History
comment:3 Changed 6 years ago by John Hill
- Status changed from assigned to inprogress
- Milestone changed from Release 3.3 to Release 3.4
comment:5 Changed 6 years ago by Federico Montesino Pouzols
clang-format, copyright line, etc, re #10591
Changeset: 08daa727c70a035461b26e946fc7b8ddbfead96e
comment:6 Changed 6 years ago by Federico Montesino Pouzols
minor header details, re #10591
Changeset: bd73dc996e850dbe60d48d52c0ee45c616c16b64
comment:7 Changed 6 years ago by Federico Montesino Pouzols
get it to compile, fill in algorithm a bit more, re #10591
Changeset: d0ff924051c3bfcd982aa3d7c9fa2bc569468e89
comment:8 Changed 6 years ago by Federico Montesino Pouzols
added algorithm doc/doctest (no test), re #10591
Changeset: 2aa897ac74c889afd126145613d9cce72c6272bb
comment:9 Changed 6 years ago by Federico M Pouzols
- Component changed from Diffraction to Framework
- Description modified
- Summary changed from Algorithm to initiate an NXTomo reconstruction on SCARF to Algorithm to control NXTomo reconstruction jobs on the SCARF cluster
comment:11 Changed 6 years ago by Federico Montesino Pouzols
Login working with federal id on SCARF, re #10591
Changeset: 37b77afe00a1a2cb60c4a29bda0e1c765a0f4172
comment:12 Changed 6 years ago by Federico Montesino Pouzols
Logout, still issues with the cookie, re #10591
Changeset: f734f4186939678b2a3d653347f0082e71de7e70
comment:13 Changed 6 years ago by Federico Montesino Pouzols
log in/out and submit working fine, re #10591
Changeset: c5de82aa5bd2e5cb94657564081823fc20fa11e0
comment:14 Changed 6 years ago by Federico Montesino Pouzols
Added full doc, with dummy/exception doc-test, re #10591
Changeset: a1cb6b0a73e7bd4b696d111b25d262efc8817485
comment:15 Changed 6 years ago by Federico Montesino Pouzols
add support for other Content-Types in InternetHelper, re #10591
Changeset: e678ce2553ca48e1306bb8371751bfd3b4958125
comment:16 Changed 6 years ago by Federico Montesino Pouzols
added SCARF 'computeResource', re #10591
Changeset: e0aa88aa3575fa3714a882ee758cdc178f5fa46a
comment:17 Changed 6 years ago by Federico Montesino Pouzols
added query all-jobs status, re #10591
Changeset: 397825fb637bc0e89455f994abd563afc662474c
comment:18 Changed 6 years ago by Federico Montesino Pouzols
added query status by ID, re #10591
Changeset: 9b5e41be42107e560eab11040eebe09b531c3f60
comment:19 Changed 6 years ago by Federico Montesino Pouzols
added cancel job and ping actions, re #10591
Changeset: f9e4776e10cbfff43856ba3b34a1e9bc21a40d75
comment:20 Changed 6 years ago by Federico Montesino Pouzols
added download and download all job files, re #10591
Changeset: 389b5eb6a00be51df791dcebc0fd0196bef81a93
comment:21 Changed 6 years ago by Federico Montesino Pouzols
log (debug) request string, skipping passwords, re #10591
Changeset: a443052744e4d6bec9f05ffc2c5d8cb2c91aa9b2
comment:22 Changed 6 years ago by Federico Montesino Pouzols
deal with PAC names so multi-download works safely, re #10591
Changeset: 3a084ab6ebd77a6e94baba39d9df496308bd130e
comment:23 Changed 6 years ago by Federico Montesino Pouzols
props for the exec/runnable and its command line options, re #10591
Changeset: f086705d9edb840bffd38c9e2e6023be146c5acc
comment:24 Changed 6 years ago by Federico Montesino Pouzols
upload files working, remove excessive debug logs, re #10591
Changeset: 39fd8472d9a316dcfc2c35328c95c3d830dda706
comment:25 Changed 6 years ago by Federico Montesino Pouzols
handle err messages from PAC and give informative log msgs, re #10591
Changeset: d41a32dbf4a7f94386c0cb926cf09b48093d7284
comment:27 Changed 6 years ago by Federico M Pouzols
- Summary changed from Algorithm to control NXTomo reconstruction jobs on the SCARF cluster to Algorithm to control jobs on the SCARF cluster (and NXtomo reconstruction as a particular case)
comment:28 Changed 6 years ago by Federico Montesino Pouzols
add some properties, and validators, re #10591
Changeset: 9b801a415275d4738195579845f3cd5e7d1cd4ee
comment:29 Changed 6 years ago by Federico Montesino Pouzols
added better err messages and property info strings , re #10591
Changeset: 1269511fa2e70905e33b5539a7739b22b0cc89e5
comment:30 Changed 6 years ago by Federico Montesino Pouzols
doc clarifications and update, re #10591
Changeset: b33eb756609fb4b99cf61d56fbedc117fbe51ecd
comment:31 Changed 6 years ago by Federico Montesino Pouzols
added query job status to doc, re #10591
Changeset: dbfd6dde28c81d938ce0614e69ea1b6b73bfa5d8
comment:32 Changed 6 years ago by Federico Montesino Pouzols
job status is now stored in table workspaces, re #10591
Changeset: 450b5d4ee672bfdf9a089ed7c7a1503a8b57110c
comment:33 Changed 6 years ago by Federico Montesino Pouzols
catch-throw with informative message inet helper excepts, re #10591
Changeset: a8b6d0fdab96068709ea8772ac62dd28b54b49ea
comment:34 Changed 6 years ago by Federico Montesino Pouzols
First basic cmakelists for tests in remote algs, re #10591
Changeset: cfb93d432c74d64017c71aa23421a08157d142a7
comment:35 Changed 6 years ago by Federico Montesino Pouzols
added cmakelists for tests in remote algs, re #10591
Changeset: 6c5b3e0fc627544f0ffcb781b15085cee1dd9b99
comment:36 Changed 6 years ago by Federico Montesino Pouzols
fixed doxygen @param name, moved getAction helper, re #10591
Changeset: 895ee602d8a73cdd0a96d33f9fb2f211e3cc3fb3
comment:37 Changed 6 years ago by Federico Montesino Pouzols
first and very crude version of RemoteAlgorithms/SCARF test, re #10591
Changeset: 701ec416b5f7edbd04aa8778ab0997dc405428ed
comment:38 Changed 6 years ago by Federico Montesino Pouzols
forget token after logout, re #10591
Changeset: bd773cd8b00c4bda7d2f6abce29c9090f50ce30b
comment:39 Changed 6 years ago by Federico Montesino Pouzols
new and better adjusted tests, not yet consistent, re #10591
Changeset: fa4018e25edf2d53d48650dee8cc053ddc929d3a
comment:40 Changed 6 years ago by Federico Montesino Pouzols
all inet sends inside doSendRequestGetResponse (for tests), re #10591
Changeset: fe881632d828ac215ee689c22f75120ebf01bcf2
comment:41 Changed 6 years ago by Federico Montesino Pouzols
extend tests, new mock-ups, all pass, re #10591
Changeset: 245d7c57c67ee76b260258027323c968d0710f9f
comment:42 Changed 6 years ago by Federico Montesino Pouzols
fix typo in action name, re #10591
Changeset: c3ff61568ad9602919eea8ae9534209f66f36e1c
comment:43 Changed 6 years ago by Federico Montesino Pouzols
fix expected output string, re #10591
Changeset: 400d51e253af22ee7a7b9bd61d5fc9551fb8d575
comment:44 Changed 6 years ago by Federico Montesino Pouzols
Merge branch 'master' into feature/10591_scarf_reconstruction
Conflicts:
Code/Mantid/Framework/Kernel/inc/MantidKernel/InternetHelper.h
Code/Mantid/Framework/Kernel/src/InternetHelper.cpp
Smallish conflicts after changes in inet helper, re #10591
Changeset: b83a9517a6f707e541b72528999c1726b76c3071
comment:45 Changed 6 years ago by Federico Montesino Pouzols
fix issues introduced in merge, re #10591
Changeset: 8c40fac3cec16548d37a5a412c9aee5462d71703
comment:46 Changed 6 years ago by Federico Montesino Pouzols
skip unneeded test data dependency, re #10591
Changeset: 9ab9fe586eaa6827c7c3f2a8daa3a9057b7c20ac
comment:47 Changed 6 years ago by Federico Montesino Pouzols
additional string var not really needed, re #10591
Changeset: 948188dc50455396d2356289664ea42d7278239b
comment:48 Changed 6 years ago by Federico Montesino Pouzols
even if InetHelper =POST, set GET if needed, polishing logs, re #10591
Changeset: 6ae4718f9f12498e6c64edeec8b403fd35764360
comment:49 Changed 6 years ago by Federico Montesino Pouzols
return job status and info as out props, re #10591
Changeset: 5dac5d7d2e1bbec9d5e3a58465c4a404ae69239b
comment:51 Changed 6 years ago by Federico Montesino Pouzols
update to new setContentType, re #10591
Changeset: 3094e98965b8adcb0f149e3ffd97f3f4d56b25c7
comment:52 Changed 6 years ago by Federico Montesino Pouzols
doc updates and doxygen fixes, re #10591
Changeset: b084108a83f9d8b1e10eea8fa86c9a54d3ad7642
comment:53 Changed 6 years ago by Federico M Pouzols
Note that this implementation is now more general than initially described. File upload/download is supported, and savu, etc. jobs are only particular cases. Testing of real jobs running on SCARF is limited at this point but we already provide support for all we can from our side.
I'd say I'm happy with the functionality but not with the names, design, etc. More work will follow when a few issues are clarified. For now this implementation should be enough to move forward with the Tomography GUI and other components needed for IMAT. More details:
- This algorithm is fully functional but its design is not in its final form. We'd like to use the JobManager design (#11064) but we have to see how to do that without breaking other algorithms.
- Also, this algorithm would be split into several (>=9) algorithms, and it may disappear or remain only as an interface that uses several child algorithms.
- I don't even think it should be called SCARFTomoReconstruction, as it is more general. Something like SCARFJobControl or LSFSCARFJobControl would be better. Much, but not all, of the functionality is in principle generic for clusters with an LSF job scheduler. It will be renamed to be used with the JobManager / IJobManager design.
- These issues should be dealt with in ticket #11064 and the ones that block it.
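To make the "interface that uses several child algorithms" idea concrete, here is a minimal sketch of a facade that forwards each action to a separately registered handler. Everything in it is hypothetical: `JobControlFacade`, the action names `Ping` and `CancelJob`, and the use of plain callables as stand-ins for child algorithms. It is not the JobManager / IJobManager design from #11064.

```python
from typing import Callable, Dict

# A child "algorithm" is modelled as a callable: properties in, message out.
Child = Callable[[Dict[str, str]], str]


class JobControlFacade:
    """Thin interface that forwards each action to a registered child."""

    def __init__(self) -> None:
        self._children: Dict[str, Child] = {}

    def register(self, action: str, child: Child) -> None:
        self._children[action] = child

    def run(self, action: str, **props: str) -> str:
        try:
            child = self._children[action]
        except KeyError:
            known = ", ".join(sorted(self._children))
            raise ValueError(
                f"unknown action {action!r}; known actions: {known}"
            ) from None
        return child(props)


# Hypothetical children, one per action.
facade = JobControlFacade()
facade.register("Ping", lambda props: "pong")
facade.register("CancelJob", lambda props: f"cancelled {props['job_id']}")
```

With this shape, each action can be developed and tested on its own, and the facade can be dropped later without touching the children.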
comment:54 Changed 6 years ago by Federico M Pouzols
- Status changed from inprogress to verify
- Resolution set to fixed
This is being verified as pull request #266.
comment:55 Changed 6 years ago by Federico Montesino Pouzols
strict about standards so it builds on all ci platforms, re #10591
Changeset: 51e10384786ca734fb66f312c3690951551a2278
comment:56 Changed 6 years ago by Federico Montesino Pouzols
and the DLL export macro had been missing all this time, re #10591
Changeset: 6c7d435d3e58e18aff3d99ebfd71fd0aae39bb06
comment:57 Changed 6 years ago by Federico Montesino Pouzols
better sorting for headers, indentation in CMakeLists, re #10591
Changeset: e1ce967f18de95afbdadaf4a3b370cef8ce60da8
comment:58 Changed 6 years ago by Federico M Pouzols
Dear and tireless Jenkins could you retest this please?
comment:59 Changed 6 years ago by Federico M Pouzols
Jenkins could you retest this please?
comment:60 Changed 6 years ago by Federico M Pouzols
Jenkins, retest this please.
comment:61 Changed 6 years ago by Federico M Pouzols
Jenkins, retest this please.
comment:62 Changed 6 years ago by Federico Montesino Pouzols
forget about precompile headers for now, so win7 compiles, re #10591
Changeset: 46e8b383915fd5e8162aa77362b0af606edf1f42
comment:63 Changed 6 years ago by Federico M Pouzols
Jenkins, retest this please.
comment:64 Changed 6 years ago by Federico M Pouzols
Jenkins, could you retest this please?
comment:65 Changed 6 years ago by Federico M Pouzols
Jenkins, retest this please?
comment:66 Changed 6 years ago by Federico M Pouzols
Good morning Jenkins, retest this please
comment:67 Changed 6 years ago by Federico M Pouzols
Jenkins retest this please
comment:68 Changed 6 years ago by Federico M Pouzols
Jenkins, retest this please...
comment:69 Changed 6 years ago by Federico M Pouzols
Jenkins, be goode and retest this please.
comment:70 Changed 6 years ago by Federico M Pouzols
- Blocking 11122 added
comment:71 Changed 6 years ago by Raquel Alvarez
- Status changed from verify to verifying
- Tester set to Raquel Alvarez
comment:72 Changed 6 years ago by Federico Montesino Pouzols
remove unused visibility sets of non-ws output props, re #10591
Changeset: d34ad21cfa2616c543b3626045eb1d033f8d59bf
comment:73 Changed 6 years ago by Federico Montesino Pouzols
fixed typo in property description, re #10591
Changeset: d4b05baad4367c141038e0a297a47bd159a8b25e
comment:74 Changed 6 years ago by Raquel Alvarez
- Status changed from verifying to closed
Merge pull request #266 from mantidproject/feature/10591_scarf_reconstruction
Algorithm to control remote jobs on the SCARF cluster (for tomography reconstruction - IMAT)
Full changeset: ff394aa38bc11d9b9ebb49432fbd780288389eb0
comment:75 Changed 6 years ago by Andrei Savici
Unused variable shows up in cppcheck
comment:76 Changed 6 years ago by Federico Montesino Pouzols
remove unused var (cppcheck) and check return val (cvrity), re #10591
Changeset: 3f092385001c75b81bd1ee8a1deba82716ce88e2
comment:77 Changed 5 years ago by Nick Draper
Somehow these slipped through without a resolution. Set to Fixed.
comment:78 Changed 5 years ago by Stuart Campbell
This ticket has been transferred to github issue 11433
Refs #10591 creating base files for algorithm
Changeset: 8dff93c102cb6f25602831332fd16e511073bea6