Ticket #10591 (closed: fixed)

Opened 6 years ago

Last modified 5 years ago

Algorithm to control jobs on the SCARF cluster (and NXtomo reconstruction as a particular case)

Reported by: John Hill
Owned by: Federico M Pouzols
Priority: major
Milestone: Release 3.4
Component: Framework
Keywords:
Cc:
Blocked By:
Blocking: #10564, #11064, #11122
Tester: Raquel Alvarez

Description (last modified by Federico M Pouzols)

Create a remote algorithm for use at ISIS to control (submit, monitor, cancel, etc.) tomographic reconstruction jobs on the SCARF cluster (http://www.scarf.rl.ac.uk).

The SCARF cluster uses the LSF scheduling system. It is possible to interact with the job scheduler via ssh login, the interactive portal (https://portal.scarf.rl.ac.uk/), and a web service. The algorithm to implement here should provide several actions, using the web service as the underlying mechanism:

  • log in/out
  • submit a job
  • query the jobs in the queue and their status
  • cancel a job
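
The four actions above can be sketched as a small session object. This is only an illustration, not the algorithm implemented for this ticket: the class names, method names and status strings are hypothetical, and an in-memory stand-in replaces the SCARF web service so that the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeScheduler:
    """In-memory stand-in for the remote job service (no network access)."""
    jobs: Dict[int, Dict[str, str]] = field(default_factory=dict)
    next_id: int = 1


@dataclass
class JobSession:
    """Hypothetical client exposing the actions listed in the ticket."""
    scheduler: FakeScheduler
    token: Optional[str] = None

    def login(self, user: str, password: str) -> None:
        # A real client would exchange the credentials for a token or cookie.
        if not user or not password:
            raise ValueError("user and password are required")
        self.token = f"token-for-{user}"

    def logout(self) -> None:
        # Forget the token so that later calls fail fast.
        self.token = None

    def _require_login(self) -> None:
        if self.token is None:
            raise RuntimeError("not logged in")

    def submit(self, command: str) -> int:
        self._require_login()
        job_id = self.scheduler.next_id
        self.scheduler.next_id += 1
        # "PENDING" is a placeholder status, not an LSF status string.
        self.scheduler.jobs[job_id] = {"command": command, "status": "PENDING"}
        return job_id

    def status(self) -> Dict[int, str]:
        self._require_login()
        return {job_id: job["status"] for job_id, job in self.scheduler.jobs.items()}

    def cancel(self, job_id: int) -> None:
        self._require_login()
        if job_id not in self.scheduler.jobs:
            raise KeyError(f"unknown job {job_id}")
        self.scheduler.jobs[job_id]["status"] = "CANCELLED"
```

Every action except login checks for a token first, so a call made after logout raises an error instead of sending an unauthenticated request.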

At present this algorithm can submit 'savu' or 'imat_recon_FBP' jobs. These run the 'savu' tool produced at Diamond LS and tomopy, respectively. The required inputs include (with variations):

  • directory with input FITS image files
  • an NXtomo input file
  • a nexus file with the list (specification not available as yet)
  • output directory

We also assume that we have somewhere to put the files on SCARF (currently it's /work/imat/ for the imat project).
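
As a rough sketch of how these inputs could be checked before a job is submitted: the request type and its field names below are hypothetical, and the rule that every path must lie under /work/imat is an assumption made for illustration rather than something the ticket requires. The nexus file with the list is left out because its specification is not available.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List

WORK_ROOT = PurePosixPath("/work/imat")   # storage area named in the ticket
KNOWN_TOOLS = ("savu", "imat_recon_FBP")  # job types named in the ticket


def under_work_root(path: str) -> bool:
    """True if the normalised remote path lies inside WORK_ROOT."""
    normalised = PurePosixPath(posixpath.normpath(path))
    return normalised.parts[: len(WORK_ROOT.parts)] == WORK_ROOT.parts


@dataclass(frozen=True)
class ReconstructionRequest:
    """Hypothetical container for the inputs of one reconstruction job."""
    tool: str
    input_dir: str    # directory with input FITS image files
    nxtomo_file: str  # NXtomo input file
    output_dir: str

    def problems(self) -> List[str]:
        """Return a list of human-readable problems (empty if none)."""
        found = []
        if self.tool not in KNOWN_TOOLS:
            found.append(f"unknown tool: {self.tool}")
        for name in ("input_dir", "nxtomo_file", "output_dir"):
            value = getattr(self, name)
            if not under_work_root(value):
                found.append(f"{name} is outside {WORK_ROOT}: {value}")
        return found
```

Collecting all problems in a list, instead of raising on the first one, allows a single informative message to be reported to the user.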

Change History

comment:1 Changed 6 years ago by John Hill

Refs #10591 creating base files for algorithm

Changeset: 8dff93c102cb6f25602831332fd16e511073bea6

comment:2 Changed 6 years ago by Nick Draper

  • Status changed from new to assigned

comment:3 Changed 6 years ago by John Hill

  • Status changed from assigned to inprogress
  • Milestone changed from Release 3.3 to Release 3.4

comment:4 Changed 6 years ago by John Hill

  • Owner changed from John Hill to Federico M Pouzols

comment:5 Changed 6 years ago by Federico Montesino Pouzols

clang-format, copyright line, etc, re #10591

Changeset: 08daa727c70a035461b26e946fc7b8ddbfead96e

comment:6 Changed 6 years ago by Federico Montesino Pouzols

minor header details, re #10591

Changeset: bd73dc996e850dbe60d48d52c0ee45c616c16b64

comment:7 Changed 6 years ago by Federico Montesino Pouzols

get it to compile, fill in algorithm a bit more, re #10591

Changeset: d0ff924051c3bfcd982aa3d7c9fa2bc569468e89

comment:8 Changed 6 years ago by Federico Montesino Pouzols

added algorithm doc/doctest (no test), re #10591

Changeset: 2aa897ac74c889afd126145613d9cce72c6272bb

comment:9 Changed 6 years ago by Federico M Pouzols

  • Component changed from Diffraction to Framework
  • Description modified
  • Summary changed from Algorithm to initiate an NXTomo reconstruction on SCARF to Algorithm to control NXTomo reconstruction jobs on the SCARF cluster

comment:10 Changed 6 years ago by Federico M Pouzols

  • Description modified

comment:11 Changed 6 years ago by Federico Montesino Pouzols

Login working with federal id on SCARF, re #10591

Changeset: 37b77afe00a1a2cb60c4a29bda0e1c765a0f4172

comment:12 Changed 6 years ago by Federico Montesino Pouzols

Logout, still issues with the cookie, re #10591

Changeset: f734f4186939678b2a3d653347f0082e71de7e70

comment:13 Changed 6 years ago by Federico Montesino Pouzols

log in/out and submit working fine, re #10591

Changeset: c5de82aa5bd2e5cb94657564081823fc20fa11e0

comment:14 Changed 6 years ago by Federico Montesino Pouzols

Added full doc, with dummy/exception doc-test, re #10591

Changeset: a1cb6b0a73e7bd4b696d111b25d262efc8817485

comment:15 Changed 6 years ago by Federico Montesino Pouzols

add support for other Content-Types in InternetHelper, re #10591

Changeset: e678ce2553ca48e1306bb8371751bfd3b4958125

comment:16 Changed 6 years ago by Federico Montesino Pouzols

added SCARF 'computeResource', re #10591

Changeset: e0aa88aa3575fa3714a882ee758cdc178f5fa46a

comment:17 Changed 6 years ago by Federico Montesino Pouzols

added query all-jobs status, re #10591

Changeset: 397825fb637bc0e89455f994abd563afc662474c

comment:18 Changed 6 years ago by Federico Montesino Pouzols

added query status by ID, re #10591

Changeset: 9b5e41be42107e560eab11040eebe09b531c3f60

comment:19 Changed 6 years ago by Federico Montesino Pouzols

added cancel job and ping actions, re #10591

Changeset: f9e4776e10cbfff43856ba3b34a1e9bc21a40d75

comment:20 Changed 6 years ago by Federico Montesino Pouzols

added download and download all job files, re #10591

Changeset: 389b5eb6a00be51df791dcebc0fd0196bef81a93

comment:21 Changed 6 years ago by Federico Montesino Pouzols

log (debug) request string, skipping passwords, re #10591

Changeset: a443052744e4d6bec9f05ffc2c5d8cb2c91aa9b2

comment:22 Changed 6 years ago by Federico Montesino Pouzols

deal with PAC names so multi-download works safely, re #10591

Changeset: 3a084ab6ebd77a6e94baba39d9df496308bd130e

comment:23 Changed 6 years ago by Federico Montesino Pouzols

props for the exec/runnable and its command line options, re #10591

Changeset: f086705d9edb840bffd38c9e2e6023be146c5acc

comment:24 Changed 6 years ago by Federico Montesino Pouzols

upload files working, remove excessive debug logs, re #10591

Changeset: 39fd8472d9a316dcfc2c35328c95c3d830dda706

comment:25 Changed 6 years ago by Federico Montesino Pouzols

handle err messages from PAC and give informative log msgs, re #10591

Changeset: d41a32dbf4a7f94386c0cb926cf09b48093d7284

comment:26 Changed 6 years ago by Federico M Pouzols

  • Blocking 11064 added

comment:27 Changed 6 years ago by Federico M Pouzols

  • Summary changed from Algorithm to control NXTomo reconstruction jobs on the SCARF cluster to Algorithm to control jobs on the SCARF cluster (and NXtomo reconstruction as a particular case)

comment:28 Changed 6 years ago by Federico Montesino Pouzols

add some properties, and validators, re #10591

Changeset: 9b801a415275d4738195579845f3cd5e7d1cd4ee

comment:29 Changed 6 years ago by Federico Montesino Pouzols

added better err messages and property info strings , re #10591

Changeset: 1269511fa2e70905e33b5539a7739b22b0cc89e5

comment:30 Changed 6 years ago by Federico Montesino Pouzols

doc clarifications and update, re #10591

Changeset: b33eb756609fb4b99cf61d56fbedc117fbe51ecd

comment:31 Changed 6 years ago by Federico Montesino Pouzols

added query job status to doc, re #10591

Changeset: dbfd6dde28c81d938ce0614e69ea1b6b73bfa5d8

comment:32 Changed 6 years ago by Federico Montesino Pouzols

job status is now stored in table workspaces, re #10591

Changeset: 450b5d4ee672bfdf9a089ed7c7a1503a8b57110c

comment:33 Changed 6 years ago by Federico Montesino Pouzols

catch-throw with informative message inet helper excepts, re #10591

Changeset: a8b6d0fdab96068709ea8772ac62dd28b54b49ea

comment:34 Changed 6 years ago by Federico Montesino Pouzols

First basic cmakelists for tests in remote algs, re #10591

Changeset: cfb93d432c74d64017c71aa23421a08157d142a7

comment:35 Changed 6 years ago by Federico Montesino Pouzols

added cmakelists for tests in remote algs, re #10591

Changeset: 6c5b3e0fc627544f0ffcb781b15085cee1dd9b99

comment:36 Changed 6 years ago by Federico Montesino Pouzols

fixed doxygen @param name, moved getAction helper, re #10591

Changeset: 895ee602d8a73cdd0a96d33f9fb2f211e3cc3fb3

comment:37 Changed 6 years ago by Federico Montesino Pouzols

first and very crude version of RemoteAlgorithms/SCARF test, re #10591

Changeset: 701ec416b5f7edbd04aa8778ab0997dc405428ed

comment:38 Changed 6 years ago by Federico Montesino Pouzols

forget token after logout, re #10591

Changeset: bd773cd8b00c4bda7d2f6abce29c9090f50ce30b

comment:39 Changed 6 years ago by Federico Montesino Pouzols

new and better adjusted tests, not yet consistent, re #10591

Changeset: fa4018e25edf2d53d48650dee8cc053ddc929d3a

comment:40 Changed 6 years ago by Federico Montesino Pouzols

all inet sends inside doSendRequestGetResponse (for tests), re #10591

Changeset: fe881632d828ac215ee689c22f75120ebf01bcf2

comment:41 Changed 6 years ago by Federico Montesino Pouzols

extend tests, new mock-ups, all pass, re #10591

Changeset: 245d7c57c67ee76b260258027323c968d0710f9f

comment:42 Changed 6 years ago by Federico Montesino Pouzols

fix typo in action name, re #10591

Changeset: c3ff61568ad9602919eea8ae9534209f66f36e1c

comment:43 Changed 6 years ago by Federico Montesino Pouzols

fix expected output string, re #10591

Changeset: 400d51e253af22ee7a7b9bd61d5fc9551fb8d575

comment:44 Changed 6 years ago by Federico Montesino Pouzols

Merge branch 'master' into feature/10591_scarf_reconstruction

Conflicts:

Code/Mantid/Framework/Kernel/inc/MantidKernel/InternetHelper.h
Code/Mantid/Framework/Kernel/src/InternetHelper.cpp

Smallish conflicts after changes in inet helper, re #10591

Changeset: b83a9517a6f707e541b72528999c1726b76c3071

comment:45 Changed 6 years ago by Federico Montesino Pouzols

fix issues introduced in merge, re #10591

Changeset: 8c40fac3cec16548d37a5a412c9aee5462d71703

comment:46 Changed 6 years ago by Federico Montesino Pouzols

skip unneeded test data dependency, re #10591

Changeset: 9ab9fe586eaa6827c7c3f2a8daa3a9057b7c20ac

comment:47 Changed 6 years ago by Federico Montesino Pouzols

additional string var not really needed, re #10591

Changeset: 948188dc50455396d2356289664ea42d7278239b

comment:48 Changed 6 years ago by Federico Montesino Pouzols

even if InetHelper =POST, set GET if needed, polishing logs, re #10591

Changeset: 6ae4718f9f12498e6c64edeec8b403fd35764360

comment:49 Changed 6 years ago by Federico Montesino Pouzols

return job status and info as out props, re #10591

Changeset: 5dac5d7d2e1bbec9d5e3a58465c4a404ae69239b

comment:50 Changed 6 years ago by Federico M Pouzols

  • Blocking 10564 added

comment:51 Changed 6 years ago by Federico Montesino Pouzols

update to new setContentType, re #10591

Changeset: 3094e98965b8adcb0f149e3ffd97f3f4d56b25c7

comment:52 Changed 6 years ago by Federico Montesino Pouzols

doc updates and doxygen fixes, re #10591

Changeset: b084108a83f9d8b1e10eea8fa86c9a54d3ad7642

comment:53 Changed 6 years ago by Federico M Pouzols

Note that this implementation is now more general than initially described. File upload/download is supported, and savu, etc. jobs are only particular cases. Testing of real jobs running on SCARF is limited at this point but we already provide support for all we can from our side.

I'd say I'm happy with the functionality but not with names, design, etc. More work will follow when a few issues are clarified. For now this implementation should be enough to move forward with the Tomography GUI and other components needed for IMAT. More details:

  • This algorithm is fully functional but its design is not in its final form. We'd like to use the JobManager design (#11064) but we have to see how to do that without breaking other algorithms.
  • Also, this algorithm would be split into several (>=9) algorithms, and it may disappear or remain only as an interface that uses several child algorithms.
  • I don't even think it should be called SCARFTomoReconstruction, as it is more general. Something like SCARFJobControl or LSFSCARFJobControl would be better. Much, but not all, of the functionality is in principle generic for clusters with an LSF job scheduler. It will be renamed to be used with the JobManager / IJobManager design.
  • These issues should be dealt with in ticket #11064 and the ones that block it.
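
As an illustration only of the split discussed in this comment, the job operations could sit behind one interface, with a thin front end that maps an action name to the matching operation. Nothing below is the JobManager / IJobManager design from #11064: the class names, the action names and the in-memory implementation are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class JobManagerSketch(ABC):
    """Hypothetical interface; not the IJobManager design from #11064."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Start a job and return its identifier."""

    @abstractmethod
    def cancel(self, job_id: str) -> None:
        """Stop a job that was submitted earlier."""

    @abstractmethod
    def list_jobs(self) -> List[str]:
        """Return the identifiers of the jobs still known to the manager."""


class InMemoryJobManager(JobManagerSketch):
    """Stand-in implementation; a cluster-specific one would go here."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}
        self._counter = 0

    def submit(self, command: str) -> str:
        self._counter += 1
        job_id = str(self._counter)
        self._jobs[job_id] = command
        return job_id

    def cancel(self, job_id: str) -> None:
        del self._jobs[job_id]

    def list_jobs(self) -> List[str]:
        return sorted(self._jobs)


def run_action(manager: JobManagerSketch, action: str, **kwargs):
    """Front end: dispatch a (hypothetical) action name to the manager."""
    actions = {
        "SubmitJob": manager.submit,
        "CancelJob": manager.cancel,
        "JobStatus": manager.list_jobs,
    }
    if action not in actions:
        raise ValueError(f"unknown action: {action}")
    return actions[action](**kwargs)
```

With this shape, the front end holds no cluster-specific logic, so each action could later become a separate child algorithm without changing its callers.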

comment:54 Changed 6 years ago by Federico M Pouzols

  • Status changed from inprogress to verify
  • Resolution set to fixed

This is being verified as pull request #266.

comment:55 Changed 6 years ago by Federico Montesino Pouzols

strict about standards so it builds on all ci platforms, re #10591

Changeset: 51e10384786ca734fb66f312c3690951551a2278

comment:56 Changed 6 years ago by Federico Montesino Pouzols

and the DLL export macro had been missing all this time, re #10591

Changeset: 6c7d435d3e58e18aff3d99ebfd71fd0aae39bb06

comment:57 Changed 6 years ago by Federico Montesino Pouzols

better sorting for headers, indentation in CMakeLists, re #10591

Changeset: e1ce967f18de95afbdadaf4a3b370cef8ce60da8

comment:58 Changed 6 years ago by Federico M Pouzols

Dear and tireless Jenkins could you retest this please?

comment:59 Changed 6 years ago by Federico M Pouzols

Jenkins could you retest this please?

comment:60 Changed 6 years ago by Federico M Pouzols

Jenkins, retest this please.

comment:61 Changed 6 years ago by Federico M Pouzols

Jenkins, retest this please.

comment:62 Changed 6 years ago by Federico Montesino Pouzols

forget about precompile headers for now, so win7 compiles, re #10591

Changeset: 46e8b383915fd5e8162aa77362b0af606edf1f42

comment:63 Changed 6 years ago by Federico M Pouzols

Jenkins, retest this please.

comment:64 Changed 6 years ago by Federico M Pouzols

Jenkins, could you retest this please?

comment:65 Changed 6 years ago by Federico M Pouzols

Jenkins, retest this please?

comment:66 Changed 6 years ago by Federico M Pouzols

Good morning Jenkins, retest this please

comment:67 Changed 6 years ago by Federico M Pouzols

Jenkins retest this please

comment:68 Changed 6 years ago by Federico M Pouzols

Jenkins, retest this please...

comment:69 Changed 6 years ago by Federico M Pouzols

Jenkins, be good and retest this please.

comment:70 Changed 6 years ago by Federico M Pouzols

  • Blocking 11122 added

(In #11122) #10591 added a test for a new remote algorithm. To prevent conflicts it would be best to have that one in master before working on this one.

comment:71 Changed 6 years ago by Raquel Alvarez

  • Status changed from verify to verifying
  • Tester set to Raquel Alvarez

comment:72 Changed 6 years ago by Federico Montesino Pouzols

remove unused visibility sets of non-ws output props, re #10591

Changeset: d34ad21cfa2616c543b3626045eb1d033f8d59bf

comment:73 Changed 6 years ago by Federico Montesino Pouzols

fixed typo in property description, re #10591

Changeset: d4b05baad4367c141038e0a297a47bd159a8b25e

comment:74 Changed 6 years ago by Raquel Alvarez

  • Status changed from verifying to closed

Merge pull request #266 from mantidproject/feature/10591_scarf_reconstruction

Algorithm to control remote jobs on the SCARF cluster (for tomography reconstruction - IMAT)

Full changeset: ff394aa38bc11d9b9ebb49432fbd780288389eb0

comment:75 Changed 6 years ago by Andrei Savici

Unused variable shows up in cppcheck

comment:76 Changed 6 years ago by Federico Montesino Pouzols

remove unused var (cppcheck) and check return val (cvrity), re #10591

Changeset: 3f092385001c75b81bd1ee8a1deba82716ce88e2

comment:77 Changed 5 years ago by Nick Draper

Somehow these slipped through without a resolution. Set to Fixed.

comment:78 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to GitHub issue 11433.
