Ticket #9208 (closed: fixed)

Opened 7 years ago

Last modified 5 years ago

Have GUI Timeout Instead of Hang When ICAT is Down

Reported by: Peter Parker Owned by: Jay Rainey
Priority: critical Milestone: Release 3.2
Component: Framework Keywords: ICAT
Cc: Blocked By:
Blocking: #8538, #9186, #9223 Tester: Martyn Gigg

Description

Given that ICAT is a remote service which we can't necessarily assume will be available at any given time, it would be good to wrap any calls to the Catalog* algorithms from the GUI in their own separate threads, which would then timeout after a certain period of time. We could specify a sensible default timeout value in Mantid's property file, and maybe even have a timeout property on the algorithms themselves.

Currently, if ICAT is not available (due to network problems, service outages or otherwise) then Mantid will hang, and users are forced to lose any unsaved changes since they have to kill the Mantid process and restart to keep on working.

Change History

comment:1 Changed 7 years ago by Martyn Gigg

  • Status changed from new to assigned

comment:2 Changed 7 years ago by Jay Rainey

Having discussed this further with Martyn, we agreed on the following solution:

  1. Move catalog methods from MantidUI (e.g. isValidCatalogLogin & catalogPublishDialog) to CatalogHelper.
  2. Make use of CatalogHelper for the calls in ApplicationWindow of the moved methods.
  3. Update executeAsynchronously in CatalogHelper to have a timeOut parameter, and make use of it in the implementation.
  4. Update executeAsynchronouslys signature to return a string. This will be an error/success message for the user, e.g. Execution of algorithm N has timed out, ICAT appears to be down.
  5. Set a timeout property in Mantid.properties.template (named Catalog.timeoutValue) that:
    • Has a default value of 30.
    • If set to negative value then timeout is set to 0 (e.g. turned off)
    • A comment explaining why the timeout value is needed.

comment:3 Changed 7 years ago by Jay Rainey

  • Status changed from assigned to inprogress

Added implementation to new CatalogHelper methods. Refs #9208.

Changeset: 52e9382434ee181e33dc49c72e01ade3d9ea60d3

comment:4 Changed 7 years ago by Jay Rainey

Call catalogHelper methods and not MantidUI. Refs #9208.

Changeset: 5adda878e06a8549e1ad703bf24426fb5457a95a

comment:5 Changed 7 years ago by Jay Rainey

Remove unused methods. Refs #9208.

Changeset: 4cef12712a255f5a57f3ad07b74552ea217c9b4e

comment:6 Changed 7 years ago by Jay Rainey

Add catalog timeout value to mantid properties. Refs #9208.

Changeset: 1346fc28ddad8927b1d3fd0677ff1ef0b57df0af

comment:7 Changed 7 years ago by Jay Rainey

Add timout property to CatalogHelper. Refs #9208.

Changeset: 2dd4d5c1234e86fcc68405e8694a4fdd571ec387

comment:8 Changed 7 years ago by Jay Rainey

Display ICAT error on GUI. Refs #9208.

Changeset: ce2437c6f3b1d857cdd4a8115755290310a3af11

comment:9 Changed 7 years ago by Jay Rainey

Refactor CatalogPublish. Refs #9208.

Changeset: a8c82d7e06fbe32e0700d05366c55323210c0704

comment:10 Changed 7 years ago by Jay Rainey

Merge remote-tracking branch 'origin/master' into feature/9208_icat_gui_hang_fix. Refs #9208.

Conflicts:

  • Code/Mantid/MantidQt/CustomDialogs/inc/MantidQtCustomDialogs/CatalogPublishDialog.h
  • Code/Mantid/MantidQt/CustomDialogs/src/CatalogPublishDialog.cpp

Changeset: bac2d4cb25a72b6030a337df5b96bb80626af0a0

comment:11 Changed 7 years ago by Jay Rainey

Add check to prevent crash. Refs #9208.

Changeset: 6de07c2e5b5e0d50f4b87d043fed653a92e7c163

comment:12 Changed 7 years ago by Jay Rainey

Merge branch 'feature/9208_icat_gui_hang_fix' into develop. Refs #9208.

Conflicts:

  • Code/Mantid/Framework/DataHandling/src/SaveANSTOAscii.cpp
  • Code/Mantid/Framework/PythonInterface/plugins/algorithms/CalibrateRectangularDetectors.py

Changeset: 69d8fa8588e374393f898388c39aa34dbdb13c09

comment:13 Changed 7 years ago by Jay Rainey

Add windows DLL include. Refs #9208.

Changeset: 9cd147baa4b2490a9ea7976d7b34947171621302

comment:14 Changed 7 years ago by Jay Rainey

  • Status changed from inprogress to verify
  • Resolution set to fixed

Add back multiple facility support. Refs #9208.

Changeset: 3ff7c488f87ae2fb08699f17660613ebd96de31f

Last edited 7 years ago by Jay Rainey (previous) (diff)

comment:15 Changed 7 years ago by Jay Rainey

To test

You need to verify that all the catalog algorithms are now in separate threads, and no longer hang Mantid when they are performed without a network connection. Below are some examples that should be tested before (to verify the hang occurred) and after (to verify I have fixed it):

Verify reachable end-point

  1. Turn off your Internet connection.
  2. Attempt to log into the catalog (Catalog -> Login)
  3. An error will be thrown: ICAT appears to be offline. Please check your connection or report this issue.

Execute algorithm without Internet connection

  1. Log into the catalog.
  2. Disable your Internet connection.
  3. Click Publish (it executes myData). The error message will be displayed on the GUI.
  4. Click Search. The same error message will be logged to Results log.

Verify unreachable end-point

The SNS end-point is behind a firewall, so is not accessible at ISIS. Previously, if you attempted to log into SNS then Mantid would die. You need to verify that this is no longer the case by:

  1. Attempt to log into SNS (Catalog -> Login and select SNS as facility)
  2. Mantid will now execute the algorithm asynchronously, and after a short period of time (30 seconds), the same error message as above will be thrown.
Last edited 7 years ago by Jay Rainey (previous) (diff)

comment:16 Changed 7 years ago by Jay Rainey

  • Blocking 9186, 9223 added

comment:17 Changed 7 years ago by Jay Rainey

  • Blocking 8538 added

(In #8538) To add DOI generation support for publishing I need to:

  1. Generate gSoap files against the DOI generator end-point, which are:
    • https://data4.isis.stfc.ac.uk/doi/DOIService?wsdl Develop
    • https://sig-03.esc.rl.ac.uk:8181/doi/DOIService?wsdl Production
  2. Add the end-point (dev for now until changes below are made) to Facilities.xml.
  3. Update ICatalogInfoService to include a new method (generateDOIForDatafile) that makes use of the gSoap methods above (specifically: registerDatafileDOI) and returns the generated DOI (to output on LOG).
  4. Add and implement the functionality of the new method above in ICat4Catalog.
    • This includes obtaining the end-point from Facilities.xml t oaccomdate other facilities using th
  5. Add two new properties to CatalogPublish:
    • InvestigationID, "",`The database row ID of the investigation that you want to publish to. (Note: this is required to generate a DOI.)
    • GenerateDOI, true, "Generates a DOI for the datafile being published. Once a DOI is generated the datafile will be made public."
  6. Add logic to exec in CatalogPublish to make use of the properties and call generateDOIForDatafile from the catalog. Note: if a file fails to upload (e.g. already exists etc...) a DOI must not be generated.
  7. Update CatalogPublishDialog to have a checkbox for GenerateDOI. InvestigationID will be hidden on the GUI and obtained from the table generated getPublishInvestigations (once #9223 is complete).
    • The generateDOI (make public) checkbox on the GUI should be checked by default.
    • To prevent users accidentally generating DOIs that they may not have wanted to, a QMessageBox should be generated IF the user has checked generateDOI to confirm this choice.

comment:18 Changed 7 years ago by Martyn Gigg

  • Status changed from verify to verifying
  • Tester set to Martyn Gigg

comment:19 Changed 7 years ago by Martyn Gigg

  • Status changed from verifying to reopened
  • Resolution fixed deleted

So everything seems okay as long as the connection goes down when your not in the middle of a catalog action, e.g search/publish. If you perform a search that takes a while then disconnect , you get a the algorithm stuck without being able to be cancelled.

The GUI remains responsive but the search never stops.

comment:20 Changed 7 years ago by Jay Rainey

  • Status changed from reopened to inprogress

Remove use of atoi. Refs #9208.

Changeset: 6fb811d646ccb76c8d0de161a58aee1164947f1f

comment:21 Changed 7 years ago by Jay Rainey

Add timeout to catalog (gSoap) method calls. Refs #9208.

Changeset: 94a2a7aa26cbb799888194a2186e4d59d7f634d1

comment:22 Changed 7 years ago by Jay Rainey

Merge branch 'feature/9208_icat_gui_hang_fix' into develop

Changeset: ca3714aadab53f83165037c62c905943c6096283

Last edited 7 years ago by Jay Rainey (previous) (diff)

comment:23 Changed 7 years ago by Jay Rainey

  • Status changed from inprogress to verify
  • Resolution set to fixed

comment:24 Changed 7 years ago by Jay Rainey

I've added a timeout call to the gSoap client which will address the problem noted by Martyn in comment:19.

To test

  1. Perform a long search (e.g. set dates to: 01/01/1990 and 01/01/2014)
  2. Disconnect from the Internet.
  3. Wait 30 seconds and you will get the same error message as produced in comment:15.

comment:25 Changed 7 years ago by Martyn Gigg

  • Status changed from verify to verifying

comment:26 Changed 7 years ago by Martyn Gigg

  • Status changed from verifying to reopened
  • Resolution fixed deleted

That seems much better now and times out as I would expect.

One minor point now is that I don't think we need to maxTime stuff in CatalogHelper::executeAsynchronously as it's all handled under the hood so that can be left just doing the processEvents stuff.

comment:27 Changed 7 years ago by Jay Rainey

  • Status changed from reopened to inprogress

Remove unnecessary variable. Refs #9208.

Changeset: 73da39d33f2d1ace753e5efef74500125ea58de8

comment:28 Changed 7 years ago by Jay Rainey

Merge branch 'feature/9208_icat_gui_hang_fix' into develop

  • feature/9208_icat_gui_hang_fix: Remove unnecessary variable. Refs #9208.

Changeset: 178945cc402d5cb1a047e67e3bd106c3377ce835

comment:29 Changed 7 years ago by Jay Rainey

Verify that the minor change made in this commit addresses the suggestion made by Martyn in comment:26.

comment:30 Changed 7 years ago by Jay Rainey

  • Status changed from inprogress to verify
  • Resolution set to fixed

comment:31 Changed 7 years ago by Martyn Gigg

  • Status changed from verify to verifying

comment:32 follow-up: ↓ 34 Changed 7 years ago by Martyn Gigg

  • Status changed from verifying to reopened
  • Resolution fixed deleted

The commit in comment 28 doesn't quite address what I was saying in comment 26. As the timeout is now handled further down then I don't think we need the timer at all in executeAsynchronously so I think that function can just return to being.

Poco::ActiveResult<bool> result(algorithm->executeAsync());
while (!result.available())
{
  QCoreApplication::processEvents();
}

comment:33 Changed 7 years ago by Jay Rainey

  • Status changed from reopened to inprogress

Remove maxTime as it's now handled in catalog. Refs #9208.

Changeset: a6ca05d5ee1dff17e17a1589b5e64fd02237309d

comment:34 in reply to: ↑ 32 Changed 7 years ago by Jay Rainey

  • Status changed from inprogress to verify
  • Resolution set to fixed

Verify that maxTime has been removed from CatalogHelper as mentioned by Martyn in comment:32. This was removed as gSoap takes care of this for us.

comment:35 Changed 7 years ago by Martyn Gigg

  • Status changed from verify to verifying

comment:36 Changed 7 years ago by Jay Rainey

Remove unnecessary timer. Refs #9208.

Changeset: 0528095e0f06179ae3b6e09c1bcf5655b06cc023

comment:37 Changed 7 years ago by Martyn Gigg

  • Status changed from verifying to closed

Merge remote-tracking branch 'origin/feature/9208_icat_gui_hang_fix'

Full changeset: 602d6bc241792007abba9c34e0490959c5b58bfd

comment:38 Changed 5 years ago by Stuart Campbell

This ticket has been transferred to github issue 10051

Note: See TracTickets for help on using tickets.